Project#1
Designation: Data Engineer
Project Name: Replenishment Tank Context (RTC), Order API, Alerts to SharePoint
Project Description:
- Replenishment Tank Context provides a summary of tanks and maintains stock inventory, including materials, customers, ship-to and sold-to information, and other customer-related details.
- This helps the business identify which tanks are active or inactive when placing orders.
Tools Used: Azure Data Factory, Azure Data Flows, SQL Server, Logic Apps, Databricks, Azure DevOps, SharePoint, Snowflake, Blob Storage
Team Size: 3
Responsibilities (Team Member):
- Design the technical requirements and data flow design documents for RTC, Order API, and Alerts.
- Build generic pipelines to move data from the source systems to the ADLS Gen2 data lake using ADF, and send alerts to SharePoint using Logic Apps (see the first sketch after this list).
- Build the RTC pipeline, which consolidates data from around 20 sources, based on the BRD provided by the business team.
- Build ADF pipelines to move data to Snowflake, and write Snowflake stored procedures to convert the data into JSON arrays, submit them to the MuleSoft API, receive the response, and store it back in a Snowflake table (see the second sketch after this list).
- Utilize components such as Lookup, For Each, Get Metadata, Execute Pipeline, Copy, Dataflows, Variables, Triggers, etc.
- Perform unit testing and get reviews from the Architecture team, then push changes into DEV & QA using DevOps pipelines.
- Add error-logging and monitoring steps to existing pipelines so that pipeline activities can be tracked once the code is deployed to production (see the third sketch after this list).
- Deploy changes to PROD with the help of the CI/CD process and perform post-production validation.
- Follow Agile methodologies, including the creation of PBIs, sprint reviews, and scrum calls.
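
First sketch: a minimal illustration of pushing an alert through a Logic App that writes to SharePoint. It assumes the Logic App exposes an HTTP request trigger whose callback URL is available; the URL and the alert fields are placeholders, and the actual SharePoint write happens inside the Logic App itself.

```python
import requests

# Callback URL of the Logic App's HTTP request trigger (placeholder).
LOGIC_APP_URL = "https://<logic-app-http-trigger-url>"

# Alert payload produced by an ADF pipeline run (fields are assumptions).
alert = {
    "pipeline": "pl_rtc_load",
    "status": "Failed",
    "message": "Copy activity to ADLS Gen2 failed",
}

# The Logic App receives the payload and appends it to a SharePoint list.
resp = requests.post(LOGIC_APP_URL, json=alert, timeout=30)
resp.raise_for_status()
```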
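
Second sketch: the Order API flow, written in Python for illustration rather than as the actual Snowflake stored procedure plus ADF implementation. All connection details, table, column, and endpoint names (RTC_ORDERS, ORDER_API_RESPONSE, the MuleSoft URL) are hypothetical placeholders.

```python
import json
import requests
import snowflake.connector

# Connect to Snowflake (connection details are placeholders).
conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="<warehouse>", database="<database>", schema="<schema>",
)
cur = conn.cursor()

# Build a JSON array from the staged order rows, mirroring what the
# stored procedure does with OBJECT_CONSTRUCT / ARRAY_AGG.
cur.execute("""
    SELECT ARRAY_AGG(OBJECT_CONSTRUCT(
               'orderId',  ORDER_ID,
               'material', MATERIAL,
               'quantity', QUANTITY))
    FROM RTC_ORDERS
    WHERE STATUS = 'READY'
""")
payload = json.loads(cur.fetchone()[0] or "[]")

# Submit the JSON array to the MuleSoft API (URL is a placeholder).
resp = requests.post("https://<mulesoft-host>/api/orders", json=payload, timeout=60)

# Store the raw response back in a Snowflake table for downstream use.
cur.execute(
    "INSERT INTO ORDER_API_RESPONSE (STATUS_CODE, RESPONSE_BODY) "
    "SELECT %s, PARSE_JSON(%s)",
    (resp.status_code, json.dumps(resp.json())),
)
conn.commit()
cur.close()
conn.close()
```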
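
Third sketch: the error logging itself was added as activities inside the ADF pipelines; as a complementary illustration, this sketch lists failed pipeline runs from the last day using the azure-mgmt-datafactory SDK. The subscription, resource group, and factory names are assumed placeholders.

```python
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters, RunQueryFilter

# Placeholder identifiers for the target Data Factory.
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
FACTORY_NAME = "<data-factory-name>"

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Query pipeline runs that failed during the last 24 hours.
now = datetime.now(timezone.utc)
filters = RunFilterParameters(
    last_updated_after=now - timedelta(days=1),
    last_updated_before=now,
    filters=[RunQueryFilter(operand="Status", operator="Equals", values=["Failed"])],
)
failed_runs = client.pipeline_runs.query_by_factory(RESOURCE_GROUP, FACTORY_NAME, filters)

# Surface the failures (in the project these fed the alerting flow).
for run in failed_runs.value:
    print(run.pipeline_name, run.run_id, run.status, run.message)
```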
Project#2
Designation: Data Engineer
Project Name: Digital Bulk
Project Description:
- The Digital Bulk project helps to provide optimized routes to truck drivers and predict the consumption of material in a tank.
- The Data Engineer's role in this project is to ingest data from the different source systems and make it available to downstream users.
- Major data sources include Nalco SAP, sensor data from devices attached to the tanks, and distance data between pairs of zip codes.
Tools Used: Azure Data Factory, Azure Data Flows, SQL Server, Logic Apps, Databricks, Azure DevOps
Team Size: 6
Responsibilities (Team Member):
- Design the technical requirements and data flow design documents for data coming from different source systems like Nalco SAP, OIP (sensor data), SQL Server, Shared Files, etc.
- Build generic pipelines to move data from the source to ADLS Gen 2 Data Lake using Azure Data Factory (ADF).
- Build transformations implementing the business logic using ADF and data flows (see the first sketch after this list).
- Build generic, parameterized pipelines to move data from SAP, enabling reuse and keeping the number of pipelines under control (see the second sketch after this list).
- Utilize components such as Lookup, For Each, Get Metadata, Execute Pipeline, Copy, Dataflows, Variables, Triggers, etc.
- Perform unit testing and get reviews from the Architecture team, then push changes into DEV & QA using DevOps pipelines.
- Add error logging and monitoring steps to existing pipelines for monitoring activities once the code is deployed to production.
- Deploy changes to PROD with the help of the CI/CD process and perform post-production validation.
- Follow Agile methodologies, including the creation of PBIs, sprint reviews, and scrum calls.
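
First sketch: the transformations themselves were built in ADF data flows; as an illustration of the kind of business logic involved, this PySpark sketch (Databricks is part of the toolset) keeps the latest sensor reading per tank and flags tanks below a refill threshold. The paths, column names, and threshold are assumptions.

```python
from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.appName("oip_tank_levels").getOrCreate()

# Raw OIP sensor readings landed in ADLS Gen2 (path and columns are placeholders).
readings = spark.read.parquet("abfss://raw@<storage-account>.dfs.core.windows.net/oip/readings")

# Keep only the latest reading per tank.
latest_per_tank = Window.partitionBy("tank_id").orderBy(F.col("reading_ts").desc())
latest = (
    readings
    .withColumn("rn", F.row_number().over(latest_per_tank))
    .filter(F.col("rn") == 1)
    .drop("rn")
)

# Flag tanks whose fill level is below an assumed refill threshold of 20%.
result = latest.withColumn("needs_refill", F.col("fill_level_pct") < 20)

# Write the curated output for downstream consumers.
result.write.mode("overwrite").parquet(
    "abfss://curated@<storage-account>.dfs.core.windows.net/oip/tank_levels"
)
```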
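
Second sketch: the "generic pipeline" idea, shown as a driver script that triggers one parameterized ADF pipeline once per entry in a small metadata list instead of maintaining one pipeline per SAP extract. Pipeline, resource, and parameter names are placeholders; in the project the fan-out was done inside ADF with Lookup and ForEach activities.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
FACTORY_NAME = "<data-factory-name>"

# Metadata describing each SAP source; one generic pipeline serves them all.
SOURCES = [
    {"source_object": "MARA", "target_folder": "sap/material"},   # example objects
    {"source_object": "KNA1", "target_folder": "sap/customer"},
]

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

for src in SOURCES:
    # Each run copies one object to ADLS Gen2 using the same generic pipeline.
    run = client.pipelines.create_run(
        RESOURCE_GROUP,
        FACTORY_NAME,
        "pl_generic_sap_to_adls",  # hypothetical pipeline name
        parameters=src,
    )
    print(src["source_object"], "->", run.run_id)
```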
Project#3
Designation: Data Engineer
Project Name: ES-Billing Automation
Project Description:
- ES-Billing is a migration project created on the Enterprise Data Lake (EDL) environment.
- It deals with financial data on applications, servers, and application instances, supporting power, storage, and maintenance for the companies HPI, HPE, and DXC.
- This project provides HPI with insights into the amount spent on shared services, transfer service agreements (TSA), and DXC maintenance and support.
- It helps HPI identify areas to reduce costs by minimizing shared services and TSA.
Tools Used: Hadoop, Talend Big Data 6.2.1, Azure, SQL Server, Power BI
Team Size: 4
Responsibilities (Team Member):
- Build Hive external tables based on the designs provided by the data modelling team.
- Identify the data files for the corresponding Hive tables from different sources and move them into the Linux environment of the Hadoop cluster (landing zone) using the tHDFSPut component (see the first sketch after this list).
- Create Talend jobs in Talend Big Data 6.2.1 to move the landing-zone files into the loading zone (HDFS data lake), and apply transformations to the loading-zone tables to produce the working-zone tables.
- Create Talend jobs that maintain data history by partitioning the Hive tables by month and year (see the second sketch after this list).
- Perform unit testing, create a Talend job to move the Hive tables into an Azure-hosted SQL Server database for the reporting layer, and schedule the Talend jobs to load the data monthly (see the third sketch after this list).
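
First sketch: a minimal Python stand-in for the file-movement step, pushing files from a local directory into an HDFS zone with the hdfs CLI; in the project this was handled by Talend's tHDFSPut component, and both paths here are placeholders.

```python
import subprocess
from pathlib import Path

# Local landing directory and target HDFS path (placeholders).
LANDING_DIR = Path("/data/landing/es_billing")
HDFS_LOADING_DIR = "/data/loading/es_billing"

# Push each landed file into HDFS, mirroring what tHDFSPut does in the Talend job.
for file_path in LANDING_DIR.glob("*.csv"):
    subprocess.run(
        ["hdfs", "dfs", "-put", "-f", str(file_path), HDFS_LOADING_DIR],
        check=True,
    )
```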
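
Second sketch: the history-partitioning approach, expressed as HiveQL issued from PySpark rather than as the actual Talend job. Database, table, and column names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# External table partitioned by year and month to keep billing history.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS edl_work.billing_history (
        app_id      STRING,
        server_name STRING,
        cost_usd    DECIMAL(18,2)
    )
    PARTITIONED BY (load_year INT, load_month INT)
    STORED AS ORC
    LOCATION '/data/working/billing_history'
""")

# Allow dynamic partitioning so each monthly load lands in its own partition.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

# Append the current month's loading-zone data into the history table.
spark.sql("""
    INSERT INTO TABLE edl_work.billing_history PARTITION (load_year, load_month)
    SELECT app_id, server_name, cost_usd,
           YEAR(invoice_date)  AS load_year,
           MONTH(invoice_date) AS load_month
    FROM edl_load.billing_stage
""")
```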
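
Third sketch: the reporting-layer load, shown as a PySpark JDBC write in place of the actual Talend job. Server, database, table, and credential values are placeholders, and the Microsoft SQL Server JDBC driver is assumed to be available on the cluster.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Read the curated Hive table and push it to the Azure-hosted SQL Server
# database that backs the reporting layer.
(
    spark.table("edl_work.billing_history")
    .write
    .format("jdbc")
    .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;databaseName=<db>")
    .option("dbtable", "dbo.billing_history")
    .option("user", "<user>")
    .option("password", "<password>")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .mode("overwrite")
    .save()
)
```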