Results-driven Lead Data Engineer with strong skills in data pipeline design, ETL process development, and database optimization. Proven track record in implementing data governance policies and optimizing cloud architecture.
Overview
15 years of professional experience
Work History
Lead Data Engineer (Data Engineer / Architectural Engineer)
BeCSE, Delhaize
12.2021 - Current
Executed ETL ingestion from SAP sources, cleansing and loading data into prepared layers.
Collaborated with cross-functional teams to gather data requirements.
Designed data pipelines for efficient data integration and processing.
Developed ETL processes using industry-standard tools and frameworks.
Optimized database performance through query tuning and indexing strategies.
Maintained data integrity and quality across multiple systems and platforms.
Implemented data governance policies to ensure compliance and security.
Conducted troubleshooting for data-related issues in production environments.
Documented technical specifications for data engineering processes and procedures.
Collaborated with other teams to understand their requirements and deliver solutions accordingly.
Aggregated data and performed data quality checks before loading into Delta tables (illustrated in the sketch after this list).
Designed end-to-end data reconciliation flow based on business KPI requirements.
Utilized Soda for data quality checks and Grafana for log insights and alerts.
Refactored C# API code to ensure timely data refresh in Hybris UI.
Conducted API testing using Postman to validate functionalities.
Deployed code across environments using DevOps CI/CD practices.
Created reusable validation templates and wrappers for diverse job executions.
Monitored system health and performance metrics to ensure smooth operation.
Implemented new database technologies such as NoSQL databases for efficient storage of large volumes of data.
Implemented best practices around data governance and compliance standards such as GDPR and HIPAA.
Analyzed user needs and created logical models that met those needs while adhering to industry standards.
Designed, built, and maintained high-performance databases for reporting and analysis purposes.
Deployed machine learning models to production environment for real-time predictions.
Configured replication services between distributed clusters in order to synchronize datasets across regions.
Ensured data accuracy through regular testing and validation procedures prior to deployment in production environments.
Recommended data analysis tools to address business issues.
Followed industry innovations and emerging trends through scientific articles, conference papers or self-directed research.
Applied feature selection algorithms to predict potential outcomes.
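Below is a minimal, illustrative sketch of the kind of pre-load quality check referenced above for the Delta loads; the table and column names are hypothetical, and the production checks were driven by Soda and business KPIs.

    # Illustrative only: hypothetical table and column names, not the production job.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Aggregate a (hypothetical) prepared-layer table.
    agg_df = (spark.table("prepared.sales_daily")
              .groupBy("store_id", "sale_date")
              .agg(F.sum("amount").alias("total_amount"),
                   F.count("*").alias("row_count")))

    # Simple quality gates before loading into the Delta table: no null keys, non-empty result.
    null_keys = agg_df.filter(F.col("store_id").isNull() | F.col("sale_date").isNull()).count()
    if null_keys > 0 or agg_df.count() == 0:
        raise ValueError(f"Quality check failed: {null_keys} null keys or empty aggregate")

    agg_df.write.format("delta").mode("append").saveAsTable("curated.sales_daily_agg")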
DXC TECHNOLOGY
04.2010 - Current
Healthcare – 6 years
Retail – 13 years
Energy – 1 year
Finance – 4 years
Certifications
(Professional Activities, Certifications, and Training Attended)
MCSA (Microsoft Certified Solutions Associate): Data Engineering with Azure
Informatica PowerCenter (Developer and Administrator) Certified
Azure Solutions Architect (trained in AZ-305 and AZ-204; certification in progress)
More than 21 years of professional experience in software engineering and technology consulting
Successfully played the roles of Project Lead, Designer, ETL/Data Architect, Administrator, and Technical Lead in various assignments
Define cloud network architecture using Azure Virtual Networks, VPN, and ExpressRoute to establish connectivity between on-premises and cloud environments
Assist leadership with the ongoing development of policies and procedures for the purpose of consistent product delivery
Develop custom features in Visual Studio based on specifications and technical designs
Develop a quality framework for monitoring and data reconciliation
Designed and automated the provisioning and deployment process
Participate in internal and customer meetings assisting with the ongoing evolution of technology offerings
Provide technical guidance on building solutions using Azure PaaS and other services
Troubleshoot and identify performance, connectivity, and other issues for applications hosted on the Azure platform.
About Project: The FLASH framework is a collection of best practices, processing logic, rules, data models, design patterns, and methodology
It also provides centralized metadata for all data processing jobs in the environment
ABC is used for data validation and reconciliation as data is transformed through the different layers; it captures data mismatches and pipeline status, which are displayed on an internal dashboard
ADF is used for workflow orchestration, and a generic Python script serves many pipelines
The generic Python script is driven by a metadata table accessed through a Java API call
Provides 24/7 monitoring and logging of system health and quality
Improves confidence in the data through balancing analysis and reporting capabilities, and improves ETL processing through job execution statistics and optimal job sequencing
Roles and responsibilities:
Architected the FLASH framework using Azure Stack
Proposed the source reconciliation design and designed the end-to-end data reconciliation flow using generic PySpark scripts
Designed and developed a centralized, metadata-driven approach for logging job health and quality
Designed and executed stream-based data validation with dead-letter count validations
Designed and implemented event-based triggers that launch jobs to read YAML and store it in the metadata SQL DW in JSON format, then used this metadata for various validations
Designed Databricks scripts to read SQL DW logs and stream them to Event Hubs for the central monitoring framework
Designed reusable templates and wrappers for validations; these templates can be plugged into any job to execute various validations and the reconciliation process (a minimal sketch follows).
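A minimal sketch of what such a reusable, metadata-driven validation wrapper can look like; the check types, column names, and configuration below are hypothetical, and the production templates were richer and fed the ABC dashboard.

    # Illustrative wrapper: hypothetical check types and metadata, not the production FLASH code.
    from pyspark.sql import DataFrame, functions as F

    def run_validations(df: DataFrame, checks: list) -> list:
        """Apply a metadata-driven list of checks to a DataFrame and return their results."""
        results = []
        for check in checks:
            if check["type"] == "not_null":
                failed = df.filter(F.col(check["column"]).isNull()).count()
            elif check["type"] == "min_row_count":
                failed = 0 if df.count() >= check["threshold"] else 1
            else:
                raise ValueError(f"Unknown check type: {check['type']}")
            results.append({"check": check, "failed": failed, "passed": failed == 0})
        return results

    # Example check metadata; in practice this would be read from the metadata store.
    checks = [
        {"type": "not_null", "column": "order_id"},
        {"type": "min_row_count", "threshold": 1},
    ]
    # results = run_validations(orders_df, checks)  # orders_df is supplied by the calling job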
Lead Developer/Architect
AHOLD.Com, DXC Technology
09.2016 - 03.2021
U-SQL to PySpark conversion with ADLS Gen2:
Resources Used: ADF, ADB, ADLS Gen2, PySpark, Spark SQL, Python
About Project: This project covered two parts of the migration: U-SQL to PySpark and ADLS Gen1 to Gen2
A notebook workflow was also implemented on this project to keep the code generic
A master notebook holds the business logic and other functionality for the specific script
A fileops notebook is called by the master notebook and performs the cleanup and schema conversion routines (see the sketch below).
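A minimal sketch of the master/fileops notebook pattern described above, using Databricks notebook workflows; the notebook paths, parameters, and storage locations are hypothetical.

    # Master notebook (illustrative): call the fileops notebook for cleanup and schema
    # conversion, then continue with the business logic. Paths and parameters are hypothetical.
    source_path = "abfss://raw@examplelake.dfs.core.windows.net/sales/"  # hypothetical ADLS Gen2 path

    # dbutils is available inside Databricks notebooks; the second argument is a timeout in seconds.
    # The child notebook is expected to finish with dbutils.notebook.exit("OK").
    fileops_result = dbutils.notebook.run(
        "/Shared/fileops",
        1800,
        {"source_path": source_path, "target_schema": "curated"},
    )

    if fileops_result != "OK":
        raise RuntimeError(f"Fileops notebook failed: {fileops_result}")

    # Business logic continues on the cleaned data.
    df = spark.read.format("delta").load(source_path + "cleaned/")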
Architect and Lead Developer
ABC, Hewlett Packard / DXC Technology
09.2014 - 01.2016
Data validation framework:
Resources Used: ADF, ADB, Data Lake Gen1, Web API, SQL, Python
About Project: ABC is used for data validation as data is transformed through the different layers; it captures data mismatches and pipeline status, which are displayed on an internal dashboard
ADF is used for workflow orchestration, and a generic Python script serves many pipelines
The generic Python script is driven by a metadata table accessed through a Java API call (a minimal sketch follows).
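A minimal sketch of that metadata-driven pattern, assuming a hypothetical metadata API endpoint and a simple row-count reconciliation between layers; the real script covered more validation types and posted results to the ABC dashboard.

    # Illustrative only: hypothetical metadata endpoint and table names.
    import requests
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Ask the (Java) metadata API which source/target pair this run should reconcile.
    meta = requests.get("https://metadata-api.internal/jobs/next", timeout=30).json()

    source_count = spark.table(meta["source_table"]).count()
    target_count = spark.table(meta["target_table"]).count()
    status = "MATCH" if source_count == target_count else "MISMATCH"

    # Post the pipeline status back so the internal dashboard can display it.
    requests.post("https://metadata-api.internal/jobs/status", timeout=30, json={
        "job_id": meta["job_id"],
        "source_count": source_count,
        "target_count": target_count,
        "status": status,
    })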
Aggregation and CIP Project in Azure/SQL DW with ADLS Gen1:
Resources Used: SQL DW procedures, Hive, Data Lake Gen1
About Project: Aggregation logic from the legacy EDW was implemented in Azure, with the logic embedded in SQL DW procedures; files in Data Lake Gen1, organized in layers such as RDS, SDM, and CDM, were ingested into SQL DW via PolyBase external tables and loaded into the final tables with ABC logic
These procedures were later called from ADF with ABC at the pipeline level
Aggregations mainly covered item discount, warehouse shipment, item cost, and extended cost logic
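For illustration, the sketch below shows one of these SQL DW procedure calls driven from Python via pyodbc instead of an ADF stored-procedure activity; the connection string, procedure name, and parameter are hypothetical.

    # Illustrative only: hypothetical connection details and procedure; in the project the
    # procedures were invoked from ADF, with ABC checks applied at the pipeline level.
    import pyodbc

    conn_str = (
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=example-dw.sql.azuresynapse.net;"  # hypothetical SQL DW endpoint
        "DATABASE=edw;UID=etl_user;PWD=..."        # credentials elided
    )

    with pyodbc.connect(conn_str, autocommit=True) as conn:
        cursor = conn.cursor()
        # Run the item-discount aggregation for one fiscal week (hypothetical procedure).
        cursor.execute("EXEC agg.load_item_discount @fiscal_week = ?", "202352")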
Roles and responsibilities:
Worked extensively on performance tuning on the HP Superdome
Worked as an SME in aggregation for AD/fiscal cost data and sell-through and non-sell-through data
Developed and implemented new ADF pipelines
Converted existing U-SQL to PySpark code
Experience with Git for version control
Implemented innovative ideas and built customer trust
Reduced incidents using Python automation programs
Hands-on experience with Databricks, PyCharm, Jupyter Notebook, Spyder, ADF, Blob Storage, and VMs
Cleaned, merged, and manipulated datasets and conducted feature engineering using Pandas
POC work on the ABC model and predictive analysis of store performance was accepted by the client and implemented successfully
CDW/IDW/EDW Project
Client: AHOLD. The “Commercial Synergy Tier 3 Reporting EDW” project was done by DXC to allow combined reporting of vendor funding and sales data for Delhaize America
DXC’s role was to build a new product table consisting of all products sold by Delhaize America mapped into the Ahold USA hierarchy, and to aggregate the product data and the weekly movement data
This aggregated data is generated on a weekly basis and is used by the report generation team and business users
Responsibilities –
Understanding the existing AutoSys jobs and scripts related to the business logic and converting them to work in the EDW environment
Understanding complex business transformation logic and implementing it in ETL development and design
Regular interaction with the client to review the design
Analyzing the design patterns of various staging, dimension, and fact tables
Performing tuning at the load level to improve performance
Performing object deployments on versioned repositories from QA to production
Worked extensively on performance tuning on the HP Superdome
Environment:
HP Superdome, Oracle, Informatica, Unix
ETL Developer / Informatica Administrator
Hewlett Packard, CVS Caremark
09.2013 - 10.2014
TSA, Aurora Energy Pty Ltd: This project relates to the retail energy business and is divided into two phases
The first phase involves migrating the components of the existing ETL processing, databases, and reports required to support TSA reporting to the new technology platform
In phase 2, implementation of the remodeled Aurora target architecture will commence for the in-scope areas of the TSA; HP and Aurora have jointly recognized that re-architecting the migrated solution needs to prioritize the highly shared data, in order to balance the mitigation of data integrity risks against the timeframe risks
Responsibilities –
Understanding the existing BODI ETL business logic and converting it into Informatica mapping logic in phase 1
Understanding complex business transformation logic and implementing it in ETL development and design
Designing the ETL artifacts in accordance with the business logic and implementing them with minimum complexity
Regular interaction with the client to review the design
Analyzing the design patterns of various staging, dimension, and fact tables
Performing tuning at the ETL level to improve performance
Performing object deployments on versioned repositories from QA to production
Setting up and administering PowerCenter security
Setting up and configuring the PowerCenter domain and services
Installing PowerCenter software and administering repositories
Creating production environments, including object migrations
Creating and maintaining Informatica users and privileges
Wrote SQL queries against the repository DB to find deviations from the company’s ETL standards in user-created objects such as sources, targets, transformations, log files, mappings, sessions, and workflows
CVS Caremark is undertaking an enterprise-wide initiative to create an enterprise data warehouse using an industry-standard healthcare model
The existing source system, RxClaim, is adding three new feeds to the EDW2 Teradata warehouse; these feeds carry information from RxClaim to EDW2
The main objective of this project is to populate the RxClaim-related data into the EDW2 warehouse using these three feeds
Data is loaded into the Oracle environment and relevant dimension data is built based on claims
The same data is then cascaded to the Teradata environment for report processing
Responsibilities –
Designing solutions for individual projects such as Aetna Patient Override, RxClaim Gaps, CDD FULL, and Prompt Pay Phase III FEDB
Planning, analyzing, implementing, and documenting strategies related to ETL development and design
Performing design and code reviews and conducting knowledge transfers with the team
Analyzing the design patterns of various staging and dimension tables
Designing the ETL using shell scripts, Informatica, PL/SQL, Summary Management, the DB Sync tool, and Teradata BTEQ to load data from files to Teradata via Oracle
Achievements:
Completed all projects on time while coordinating multiple projects
Used several reusable components, reducing development time and improving performance
Implemented project execution standards for better project management and tracking
Environment:
Oracle, Informatica, Teradata, and shell scripting
EOMS Staging Implementation
Role: Technical Lead
CVS Caremark, Hewlett Packard
04.2010 - 04.2011
CVS Caremark is undertaking an enterprise-wide initiative to create an enterprise data warehouse using an industry-standard healthcare model
As part of this, the EOMS (Enterprise Opportunity Management System) application will transition to this model
The model is mainly used to migrate the data produced by the EOMS application to the new model structure on EDW1, and also to load data from an XML file into the corresponding 6 (exact number TBD) staging tables in the EOMSDMA schema
In the later phase of this project, data from the various EOMSDMA schemas will be used to load Denorm tables using various business functions
The data is migrated using Informatica as the ETL tool together with Unix shell scripting
Responsibilities –
Analyzing the design patterns of various staging and Denorm tables
Enhancing the shell scripts to automate the processing of ETL objects
Understanding the business functionality in the low-level design and creating a new design document based on the requirements for the Denorm tables
Analyzing the data model of the Denorm tables at the mapping level and preparing the HLD and LLD for each mapping
Conducting design workshops with the business team
Preparing UT and QAT test cases
Achievements:
Developed complex ETL objects with improved performance for loading into the dimension and fact tables.
Education
Ab Initio – 1 year
PL/SQL/Oracle – 15 years
Teradata – 7 years
MicroStrategy – 2 years
Skills
Top Skills (Technical / Non-Technical skills)
Data pipeline design
ETL process development
Database optimization
Data governance implementation
Cloud architecture design
Cross-functional collaboration
Technical documentation creation
Microsoft Azure stack, Databricks, PySpark - 8 years
Performance tuning methodologies
Compliance standards adherence
Problem resolution strategies
Quality assurance practices
Agile project management
User needs analysis
Data integration
Data warehousing
Version control
API development
Relational databases
Data modeling
Data migration