
Pranabesh Ghosh

Chennai

Summary

Results-driven Lead Data Engineer with strong skills in data pipeline design, ETL process development, and database optimization. Proven track record in implementing data governance policies and optimizing cloud architecture.

Overview

15 years of professional experience

Work History

Lead Data Engineer / Data Engineer / Architectural Engineer

BeCSE, Delhaize
12.2021 - Current
  • Executed ETL ingestion from SAP sources, cleansing and loading data into prepared layers.
  • Collaborated with cross-functional teams to gather data requirements.
  • Designed data pipelines for efficient data integration and processing.
  • Developed ETL processes using industry-standard tools and frameworks.
  • Optimized database performance through query tuning and indexing strategies.
  • Maintained data integrity and quality across multiple systems and platforms.
  • Implemented data governance policies to ensure compliance and security.
  • Conducted troubleshooting for data-related issues in production environments.
  • Documented technical specifications for data engineering processes and procedures.
  • Collaborated with other teams to understand their requirements and deliver solutions accordingly.
  • Aggregated data and performed data quality checks before loading into Delta tables (see the sketch at the end of this list).
  • Designed end-to-end data reconciliation flow based on business KPI requirements.
  • Utilized Soda for data quality checks and Grafana for log insights and alerts.
  • Refactored C# API code to ensure timely data refresh in Hybris UI.
  • Conducted API testing using Postman to validate functionalities.
  • Deployed code across environments using DevOps CI/CD practices.
  • Created reusable validation templates and wrappers for diverse job executions.
  • Monitored system health and performance metrics to ensure smooth operation.
  • Implemented new database technologies such as NoSQL databases for efficient storage of large volumes of data.
  • Implemented best practices around data governance and compliance standards such as GDPR and HIPAA.
  • Analyzed user needs and created logical models that met those needs while adhering to industry standards.
  • Designed, built, and maintained high-performance databases for reporting and analysis purposes.
  • Deployed machine learning models to production environment for real-time predictions.
  • Configured replication services between distributed clusters in order to synchronize datasets across regions.
  • Ensured data accuracy through regular testing and validation procedures prior to deployment in production environments.
  • Recommended data analysis tools to address business issues.
  • Followed industry innovations and emerging trends through scientific articles, conference papers or self-directed research.
  • Applied feature selection algorithms to predict potential outcomes.
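
A minimal PySpark sketch of the pre-load quality gate referenced above, with hypothetical table and column names; in the actual pipeline the checks ran through Soda, with alerts surfaced in Grafana:

```python
# Hedged sketch: validate a DataFrame before appending it to a Delta
# table. Assumes a Delta-enabled Spark session (e.g., Databricks).
from pyspark.sql import SparkSession, DataFrame

spark = SparkSession.builder.appName("quality-gate").getOrCreate()

def quality_metrics(df: DataFrame, key_cols: list) -> dict:
    """Collect simple quality metrics: row count, null keys, duplicate keys."""
    total = df.count()
    null_keys = df.filter(" OR ".join(f"{c} IS NULL" for c in key_cols)).count()
    dup_keys = total - df.dropDuplicates(key_cols).count()
    return {"rows": total, "null_keys": null_keys, "dup_keys": dup_keys}

def load_if_clean(df: DataFrame, target_table: str, key_cols: list) -> None:
    metrics = quality_metrics(df, key_cols)
    if metrics["null_keys"] or metrics["dup_keys"]:
        # In the real pipeline a failed check raised an alert rather than
        # silently loading bad data.
        raise ValueError(f"Quality check failed: {metrics}")
    df.write.format("delta").mode("append").saveAsTable(target_table)
```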

DXC TECHNOLOGY
04.2010 - Current
  • Healthcare – 6 years
  • Retail – 13 years
  • Energy – 1 year
  • Finance – 4 years
  • Certifications (Professional Activities, Certifications, and Training Attended)
  • MCSA (Microsoft Certified Solutions Associate): Data Engineering with Azure
  • Informatica PowerCenter (Developer and Administrator) Certified
  • Azure Solutions Architect (trained in AZ-305 and AZ-204; certification in progress)
  • More than 21 years of professional experience in software engineering and technology consulting
  • Successfully played the roles of Project Lead, Designer, ETL/Data Architect, Administrator, and Technical Lead across various assignments
  • Define cloud network architecture using Azure Virtual Networks, VPN, and ExpressRoute to establish connectivity between on-premises and cloud
  • Assist leadership with the ongoing development of policies and procedures to ensure consistent product delivery
  • Develop custom features in Visual Studio based on specifications and technical designs
  • Develop a quality framework for monitoring and data reconciliation
  • Design and automate the provisioning and deployment process
  • Participate in internal and customer meetings, assisting with the ongoing evolution of technology offerings
  • Provide technical guidance on building solutions using Azure PaaS and other services
  • Troubleshoot and identify performance, connectivity and other issues for the applications hosted in Azure platform.

Lead Developer/Architect

AHOLD NL, Com DXC Technology
01.2021 - 11.2021
  • FLASH (Data Quality & Reconciliation framework): Resources Used: Azure stack, REST API, PySpark
  • About Project: The FLASH framework is a collection of best practices, processing logic, rules, data models, design patterns, and methodology
  • It also provides centralized metadata for all data processing jobs in the environment
  • ABC is used for data validation and reconciliation as data moves through the different layers; it captures data mismatches and pipeline status, which are displayed on an internal dashboard
  • ADF is used for workflow orchestration, and a generic Python script drives many of the pipelines
  • The generic Python script is driven by a metadata table accessed through a Java API call
  • Provides 24/7 monitoring and logging of system health and quality
  • Improves confidence in the data through balancing analysis and reporting capabilities, and improves ETL processing through job execution statistics and optimal job sequencing
  • Roles and responsibilities:
  • Architected the FLASH framework using Azure Stack
  • Proposed the source reconciliation design and designed the end-to-end data reconciliation flow using generic PySpark scripts
  • Designed and developed a centralized, metadata-driven approach for logging job health and quality
  • Designed and executed stream-based data validation with dead-letter count checks
  • Designed and implemented an event-based trigger that launches jobs to read YAML, store it in the metadata SQL DW in JSON format, and use it for various validations
  • Designed Databricks scripts to read SQL DW logs and stream them to Event Hub for the central monitoring framework
  • Designed reusable templates and wrappers for validations; these templates can be plugged into any job to run various validations and the reconciliation process (sketched below)
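
A minimal sketch of such a metadata-driven reconciliation wrapper, assuming hypothetical table and column names (job_config, recon_log); the real framework logged to the central Health and Quality store behind the dashboard:

```python
# Hedged sketch of a FLASH-style reconciliation wrapper: a metadata row
# tells the wrapper which tables to compare, and the outcome is appended
# to a central log consumed by the dashboard. All names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("flash-recon").getOrCreate()

def reconcile(job_id: str) -> None:
    # Look up the source/target tables for this job from the metadata store.
    meta = (spark.table("metadata.job_config")
                 .filter(F.col("job_id") == job_id)
                 .first())
    src_count = spark.table(meta["source_table"]).count()
    tgt_count = spark.table(meta["target_table"]).count()

    status = "MATCH" if src_count == tgt_count else "MISMATCH"
    log = spark.createDataFrame(
        [(job_id, src_count, tgt_count, status)],
        "job_id string, src_count long, tgt_count long, status string",
    )
    # Append to the central health/quality log read by the dashboard.
    log.write.format("delta").mode("append").saveAsTable("metadata.recon_log")
```

Because the wrapper is driven entirely by metadata, the same code can be plugged into any pipeline; only the job_config row changes.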

Lead Developer/Architect

AHOLD, Com DXC Technology
09.2016 - 03.2021
  • U-SQL to PySpark conversion with Gen2 ADLS: Resources Used: ADF, ADB, ADLS Gen2, PySpark, Spark SQL, Python
  • About Project: This project covered two parts of the migration: U-SQL to PySpark, and ADLS Gen1 to Gen2
  • A notebook workflow was also implemented in this project to keep the code generic
  • The master notebook holds the business logic and other functionality for the specific script
  • A fileops notebook, called by the master notebook, performs the cleanup and schema-conversion routines (sketched below)
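
A minimal sketch of the master/fileops notebook workflow, using Databricks notebook workflows; the path, parameters, and return value are hypothetical, and the snippet assumes it runs inside a Databricks notebook where dbutils is available:

```python
# Master notebook: holds the business logic and delegates cleanup and
# schema conversion to the fileops notebook via a notebook workflow.
result = dbutils.notebook.run(
    "/Shared/fileops",  # hypothetical child notebook path
    600,                # timeout in seconds
    {
        "source_path": "abfss://raw@account.dfs.core.windows.net/in",
        "action": "cleanup_and_convert_schema",
    },
)

# The fileops notebook would finish with dbutils.notebook.exit("OK").
if result != "OK":
    raise RuntimeError(f"fileops notebook failed: {result}")

# ... master-notebook business logic continues here ...
```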

Architect and Lead Developer

ABC, Hewlett Packard / DXC Technology
09.2014 - 01.2016
  • Data validation framework: Resources Used: ADF, ADB, Data Lake Gen1, Web API, SQL, Python
  • About Project: ABC is used for data validation as data moves through the different layers; it captures data mismatches and pipeline status, which are displayed on an internal dashboard
  • ADF is used for workflow orchestration, and a generic Python script drives many of the pipelines
  • The generic Python script is driven by a metadata table accessed through a Java API call
  • Aggregation and CIP Project in Azure/SQL DW with Gen1 ADL: Resources Used: SQL DW procedures, Hive, Data Lake Gen1
  • About Project: Aggregation logic from the legacy EDW was implemented in Azure, embedded in SQL DW procedures; files in Data Lake Gen1, organized in layers such as RDS, SDM, and CDM, were ingested into SQL DW via PolyBase external tables and loaded into the final tables with ABC logic (see the sketch at the end of this entry)
  • These procedures were later invoked from ADF, with ABC applied at the pipeline level
  • Aggregations mainly covered item discount, warehouse shipment, item cost, and extended-cost logic
  • Roles and responsibilities:
  • Worked extensively on performance tuning on Superdome
  • Worked as an SME on aggregation for AD/fiscal cost data and sell-through & non-sell-through data
  • Developed and implemented new ADF pipelines
  • Converted existing U-SQL to PySpark code
  • Used Git for version control
  • Implemented innovative ideas and built customer trust
  • Reduced incidents using Python automation programs
  • Hands-on experience with Databricks, PyCharm, Jupyter Notebook, Spyder, ADF, Blob Storage, and VMs
  • Cleaned, merged, and manipulated datasets and conducted feature engineering using pandas
  • POC work on the ABC model and predictive analysis of store performance was accepted by the client and implemented successfully
  • CDW/IDW/EDW Project
  • Client: AHOLD. The “Commercial Synergy Tier 3 Reporting EDW” project was delivered by DXC to enable combined reporting of Vendor Funding and Sales data for Delhaize America
  • DXC’s role was to build a new product table consisting of all products sold by Delhaize America mapped into the Ahold USA hierarchy, and to aggregate the product data and the weekly movement data
  • This aggregated data is generated on a weekly basis and is used by the report generation team and the business users
  • Responsibilities –
  • Understanding the existing AutoSys jobs and scripts related to the business logic and converting them to work in the EDW environment
  • Understanding the complex business transformation logic and implementing it in ETL development and design
  • Regular interaction with the client to review the design
  • Analyzing the design patterns of the various staging, dimension, and fact tables
  • Performing performance tuning at the load level to improve performance
  • Performing object deployments on versioned repositories from QA to Production
  • Worked extensively on performance tuning on Superdome
  • Environment:
Superdome, Oracle, Informatica, Unix
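
A hedged sketch of the lake-to-SQL-DW load path mentioned in the aggregation bullets. The original used SQL DW procedures over PolyBase external tables; shown here is the equivalent write through the Databricks SQL DW connector, which also stages data via PolyBase under the hood. All paths, URLs, and table names are hypothetical, and spark is the session predefined in a Databricks notebook:

```python
# Read a CDM-layer file set from Data Lake Gen1 and load it into SQL DW.
df = spark.read.parquet("adl://myadls.azuredatalakestore.net/cdm/item_cost/")

(df.write
   .format("com.databricks.spark.sqldw")                      # SQL DW connector
   .option("url", "jdbc:sqlserver://mydw.database.windows.net:1433;database=edw")
   .option("tempDir", "wasbs://staging@myaccount.blob.core.windows.net/tmp")
   .option("forwardSparkAzureStorageCredentials", "true")
   .option("dbTable", "cdm.item_cost_agg")
   .mode("append")
   .save())
```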

ETL Developer / Informatica Administrator

Hewlett Packard, CVS Caremark
09.2013 - 10.2014
  • TSA – Aurora Energy Pty Ltd
  • This project relates to the retail energy business and was divided into two phases
  • The first phase involved migrating the components of the existing ETL processing, databases, and reports required to support TSA reporting to the new technology platform
  • In Phase 2, implementation of the remodeled Aurora target architecture commences for the in-scope areas of the TSA. HP and Aurora have jointly recognized that re-architecting the migrated solution must be prioritized to focus on the highly shared data, in order to balance the mitigation of data integrity risks against timeframe risks
  • Responsibilities –
  • Understanding the existing BODI ETL business logic and converting it into Informatica mapping logic in Phase 1
  • Understanding the complex business transformation logic and implementing it in ETL development and design
  • Designing the ETL artifacts in line with the business logic and implementing them with minimal complexity
  • Regular interaction with the client to review the design
  • Analyzing the Design pattern of various staging tables and Dimension and facts tables
  • Performing performance tuning at the ETL level to improve performance
  • Perform object deployments on versioned repositories from QA to Production
  • Set up and administer PowerCenter security
  • Set up and configure the PowerCenter domain and services
  • Install PowerCenter software and administer repositories
  • Create production environments, including object migrations
  • Create and maintain Informatica users and privileges
  • Worked on SQL queries to query the Repository DB to find the deviations from Company’s ETL Standards for the objects created by users such as Sources, Targets, Transformations, Log Files, Mappings, Sessions and Workflows
  • Environment:
  • Vertica, Informatica, MicroStrategy

Technical Lead

Hewlett Packard
04.2011 - 09.2013
  • Aetna On-Demand Projects (Client: CVS Caremark)
  • CVS Caremark is taking on an enterprise-wide initiative to create an enterprise data warehouse using an industry-standard healthcare model
  • The existing source system, RxClaim, is adding three new feeds to the EDW2 Teradata warehouse; these feeds carry RxClaim information into EDW2
  • The main objective of this project is to populate the RxClaim-related data into the EDW2 warehouse using these three feeds
  • Data is loaded into the Oracle environment and the relevant dimension data is built based on claims
  • The same is then cascaded to the Teradata environment for report processing
  • Responsibilities –
  • Designed the solutions for individual projects such as Aetna Patient Override, RxClaim Gaps, CDD FULL, and Prompt Pay Phase III FEDB
  • Planning, analyzing, implementing, and documenting strategies related to ETL development and design
  • Performing design and code reviews and conducting knowledge transfers with the team
  • Analyzing the design patterns of the various staging and dimension tables
  • Designed the ETL using shell scripts, Informatica, PL/SQL, Summary Management, the DB Sync tool, and Teradata BTEQ to load data from files into Teradata via Oracle
  • Achievements:
  • Completed all projects on time while coordinating multiple projects
  • Used several reusable components, reducing development time and improving performance
  • Implemented project execution standards for better project management and tracking
  • Environment:
  • Oracle, Informatica, Teradata and Shell Scripting

Technical Lead

CVS Caremark, Hewlett Packard
04.2010 - 04.2011
  • EOMS Staging Implementation
  • CVS Caremark is taking on an enterprise-wide initiative to create an enterprise data warehouse using an industry-standard healthcare model
  • As such, the EOMS (Enterprise Opportunity Management System) application will be transitioning to use this model
  • This model is mainly used to migrate the data produced by the EOMS application to the new model structure on EDW1, and also to load data from an XML file into the corresponding 6 (exact number TBD) staging tables in the EOMSDMA schema
  • In the later phase of this project, the data from the various EOMSDMA schemas will be used to load the denorm tables using various business functionality
  • The data is migrated using Informatica as the ETL tool together with Unix shell scripting
  • Responsibilities –
  • Analyzing the design patterns of the various staging and denorm tables
  • Enhancing the shell scripts to automate the processing of ETL objects
  • Understanding the business functionality in the low-level design and creating a new design document based on the requirements for the denorm tables
  • Analyzing the data model of the denorm tables at the mapping level and preparing the HLD and LLD for each mapping
  • Conducting design workshop with our business team
  • Preparing UT, QAT test cases
  • Achievements:
  • Developed complex ETL objects with improved performance for loading the dimension and fact tables.


Education

Ab Initio – 1 year

PL/SQL

Oracle – 15 years

Teradata – 7 years

MicroStrategy – 2 years

Skills

  • Top Skills (Technical / Non-Technical)
  • Data pipeline design
  • ETL process development
  • Database optimization
  • Data governance implementation
  • Cloud architecture design
  • Cross-functional collaboration
  • Technical documentation creation
  • Microsoft Azure stack, Databricks, PySpark - 8 years
  • Performance tuning methodologies
  • Compliance standards adherence
  • Problem resolution strategies
  • Quality assurance practices
  • Agile project management
  • User needs analysis
  • Data integration
  • Data warehousing
  • Version control
  • API development
  • Relational databases
  • Data modeling
  • Data migration

Timeline

Lead Data Engineer / Data Engineer / Architectural Engineer

BeCSE, Delhaize
12.2021 - Current

Lead Developer/Architect

AHOLD NL, Com DXC Technology
01.2021 - 11.2021

Lead Developer/Architect

AHOLD, Com DXC Technology
09.2016 - 03.2021

Architect and Lead Developer

ABC, Hewlett Packard / DXC Technology
09.2014 - 01.2016

ETL Developer / Informatica Administrator

Hewlett Packard, CVS Caremark
09.2013 - 10.2014

Technical Lead

Hewlett Packard
04.2011 - 09.2013

DXC TECHNOLOGY
04.2010 - Current

Technical Lead

CVS Caremark, Hewlett Packard
04.2010 - 04.2011

