Deepak Sharma

Gurugram

Summary

Results-driven data engineer with 7 years of experience in ETL, data warehousing, AWS cloud, business intelligence, and analytics. Skilled in developing and validating ETL processes using Python, PySpark, and Spark SQL, and in managing data lake environments in AWS that integrate data from sources such as Amazon Redshift, AWS RDS, AWS S3, and Athena. Experienced in creating and managing ETL workflows with AWS services including Glue, Lambda, Step Functions, DMS, SQS, SNS, and Delta Lake, with a proven track record of implementing real-time data pipelines for efficient processing and storage. Strong collaborator with team members and stakeholders in Agile Scrum settings, known for designing and maintaining scalable data systems, ensuring data accuracy that drives impactful business insights, and adapting to dynamic project requirements to deliver reliable, timely solutions.

Overview

7 years of professional experience

Work History

Data Engineer

KPMG
05.2023 - Current
  • Leveraged Python, PySpark, and SQL transformations to design and implement Glue ETL jobs, facilitating the extraction, transformation, and loading of data for various projects.
  • Spearheaded the development of a campaign manager system for a client, enabling targeted SMS and WhatsApp alerts to customers. Managed end-to-end delivery, from requirement gathering to deployment and maintenance.
  • Orchestrated SMS alert functionalities for customer notifications, including booking confirmations, new offers, test drive feedback, and customer satisfaction surveys, optimizing customer engagement and retention.
  • Engineered data pipelines to preprocess customer data, applying diverse transformations to meet specific business requirements, and seamlessly loaded processed data into the S3 gold layer for downstream analytics.
  • Transformed raw data stored in Delta format into refined datasets, ensuring data quality and consistency while streamlining data processing efficiency.
  • Led the migration of a pandas-based data processing module to PySpark, capitalizing on distributed computing capabilities to enhance scalability and performance, enabling efficient handling of large-scale datasets.
  • Optimized PySpark code for improved performance, leveraging distributed computing features and parallel processing capabilities to maximize efficiency and reduce processing time.
  • Developed Glue jobs to process vast volumes of data, applying custom transformations as per business requirements, and orchestrating seamless data loading into Redshift for advanced analytics.
  • Played a pivotal role in Redshift performance optimization initiatives, fine-tuning queries, and optimizing data warehouse configurations to enhance query performance and overall system efficiency.
  • Collaborated closely with clients to understand their unique business requirements, translating them into technical solutions, and delivering tailored data solutions to meet their objectives effectively.
  • Built and managed real-time data streaming pipelines using AWS Kinesis.
  • Automated data processing workflows leveraging AWS Lambda for event-driven architecture.
  • Integrated AWS SNS for real-time notifications and messaging.
  • Utilized DynamoDB for scalable and low-latency storage of streaming data.
  • Fine-tuned query performance and optimized database structures for faster, more accurate data retrieval and reporting.
  • Enhanced data quality by performing thorough cleaning, validation, and transformation tasks.

Data Engineer

Impressico
08.2021 - 05.2023
  • Used Python, PySpark, and SQL transformations to create and test ETL jobs.
  • Performed data ingestion, integration, ETL, analysis, and validation using Python, PySpark, SQL, AWS S3, AWS DMS, Athena, and Glue.
  • Designed and created a data mart in AWS for the project's reporting needs, analyzing and integrating data from different data sources.
  • Migrated data from AWS RDS to AWS S3 buckets using AWS DMS.
  • Used AWS Glue to extract, transform, and load data into Amazon Redshift.
  • Developed Python ETL jobs with transformations including filter conditions, handling missing values, dropping and detecting duplicates, validating conditions, joining datasets, and finding missing records and data gaps.
  • Wrote SQL scripts and queries to analyze data in the data warehouse.
  • Performed data transformation using Pandas and NumPy.
  • Built and tested serverless applications using AWS Lambda, integrating APIs, S3, and Amazon Redshift and transforming data from various formats.
  • Created buckets and loaded data into the Amazon S3 data lake.
  • Created tables and queried data in Amazon Athena; transformed data via Spark.
  • Orchestrated and integrated pipelines using AWS Step Functions.
  • Created EventBridge rules to schedule jobs.
  • Used Secrets Manager to store database credentials.
  • Monitored production pipelines and resolved issues.
  • Fixed critical production bugs.
  • Worked on production release stories and deployment activities.
  • Used Terraform to deploy AWS infrastructure and automate various infrastructure tasks.
  • Used PostgreSQL as a configuration database: built tables and added and extracted data programmatically using Lambda and Glue jobs.

Data Engineer

HCL
11.2018 - 07.2021
  • Used Python, PySpark, and Spark SQL transformations to create ETL jobs.
  • Performed data ingestion, integration, ETL, analysis, and validation using Python, PySpark, SQL, AWS S3, AWS DMS, Athena, and Glue.
  • Used Amazon S3 as the data lake.
  • Performed end-to-end data validation and testing across different source systems and databases.
  • Created Redshift Spectrum tables and verified that they functioned properly.
  • Created AWS Glue scripts to automate source-to-target data validation for different source systems.
  • Validated fact and dimension tables, ensuring data was loaded correctly per the defined logic and transformation rules.
  • Validated data from source to target with no missing records or data gaps.
  • Created views and external tables.
  • Wrote SQL scripts and queries to analyze data in the data mart.
  • Performed data transformation using Pandas and NumPy.
  • Extensive hands-on experience with SQL.
  • Managed security and implemented policies using IAM.

Education

Master's Degree - Computer Applications

Maharshi Dayanand University
07.2021

Bachelor's Degree - Computer Applications

Sikkim Manipal University
07.2016

Skills

  • Python
  • PySpark
  • SQL
  • Amazon Redshift
  • AWS Lambda
  • AWS Glue
  • AWS Step Functions
  • AWS EMR
  • Athena
  • AWS EventBridge
  • SNS
  • GitLab
  • Delta Lake
  • AWS RDS
  • AWS S3
  • MySQL
  • Microsoft SQL Server
  • AWS CloudWatch
  • Jira
  • Airflow
  • Data warehousing
