Abhilash Sharma

Senior Data Engineer
Gwalior, MP

Summary

I am a Senior Data Engineer with 5.5 years of experience in ETL and ELT processes and in building Big Data systems that power unified analytics platforms for batch and streaming workloads. My expertise includes Python, PySpark, Spark, AWS, SQL, Kubernetes, and Docker, with hands-on experience in Databricks, Azure, HDFS, Hadoop, and Hive. I specialize in developing efficient data pipelines and processing large-scale datasets.

I have played key roles at Gupshup, contributing to analytics platform development, and at Purple Finance, managing ETL processes. My work centers on automation, optimization, and driving data-driven insights.

Overview

5.5 years of professional experience

Work History

Senior Data Engineer

Purple Finance
07.2024 - Current

Description: Built an automated ETL pipeline that extracts data from MySQL (LMS, LOS), transforms it, and loads it into Amazon S3 and PostgreSQL. Implemented data validation, materialized views for BI tools, and Kafka integration for real-time data processing.

Responsibilities:

  • Developed ETL pipeline using AWS Glue for MySQL to PostgreSQL data migration.
  • Built materialized views in PostgreSQL to enhance BI querying (see the sketch below).
  • Integrated Kafka for real-time data streaming.
  • Ensured data integrity and optimized ETL performance.

Tech Stack: AWS Glue, Amazon S3, PostgreSQL, Kafka, MySQL.
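
A minimal sketch of the materialized-view step described above, assuming psycopg2 and illustrative names (loan_book_mv, repayments, loan_accounts, and the connection string are placeholders, not the actual Purple Finance schema):

import os
import psycopg2

# Aggregate repayments per branch and day for BI dashboards (illustrative schema).
DDL = """
CREATE MATERIALIZED VIEW IF NOT EXISTS loan_book_mv AS
SELECT a.branch_id,
       date_trunc('day', r.paid_at) AS paid_day,
       count(*)                     AS repayment_count,
       sum(r.amount)                AS repaid_amount
FROM repayments r
JOIN loan_accounts a ON a.account_id = r.account_id
GROUP BY a.branch_id, date_trunc('day', r.paid_at);
"""

# REFRESH ... CONCURRENTLY requires a unique index on the view.
UNIQUE_INDEX = """
CREATE UNIQUE INDEX IF NOT EXISTS loan_book_mv_uq
ON loan_book_mv (branch_id, paid_day);
"""

REFRESH = "REFRESH MATERIALIZED VIEW CONCURRENTLY loan_book_mv;"


def refresh_bi_view(dsn: str) -> None:
    """Create the view on first run, then refresh it without blocking BI readers."""
    conn = psycopg2.connect(dsn)
    conn.autocommit = True  # REFRESH ... CONCURRENTLY cannot run inside a transaction block
    try:
        with conn.cursor() as cur:
            cur.execute(DDL)
            cur.execute(UNIQUE_INDEX)
            cur.execute(REFRESH)
    finally:
        conn.close()


if __name__ == "__main__":
    # e.g. postgresql://user:password@host:5432/reporting
    refresh_bi_view(os.environ["PURPLE_PG_DSN"])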

Data Engineer

GupShup
01.2020 - 06.2024

Description:
Led the development and optimization of data pipelines and materialized views within Gupshup’s analytics team, providing an analytics platform for batch and streaming data. Focused on SQL performance tuning in PostgreSQL and created custom scripts for system monitoring. Contributed to establishing a scalable data warehouse using Amazon Redshift and Redshift Spectrum, facilitating seamless querying of S3-stored data. Utilized Kubernetes for managing containerized applications, ensuring efficient deployment and scaling. Integrated multiple AWS services for real-time data streaming and processing.

Responsibilities:
• Architected and optimized ETL pipelines using PostgreSQL, Flink, PySpark, and Python to improve data processing efficiency for batch and streaming workloads.
• Developed a robust data warehouse infrastructure on Amazon Redshift, leveraging Redshift Spectrum to query large datasets stored in S3 for analytics needs (see the sketch below).
• Implemented custom monitoring solutions for pipeline reliability, ensuring high availability and performance.
• Leveraged Kubernetes for the automation of deployment, scaling, and management of containerized applications, enhancing operational workflows.
• Managed real-time data ingestion and processing with AWS Kinesis, ensuring timely and efficient data availability for analytics.
• Collaborated with the DevOps team to streamline automation tasks using AWS Glue, EKS, and other AWS services, enhancing the overall efficiency of the data platform.
• Worked on integrating both batch and streaming data to support various analytics use cases, providing actionable insights to business stakeholders.
• Contributed to designing and developing data models, materialized views, and dashboards to present data insights clearly.
Tech Stack: PostgreSQL, Flink, PySpark, Python, AWS S3, AWS Redshift, AWS ECR, AWS EKS, AWS MSK, AWS Kinesis, AWS IAM, AWS Glue.
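
A minimal sketch of the Redshift Spectrum setup mentioned above, assuming psycopg2 against the Redshift endpoint; the external schema, table, columns, S3 bucket, and IAM role ARN are illustrative placeholders, not Gupshup's actual objects:

import os
import psycopg2

# External schema backed by the Glue Data Catalog, so Redshift can query S3 directly.
CREATE_SCHEMA = """
CREATE EXTERNAL SCHEMA IF NOT EXISTS events_spectrum
FROM DATA CATALOG
DATABASE 'analytics_lake'
IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftSpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;
"""

# External table over Parquet files in S3; partitions are registered separately
# (e.g. by a Glue crawler or ALTER TABLE ... ADD PARTITION).
CREATE_TABLE = """
CREATE EXTERNAL TABLE events_spectrum.message_events (
    message_id varchar(64),
    channel    varchar(32),
    status     varchar(32),
    created_at timestamp
)
PARTITIONED BY (event_date date)
STORED AS PARQUET
LOCATION 's3://example-bucket/message-events/';
"""

# Spectrum lets this query scan S3-resident data without loading it into Redshift first.
DAILY_COUNTS = """
SELECT channel, status, count(*) AS events
FROM events_spectrum.message_events
WHERE event_date = %s
GROUP BY channel, status;
"""

conn = psycopg2.connect(os.environ["REDSHIFT_DSN"])
conn.autocommit = True  # external DDL cannot run inside a transaction block
with conn.cursor() as cur:
    cur.execute(CREATE_SCHEMA)
    cur.execute(CREATE_TABLE)
    cur.execute(DAILY_COUNTS, ("2024-01-15",))
    for channel, status, events in cur.fetchall():
        print(channel, status, events)
conn.close()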

Data Engineer

Harappa Education
Noida, UP
09.2019 - 04.2020

Description:
The Harappa application stores data in MongoDB Atlas, which requires cleaning and filtering. The client's requirement was to transform and store this data in RDS using a star schema format.
Responsibilities:
• Developed a data pipeline to extract data from MongoDB Atlas and load it into RDS in a star schema format.
• Implemented a trigger in MongoDB Atlas to incrementally load data to S3.
• Created a Python script to load historical data from MongoDB Atlas to S3.
• Used AWS Glue to transfer data from S3 to RDS, configuring crawlers, jobs, and workflows.
◦ Developed a PySpark job to clean and load data from S3 to RDS (see the sketch after this list).
◦ Set up a daily workflow for automated data processing.
• Designed RDS tables using a star schema (fact and dimension tables).
• Created SQL triggers in RDS to automate data updates in the main table.
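
A minimal sketch of the S3-to-star-schema PySpark step referenced above, assuming Parquet input and illustrative table and column names (learner/course/enrollment are placeholders, not the actual Harappa schema); the JDBC URL, driver, and credentials are likewise assumptions:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("harappa-star-schema").getOrCreate()

# Raw, denormalized documents exported from MongoDB Atlas into S3 (placeholder path).
raw = spark.read.parquet("s3://example-bucket/harappa/raw/enrollments/")

# Dimension tables: one row per distinct entity.
dim_learner = raw.select("learner_id", "learner_name", "email").dropDuplicates(["learner_id"])
dim_course = raw.select("course_id", "course_name", "category").dropDuplicates(["course_id"])

# Fact table: one row per enrollment event, keyed by the dimension ids.
fact_enrollment = raw.select(
    "enrollment_id", "learner_id", "course_id",
    F.to_date("enrolled_at").alias("enrolled_on"),
    "progress_pct",
)

jdbc_url = "jdbc:postgresql://rds-host:5432/analytics"  # engine and host are placeholders
props = {"user": "etl_user", "password": "***", "driver": "org.postgresql.Driver"}

# Write dimensions first, then the fact table, into RDS over JDBC.
for name, df in [("dim_learner", dim_learner),
                 ("dim_course", dim_course),
                 ("fact_enrollment", fact_enrollment)]:
    df.write.jdbc(url=jdbc_url, table=name, mode="overwrite", properties=props)

spark.stop()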

Education

MASTER OF COMPUTER APPLICATIONS

NATIONAL INSTITUTE OF TECHNOLOGY
Raipur
04.2001 -

Skills

Python programming


Timeline

Senior Data Engineer

Purple Finance
07.2024 - Current

Data Engineer

GupShup
01.2020 - 06.2024

Data Engineer

Harappa Education
09.2019 - 04.2020

MASTER OF COMPUTER APPLICATIONS

NATIONAL INSTITUTE OF TECHNOLOGY
04.2001 -