Ketan Gupta

Alwar

Summary

Results-driven Data Engineer with 2.5+ years of overall experience in Big Data and Data Engineering, designing and implementing scalable data pipelines and managing end-to-end data warehousing solutions. Proficient in optimizing ETL processes using Python, modernizing legacy systems, and leveraging cutting-edge technologies to enhance performance and reduce costs. Adept at driving impactful projects with measurable outcomes.

Overview

3 years of professional experience
1 Certification

Work History

Data Engineer

Celebal Technologies
Jaipur
02.2023 - Current

Worked as a Data Engineer designing and implementing Big Data solutions in the Azure data space, with expertise in multiple Big Data technologies.

Client/Project: Commercial Bank

  • Spearheaded the migration of a high-volume legacy Teradata system, handling over 8,000 DDLs and approximately 5,000 ETL scripts as part of the MVP scope, to Azure Databricks in the Azure cloud environment.
  • Developed two major frameworks, Parallel Run and Data Migration, to optimize performance and leverage Databricks capabilities effectively.
  • Developed multiple pre-processors that prepare incremental data according to its file format before ingesting it into the raw table in Databricks.
  • Transformed the pre-processors by converting legacy shell scripts into optimized Python (PySpark) code, achieving a 40% reduction in stream runtime.
  • Built robust frameworks for seamless migration handling, including DDL execution, lineage visualization, and reconciliation, supported by operational dashboards for real-time monitoring.

Client/Project: Fashion Retail Company

  • Optimized streaming pipelines for analysis and operational dashboards, reducing job completion time from 50-60 minutes to under 15 minutes and data delay from 2 hours to 2 minutes.
  • Migrated MapReduce Java jobs to a Java Spark Maven project, moving from HDI to Databricks.
  • Transformed Python/Pandas logic to PySpark, improving distributed computing and reducing computational overhead.
  • Changed the target format from ORC and CSV to Delta.
  • Reduced overall cost and runtime to 40% after migrating to Databricks.
  • Migrated orchestration of these jobs from ADF to Airflow.

Education

Bachelor of Engineering - Information Technology

MBM Engineering College
Jodhpur, Rajasthan
03-2023

Skills

  • Programming Languages: Python, PySpark, Java, SQL
  • Data Integration: Azure Databricks, Apache Airflow, Azure Data Factory
  • Cloud Platform: Microsoft Azure
  • Database & Storage: Azure SQL, MySQL
  • Concepts: Apache Spark, Data Warehousing, Data Modeling, Spark Streaming, SCD1/SCD2, Medallion Architecture

Certification

  • Databricks Certified Data Engineer Associate
  • Databricks Certified Data Engineer Professional

Timeline

Data Engineer

Celebal Technologies
02.2023 - Current

Bachelor of Engineering - Information Technology

MBM Engineering College