Worked as a Data Engineer designing and implementing big data solutions in the Azure data space, with expertise in multiple big data technologies.
Client/Project: Commercial Bank
- Spearheaded the migration of a high-volume legacy Teradata system to Azure Databricks, covering over 8,000 DDLs and approximately 5,000 ETL scripts within the MVP scope.
- Developed two major frameworks, Parallel Run and Data Migration, to optimize performance and leverage Databricks capabilities effectively.
- Developed multiple pre-processors that apply format-specific preprocessing to incremental data before ingesting it into raw tables in Databricks (a sketch follows this list).
- Re-engineered the pre-processors by converting legacy shell scripts into optimized PySpark code, achieving a 40% reduction in stream runtime.
- Built robust frameworks for seamless migration handling, including DDL execution, lineage visualization, and reconciliation, supported by operational dashboards for real-time monitoring.
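A minimal sketch of what one of these format-aware pre-processors could look like in PySpark; the paths, supported formats, and the raw.incremental_feed table name are illustrative assumptions, not the project's actual code:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("preprocessor").getOrCreate()

def preprocess_incremental(path: str, file_format: str) -> None:
    """Read an incremental feed, apply format-specific handling,
    and append it to a raw Delta table (all names hypothetical)."""
    if file_format == "csv":
        df = spark.read.option("header", True).csv(path)
    elif file_format == "json":
        df = spark.read.json(path)
    elif file_format == "parquet":
        df = spark.read.parquet(path)
    else:
        raise ValueError(f"Unsupported file format: {file_format}")

    # Normalize column names so downstream DDLs stay consistent.
    for col in df.columns:
        df = df.withColumnRenamed(col, col.strip().lower())

    # Land the feed in the raw layer; table name is an assumption.
    df.write.format("delta").mode("append").saveAsTable("raw.incremental_feed")

preprocess_incremental("/mnt/landing/customers/2024-06-01/", "csv")
```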
Client/Project: Fashion Retail Company
- Optimized streaming pipelines for analytics and operations dashboards, reducing job completion time from 50-60 minutes to under 15 minutes and data delay from 2 hours to 2 minutes.
- Migrated Java MapReduce jobs from HDInsight (HDI) to Databricks, rewriting them as a Java Spark Maven project.
- Converted Python/Pandas logic to PySpark, leveraging distributed computing and reducing computational overhead (see the first sketch after this list).
- Changed the target storage format from ORC/CSV to Delta.
- Reduced overall cost and runtime to 40% of pre-migration levels after moving to Databricks.
- Migrated orchestration of these jobs from Azure Data Factory (ADF) to Airflow (see the second sketch after this list).
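First, a minimal sketch of the Pandas-to-PySpark conversion pattern described above, with the output written as Delta instead of ORC/CSV; the dataset, column names, and paths are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pandas_to_pyspark").getOrCreate()

# Single-node Pandas version (memory-bound):
#   df = pd.read_csv("sales.csv")
#   out = df.groupby("store_id")["amount"].sum().reset_index()

# Distributed PySpark equivalent: the same aggregation runs across the
# cluster, and the result is written as Delta rather than ORC/CSV.
df = spark.read.option("header", True).csv("/mnt/data/sales.csv")
out = df.groupBy("store_id").agg(F.sum("amount").alias("total_amount"))
out.write.format("delta").mode("overwrite").save("/mnt/delta/sales_by_store")
```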
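Second, a minimal Airflow DAG sketch of the kind that could replace an ADF pipeline for such jobs, using the Databricks provider's DatabricksSubmitRunOperator; the DAG name, schedule, cluster id, and notebook paths are all illustrative assumptions:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

# Hypothetical placeholder; a real deployment would reference its own cluster.
CLUSTER_ID = "1234-567890-abcde123"

with DAG(
    dag_id="retail_streaming_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    catchup=False,
) as dag:
    ingest = DatabricksSubmitRunOperator(
        task_id="ingest_raw",
        json={
            "existing_cluster_id": CLUSTER_ID,
            "notebook_task": {"notebook_path": "/Jobs/ingest_raw"},
        },
    )
    transform = DatabricksSubmitRunOperator(
        task_id="transform_to_delta",
        json={
            "existing_cluster_id": CLUSTER_ID,
            "notebook_task": {"notebook_path": "/Jobs/transform_to_delta"},
        },
    )
    # Same dependency ADF would express as chained activities.
    ingest >> transform
```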