Worked as a Data Engineer designing and implementing big data solutions in the Azure data space, with expertise in multiple big data technologies.
Client/Project: Commercial Bank
- Spearheaded the migration of a high-volume legacy Teradata system to Azure Databricks, covering over 8,000 DDLs and approximately 5,000 ETL scripts within the MVP scope.
- Developed two major frameworks, Parallel Run and Data Migration, to optimize performance and leverage Databricks capabilities effectively.
- Developed multiple pre-processors that apply format-specific preprocessing to incremental data before ingesting it into raw tables in Databricks (a sketch follows this list).
- Re-engineered the pre-processors by converting legacy shell scripts into optimized PySpark code, achieving a 40% reduction in stream runtime.
- Built robust frameworks for seamless migration handling, including DDL execution, lineage visualization, and reconciliation, supported by operational dashboards for real-time monitoring.
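A minimal sketch of what one of these format-aware pre-processors could look like in PySpark; the paths, supported formats, and the raw.incremental_feed table name are illustrative assumptions, not the project's actual code:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("preprocessor").getOrCreate()

def preprocess_incremental(path: str, file_format: str) -> None:
    """Read an incremental feed, apply format-specific handling,
    and append it to a raw Delta table (all names hypothetical)."""
    if file_format == "csv":
        df = spark.read.option("header", True).csv(path)
    elif file_format == "json":
        df = spark.read.json(path)
    elif file_format == "parquet":
        df = spark.read.parquet(path)
    else:
        raise ValueError(f"Unsupported file format: {file_format}")

    # Normalize column names so downstream DDLs stay consistent.
    for col in df.columns:
        df = df.withColumnRenamed(col, col.strip().lower())

    # Land the feed in the raw layer; table name is an assumption.
    df.write.format("delta").mode("append").saveAsTable("raw.incremental_feed")

preprocess_incremental("/mnt/landing/customers/2024-06-01/", "csv")
```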
Client/Project: Fashion Retail Company
- Optimized streaming pipelines for analytics and operations dashboards, reducing job completion time from 50-60 minutes to under 15 minutes and data delay from 2 hours to 2 minutes.
- Migrated Java MapReduce jobs from HDInsight (HDI) to Databricks, rewriting them as a Java Spark Maven project.
- Converted Python/Pandas logic to PySpark, leveraging distributed computing and reducing computational overhead (see the first sketch after this list).
- Changed the target storage format from ORC/CSV to Delta.
- Reduced overall cost and runtime to 40% of pre-migration levels after moving to Databricks.
- Migrated orchestration of these jobs from Azure Data Factory (ADF) to Airflow (see the second sketch after this list).
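First, a minimal sketch of the Pandas-to-PySpark conversion pattern described above, with the output written as Delta instead of ORC/CSV; the dataset, column names, and paths are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pandas_to_pyspark").getOrCreate()

# Single-node Pandas version (memory-bound):
#   df = pd.read_csv("sales.csv")
#   out = df.groupby("store_id")["amount"].sum().reset_index()

# Distributed PySpark equivalent: the same aggregation runs across the
# cluster, and the result is written as Delta rather than ORC/CSV.
df = spark.read.option("header", True).csv("/mnt/data/sales.csv")
out = df.groupBy("store_id").agg(F.sum("amount").alias("total_amount"))
out.write.format("delta").mode("overwrite").save("/mnt/delta/sales_by_store")
```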
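Second, a minimal Airflow DAG sketch of the kind that could replace an ADF pipeline for such jobs, using the Databricks provider's DatabricksSubmitRunOperator; the DAG name, schedule, cluster id, and notebook paths are all illustrative assumptions:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

# Hypothetical placeholder; a real deployment would reference its own cluster.
CLUSTER_ID = "1234-567890-abcde123"

with DAG(
    dag_id="retail_streaming_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    catchup=False,
) as dag:
    ingest = DatabricksSubmitRunOperator(
        task_id="ingest_raw",
        json={
            "existing_cluster_id": CLUSTER_ID,
            "notebook_task": {"notebook_path": "/Jobs/ingest_raw"},
        },
    )
    transform = DatabricksSubmitRunOperator(
        task_id="transform_to_delta",
        json={
            "existing_cluster_id": CLUSTER_ID,
            "notebook_task": {"notebook_path": "/Jobs/transform_to_delta"},
        },
    )
    # Same dependency ADF would express as chained activities.
    ingest >> transform
```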