DILIP KUMAR

Summary

With a proven track record at Deutsche Bank, I excel in architecting and optimizing data solutions using Azure Databricks and PySpark, showcasing my technical prowess and leadership in guiding teams towards excellence. My expertise in Azure Synapse Analytics and a knack for innovative problem-solving have significantly enhanced data processing capabilities.

Overview

10 years of professional experience
1 Certification

Work History

Technical Lead – Data Engineering (AI/ML)

SHYFTLABS-PIXEL
12.2025 - Current
  • Built end-to-end machine learning pipelines including data preprocessing, feature engineering, model training, and evaluation using Python-based ML libraries.
  • Performed feature engineering and feature scaling (normalization, standardization) to improve model performance and stability across multiple datasets.
  • Handled high-dimensional data and mitigated the curse of dimensionality using techniques like PCA and feature selection methods.
  • Trained and evaluated multiple machine learning models (Linear Regression, Logistic Regression, Random Forest, Gradient Boosting, etc.) to identify the best-performing solution.
  • Applied hyperparameter tuning techniques such as Grid Search and Random Search to optimize model accuracy and generalization.
  • Implemented model validation strategies including cross-validation and performance metrics (accuracy, precision, recall, F1-score) to ensure robust model performance.
  • Finalized and productionized machine learning models based on performance, scalability, and business requirements.
  • Deployed machine learning models as REST APIs using Flask, enabling real-time predictions and integration with downstream applications.
  • Integrated AI-driven solutions into data pipelines and applications, contributing to intelligent decision-making systems aligned with business use cases.
  • Gained hands-on experience in applied AI/ML workflows including model lifecycle management, experimentation, and deployment, ensuring scalable and production-ready solutions.

Senior Azure Databricks Engineer @ Pilot Flying J

SHYFTLABS/INNOTOP SOLUTIONS PVT LTD
NOIDA
03.2025 - 12.2025
  • Designed and developed Databricks notebook–based ETL pipelines with multiple dependent jobs, orchestrated using Databricks Workflows, ensuring reliable execution across complex data domains.
  • Built config-driven, reusable PySpark frameworks to replace multiple hardcoded scripts, significantly reducing job runtime and maintenance overhead while improving scalability.
  • Optimized large-scale Delta Lake pipelines by introducing partition pruning, window optimizations, and intelligent filtering, resulting in faster job execution and lower cluster compute costs.
  • Developed hybrid ingestion frameworks to flatten nested JSON, arrays, and structs into relational models, enabling downstream analytics and reporting use cases.
  • Performed data reconciliation and root-cause analysis across upstream financial and fuel transaction tables, identifying margin discrepancies and ensuring accuracy in business-critical profit calculations.

Senior Data Engineer

Sopra Steria Pvt Ltd
07.2022 - 03.2025

Customer Banking Data Platform (CBDP):

End-to-End Data Pipeline Implementation:

  • Designed and implemented robust end-to-end data pipelines using Azure Data Factory (ADF) and Azure Databricks, transforming structured and unstructured data from multiple sources into Azure Data Lake Storage (ADLS) to support enterprise data needs.
  • Designed and implemented Medallion Architecture (Bronze, Silver, Gold layers) in Databricks for scalable and structured data processing.
  • Developed Silver layer to clean, deduplicate, and standardize data using PySpark transformations.

Big Data Processing with PySpark/Databricks/DataFlow/ADF:

  • Developed and optimized large-scale data processing workflows in Azure Databricks using PySpark to support advanced analytics, reporting, and machine learning use cases.
  • Configured and managed Azure Synapse Analytics for data warehousing, enabling seamless integration with Power BI for real-time reporting and interactive dashboards.
  • Implemented scalable storage solutions using Azure Blob Storage and ADLS, ensuring data security through role-based access control (RBAC) and encryption, while optimizing performance and cost for big data applications.

OneBaufi:

  • Developed and managed a Terraform framework, and successfully set up and maintained the Dev, UAT, and Prod environments, including the creation and management of service accounts.
  • Developed BigQuery solutions with Dataform, using JavaScript in Dataform to define and automate the creation of BigQuery tables and to implement business logic for complex data transformations.
  • Automated workflows using Cloud Composer and Apache Airflow, deploying and managing Python-based DAGs and integrating shell scripts to streamline end-to-end ETL processes.
  • Automated infrastructure provisioning and environment setup using Terraform, ensuring consistency across DEV, QA, and PROD environments.
  • Worked with the Pub/Sub companion and topics to consume messages sent by Cloud Scheduler, and automated the workload to run and process the data.
  • Implemented the framework's validation logic on both the raw and target sides, and scheduled it to run along with the workload.
  • Designed robust data models (star schema, data vault) in BigQuery to support analytics, reporting, and downstream data science use cases.
  • Collaborated with cross-functional teams (analytics, product, and data science) to deliver data-driven solutions, improving business insights and decision-making.
  • Established data quality, validation, and monitoring frameworks to ensure high data accuracy, reliability, and governance across pipelines.

iPolice:

  • Created a POC for the team, visited Paris for initial discussions, and secured the project for the offshore team in India.
  • Created a flexible PySpark framework with a medallion architecture, comprising multiple modules for the data flow.
  • Implemented PySpark job transformations and actions as required, and parameterized Spark jobs so that memory, cores, and executors can be passed on the fly.
  • Created Python pre-check scripts to run before the main job.
  • Developed junior staff through targeted coaching and mentoring, improving the capabilities and competencies of technical teams.
  • Established a Cloudera-based Hadoop ecosystem to offer a versatile data integration solution for various source and target systems.
  • Developed a Spark-based framework to read data from Hive tables and produce files in multiple pre-defined formats, including creating and loading Hive tables for testing.
  • Implemented data pipelines using Control-M and shell scripts, optimized Spark jobs, and configured Control-M for job scheduling with email notifications for failures.
  • Developed and automated shell scripts, and maintained housekeeping for the Enterprise Analytical Platform (EAP).

Senior Data Engineer

BDH Societe Generale
11.2021 - 07.2022
  • Company Overview: Big Data Hub (BDH) is the new data platform for the Commercial Banking Tribe domain, fulfilling multiple data-related business requirements.
  • Established a Cloudera-based Hadoop ecosystem to offer a versatile data integration solution for various source and target systems.
  • Developed a Spark-based framework to read data from Hive tables and produce files in multiple pre-defined formats, including creating and loading Hive tables for testing.
  • Implemented data pipelines using Control-M and shell scripts, optimized Spark jobs, and configured Control-M for job scheduling with email notifications for failures.
  • Developed and automated shell scripts and maintained the housekeeping jobs.

Data Engineer

OpenText Technologies
02.2019 - 10.2021
  • Imported and exported data between HDFS and Hive using Sqoop
  • In-depth knowledge of writing Hive queries, including partitioning and bucketing
  • Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs
  • Knowledge of Hive data file formats such as ORC and Parquet
  • Worked with Spark ecosystem using Spark SQL and Spark Scala
  • Applied Data Vault 2.0 methodology for data modelling in an Enterprise Datawarehouse, handling stages such as Raw Data Vault, Business Vault, and Information Vault
  • Configured Jenkins Pipelines for code deployment on clusters
  • Developed shell scripts to run Spark jobs, including handling holiday and weekend scenarios
  • Utilized Control-M for scheduling Spark jobs and implemented Oozie jobs for email notifications upon job success or failure
  • Optimized Spark Scala code and Hive query processing for improved performance

Technical Specialist

IBM
08.2016 - 02.2019
  • Imported and exported data between HDFS and Hive using Sqoop
  • In-depth knowledge of writing Hive queries, including partitioning and bucketing
  • Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs
  • Knowledge of Hive data file formats such as ORC and Parquet
  • Worked with Spark ecosystem using Spark SQL and Spark Scala

Education

B.Tech. - Electrical & Electronics Engineering

Galgotias College of Engineering and Technology
01.2016

Skills

  • Azure Data Factory
  • Azure Synapse Analytics
  • Azure Databricks
  • ADLS
  • GCP
  • Azure Delta Table
  • BigQuery
  • Dataform
  • Pub/Sub
  • Cloud Composer
  • Cloud Scheduler
  • Terraform
  • Apache Spark
  • Java
  • PySpark
  • Hive
  • Sqoop
  • Cloudera Hadoop
  • Delta Lake
  • Oozie
  • Control-M
  • Airflow
  • PL/SQL
  • SQL
  • Shell Scripting
  • Azure DevOps
  • ADLS Gen 2
  • Cloudera Hadoop (HDFS)
  • Parquet
  • AVRO
  • Python
  • GitHub
  • UNIX
  • Confluence
  • Bitbucket
  • JIRA
  • ServiceNow
  • Spark development
  • Real-time analytics
  • Big data processing
  • Data pipeline design
  • Git version control
  • Python programming

Certification

  • Snowflake SnowPro Core
  • https://achieve.snowflake.com/80a31be2-9c7d-4b37-b5f5-53d36d738d81#acc.9R0BXWuv
  • Databricks Certified Data Engineer Professional
  • https://credentials.databricks.com/a10dc450-19de-4212-8bec-4634bb52b72d#acc.1rDLzuAt

Timeline

Technical Lead – Data Engineering (AI/ML)

SHYFTLABS-PIXEL
12.2025 - Current

Senior Azure Databricks Engineer @ Pilot Flying J

SHYFTLABS/INNOTOP SOLUTIONS PVT LTD
03.2025 - 12.2025

Senior Data Engineer

Sopra Steria Pvt Ltd
07.2022 - 03.2025

Senior Data Engineer

BDH Societe Generale
11.2021 - 07.2022

Data Engineer

OpenText Technologies
02.2019 - 10.2021

Technical Specialist

IBM
08.2016 - 02.2019

B.Tech. - Electrical & Electronics Engineering

Galgotias College of Engineering and Technology