DILIP KUMAR

Summary

With a proven track record at Deutsche Bank, I excel at architecting and optimizing data solutions using Azure Databricks and PySpark, and at leading teams to deliver high-quality results. My expertise in Azure Synapse Analytics and innovative problem-solving has significantly enhanced data processing capabilities.

Work History

Technical Lead – Data Engineering (AI/ML)

SHYFTLABS-PIXEL

12.2025 - Current

Built end-to-end machine learning pipelines including data preprocessing, feature engineering, model training, and evaluation using Python-based ML libraries.
Performed feature engineering and feature scaling (normalization, standardization) to improve model performance and stability across multiple datasets.
Handled high-dimensional data and mitigated the curse of dimensionality using techniques like PCA and feature selection methods.
Trained and evaluated multiple machine learning models (Linear Regression, Logistic Regression, Random Forest, Gradient Boosting, etc.) to identify the best-performing solution.
Applied hyperparameter tuning techniques such as Grid Search and Random Search to optimize model accuracy and generalization.
Implemented model validation strategies including cross-validation and performance metrics (accuracy, precision, recall, F1-score) to ensure robust model performance.
Finalized and productionized machine learning models based on performance, scalability, and business requirements.
Deployed machine learning models as REST APIs using Flask, enabling real-time predictions and integration with downstream applications.
Integrated AI-driven solutions into data pipelines and applications, contributing to intelligent decision-making systems aligned with business use cases.
Gained hands-on experience in applied AI/ML workflows including model lifecycle management, experimentation, and deployment, ensuring scalable and production-ready solutions.

Senior Azure Databricks Engineer @ Pilot Flying J

SHYFTLABS/INNOTOP SOLUTIONS PVT LTD

NOIDA

03.2025 - 12.2025

Designed and developed Databricks notebook–based ETL pipelines with multiple dependent jobs, orchestrated using Databricks Workflows, ensuring reliable execution across complex data domains.
Built config-driven, reusable PySpark frameworks to replace multiple hardcoded scripts, significantly reducing job runtime and maintenance overhead while improving scalability.
Optimized large-scale Delta Lake pipelines by introducing partition pruning, window optimizations, and intelligent filtering, resulting in faster job execution and lower cluster compute costs.
Developed hybrid ingestion frameworks to flatten nested JSON, arrays, and structs into relational models, enabling downstream analytics and reporting use cases.
Performed data reconciliation and root-cause analysis across upstream financial and fuel transaction tables, identifying margin discrepancies and ensuring accuracy in business-critical profit calculations.

Senior Data Engineer

Sopra Steria Pvt Ltd

07.2022 - 03.2025

Customer Banking Data Platform (CBDP):

End-to-End Data Pipeline Implementation:

Designed and implemented robust end-to-end data pipelines using Azure Data Factory (ADF) and Azure Databricks, transforming structured and unstructured data from multiple sources into Azure Data Lake Storage (ADLS) to support enterprise data needs.
Designed and implemented Medallion Architecture (Bronze, Silver, Gold layers) in Databricks for scalable and structured data processing.
Developed Silver layer to clean, deduplicate, and standardize data using PySpark transformations.

Big Data Processing with PySpark/Databricks/DataFlow/ADF:

Developed and optimized large-scale data processing workflows in Azure Databricks using PySpark to support advanced analytics, reporting, and machine learning use cases.
Configured and managed Azure Synapse Analytics for data warehousing, enabling seamless integration with Power BI for real-time reporting and interactive dashboards.
Implemented scalable storage solutions using Azure Blob Storage and ADLS, ensuring data security through role-based access control (RBAC) and encryption, while optimizing performance and cost for big data applications.

OneBaufi:

Developed and managed a Terraform framework, and set up and maintained the Dev, UAT, and Prod environments, including creating and managing service accounts.
Developed BigQuery solutions with Dataform, using JavaScript in Dataform to define and automate the creation of BigQuery tables and to implement business logic for complex data transformations.
Automated workflows with Cloud Composer and Apache Airflow, deploying and managing Python-based DAGs and integrating shell scripts to streamline end-to-end ETL processes.
Automated infrastructure provisioning and environment setup using Terraform, ensuring consistency across DEV, QA, and PROD environments.
Worked with Pub/Sub topics and subscriptions to consume messages published by Cloud Scheduler, and automated the workload to run and process the data.
Implemented the framework's validation logic on both the raw and target sides, scheduled alongside the workload.
Designed robust data models (star schema, data vault) in BigQuery to support analytics, reporting, and downstream data science use cases.
Collaborated with cross-functional teams (analytics, product, and data science) to deliver data-driven solutions, improving business insights and decision-making.
Established data quality, validation, and monitoring frameworks to ensure high data accuracy, reliability, and governance across pipelines.

iPolice:

Built a proof of concept (POC) for the team, travelled to Paris for initial discussions, and secured the project for the offshore team in India.
Created a flexible PySpark framework based on the Medallion architecture, with multiple modules covering the data flow.
Implemented PySpark transformations and actions as per requirements, and optimized Spark jobs by parameterizing memory, cores, and executors at runtime.
Created Python pre-check scripts that run before the main job.
Developed junior staff through targeted coaching and mentoring, improving the capabilities and competencies of technical teams.
Established a Cloudera-based Hadoop ecosystem to offer a versatile data integration solution for various source and target systems.
Developed a Spark-based framework to read data from Hive tables and produce files in multiple predefined formats, including creating and loading Hive tables for testing.
Implemented data pipelines using Control-M and shell scripts, optimized Spark jobs, and configured Control-M job scheduling with email notifications on failure.
Developed and automated shell scripts for housekeeping and maintenance of the Enterprise Analytical Platform (EAP).

Senior Data Engineer

BDH Société Générale

11.2021 - 07.2022

Company Overview: Big Data Hub, the new data platform for the Commercial Banking Tribe domain, fulfilling multiple business data requirements
Established a Cloudera-based Hadoop ecosystem to offer a versatile data integration solution for various source and target systems
Developed a Spark-based framework to read data from Hive tables and produce files in multiple predefined formats, including creating and loading Hive tables for testing
Implemented data pipelines using Control-M and Shell Scripts, optimized Spark jobs, and configured Control-M for job scheduling with email notifications for failures
Developed and automated shell scripts for housekeeping, and maintained them

Data Engineer

OpenText Technologies

02.2019 - 10.2021

Imported and exported data to and from HDFS and Hive using Sqoop
In-depth knowledge of writing Hive queries, including partitioning and bucketing
Created Hive tables, loaded data into them, and wrote Hive queries that execute internally as MapReduce jobs
Knowledge of Hive data file formats such as ORC and Parquet
Worked with the Spark ecosystem using Spark SQL and Scala
Applied Data Vault 2.0 methodology for data modelling in an Enterprise Data Warehouse, handling stages such as Raw Data Vault, Business Vault, and Information Vault
Configured Jenkins Pipelines for code deployment on clusters
Developed shell scripts to run Spark jobs, including handling holiday and weekend scenarios
Utilized Control-M for scheduling Spark jobs and implemented Oozie jobs for email notifications upon job success or failure
Optimized Spark Scala code and Hive query processing for improved performance

Technical Specialist

IBM

08.2016 - 02.2019

Imported and exported data to and from HDFS and Hive using Sqoop
In-depth knowledge of writing Hive queries, including partitioning and bucketing
Created Hive tables, loaded data into them, and wrote Hive queries that execute internally as MapReduce jobs
Knowledge of Hive data file formats such as ORC and Parquet
Worked with the Spark ecosystem using Spark SQL and Scala

Education

B.Tech. - Electrical & Electronics Engineering

Galgotias College of Engineering and Technology

01.2016

Skills

Azure Data Factory
Azure Synapse Analytics
Azure Databricks
ADLS
GCP
Azure Delta Table
BigQuery
Dataform
Pub/Sub
Cloud Composer
Cloud Scheduler
Terraform
Apache Spark
Java
PySpark
Hive
Sqoop
Cloudera Hadoop
Delta Lake
Oozie
Control-M
Airflow

PL/SQL
SQL
Shell Scripting
Azure DevOps
ADLS Gen 2
Cloudera Hadoop (HDFS)
Parquet
Avro
Python
GitHub
UNIX
Confluence
Bitbucket
JIRA
ServiceNow
Spark development
Real-time analytics
Big data processing
Data pipeline design
Git version control
Python programming

Certification

Snowflake SnowPro Core
https://achieve.snowflake.com/80a31be2-9c7d-4b37-b5f5-53d36d738d81#acc.9R0BXWuv
Databricks Certified Data Engineer Professional
https://credentials.databricks.com/a10dc450-19de-4212-8bec-4634bb52b72d#acc.1rDLzuAt

Timeline

Technical Lead – Data Engineering (AI/ML)

SHYFTLABS-PIXEL

12.2025 - Current

Senior Azure Databricks Engineer @ Pilot Flying J

SHYFTLABS/INNOTOP SOLUTIONS PVT LTD

03.2025 - 12.2025

Senior Data Engineer

Sopra Steria Pvt Ltd

07.2022 - 03.2025

Senior Data Engineer

BDH Société Générale

11.2021 - 07.2022

Data Engineer

OpenText Technologies

02.2019 - 10.2021

Technical Specialist

IBM

08.2016 - 02.2019

B.Tech. - Electrical & Electronics Engineering

Galgotias College of Engineering and Technology