DILIP KUMAR

Summary

With a proven track record at Deutsche Bank, I excel in architecting and optimizing data solutions using Azure Databricks and PySpark, and in guiding teams toward technical excellence. My expertise in Azure Synapse Analytics and innovative problem-solving has significantly enhanced data processing capabilities.

Overview

8 years of professional experience
2 Certifications

Work History

Senior Data Engineer

Deutsche Bank
07.2022 - Current

Customer Banking Data Platform (CBDP):

  • Company Overview: Enterprise Analytical Platform (EAP)
  • Designed and implemented end-to-end data pipelines using Azure Data Factory (ADF) and Azure Databricks to transform, process, and integrate structured and unstructured data from various sources into Azure Data Lake Storage (ADLS)
  • Developed and optimized large-scale data processing workflows in Azure Databricks using PySpark to support advanced analytics, reporting, and machine learning use cases
  • Configured and managed Azure Synapse Analytics for data warehousing, enabling seamless integration with Power BI for real-time reporting and interactive dashboards
  • Implemented scalable storage solutions using Azure Blob Storage and ADLS, ensuring data security through role-based access control (RBAC) and encryption, while optimizing performance and cost for big data applications.

OneBaufi:

  • Developed and managed a Terraform framework; set up and maintained Dev, UAT, and Prod environments, including the creation and management of service accounts
  • Developed BigQuery solutions with Dataform, leveraging JavaScript to define and automate the creation of BigQuery tables and to implement business logic for complex data transformations
  • Automated workflows using Cloud Composer and Apache Airflow, deploying and managing Python-based DAGs and integrating shell scripts to streamline end-to-end ETL processes
  • Worked with Pub/Sub topics and subscriptions to consume messages sent by Cloud Scheduler, automating the workload to run and process the data
  • Implemented the framework's validation logic on both the raw and target sides, and scheduled it alongside the workload

iPolice:

  • Created a POC for the team, traveled to Paris for initial discussions, and secured the project for the offshore India team
  • Designed the project flow with the senior-level architect
  • Created a flexible, modular PySpark framework based on the medallion architecture
  • Implemented PySpark transformations and actions per requirements, and optimized Spark jobs via parameterization to set memory, cores, and executors on the fly
  • Created Python pre-check scripts to run before the main job
  • Developed junior staff through targeted coaching and mentoring, improving capabilities and competencies of technical teams

Senior Data Engineer

BDH, Société Générale
11.2021 - 07.2022
  • Company Overview: Big Data Hub (BDH), the new data platform for the Commercial Banking Tribe domain, fulfilling multiple business requirements related to data
  • Established a Cloudera-based Hadoop ecosystem to offer a versatile data integration solution for various source and target systems
  • Developed a Spark-based framework to read data from HIVE tables and produce files in multiple pre-defined formats, including creating and loading Hive tables for testing
  • Implemented data pipelines using Control-M and Shell Scripts, optimized Spark jobs, and configured Control-M for job scheduling with email notifications for failures
  • Developed shell scripts, automated them, and performed housekeeping

Data Engineer

OpenText Technologies
02.2019 - 04.2021
  • Imported and exported data between HDFS and Hive using Sqoop
  • In-depth knowledge of writing Hive queries, including partitioning and bucketing
  • Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs
  • Knowledge of Hive data file formats such as ORC and Parquet
  • Worked with the Spark ecosystem using Spark SQL and Scala
  • Applied the Data Vault 2.0 methodology for data modelling in an enterprise data warehouse, handling stages such as Raw Data Vault, Business Vault, and Information Vault
  • Configured Jenkins Pipelines for code deployment on clusters
  • Developed shell scripts to run Spark jobs, including handling holiday and weekend scenarios
  • Utilized Control-M for scheduling Spark jobs and implemented Oozie jobs for email notifications upon job success or failure
  • Optimized Spark Scala code and Hive query processing for improved performance

Technical Specialist

IBM
08.2016 - 02.2019
  • Imported and exported data between HDFS and Hive using Sqoop
  • In-depth knowledge of writing Hive queries, including partitioning and bucketing
  • Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs
  • Knowledge of Hive data file formats such as ORC and Parquet
  • Worked with the Spark ecosystem using Spark SQL and Scala

Education

B.Tech. - Electrical & Electronics Engineering

Galgotias College of Engineering and Technology
01.2016

Skills

  • Azure Data Factory
  • Azure Synapse Analytics
  • Azure Databricks
  • ADLS
  • Azure Delta Table
  • BigQuery
  • Dataform
  • Pub/Sub
  • Cloud Composer
  • Cloud Scheduler
  • Terraform
  • Apache Spark
  • Java
  • PySpark
  • Hive
  • Sqoop
  • Cloudera Hadoop
  • Delta Lake
  • Oozie
  • Control-M
  • Airflow
  • PL/SQL
  • SQL
  • Shell Scripting
  • Azure DevOps
  • ADLS Gen 2
  • Cloudera Hadoop (HDFS)
  • Parquet
  • Avro
  • Python
  • GitHub
  • UNIX
  • Confluence
  • Bitbucket
  • JIRA
  • ServiceNow
  • Spark development
  • Real-time analytics
  • Big data processing
  • Data pipeline design
  • Git version control
  • Python programming

Certifications

  • Snowflake SnowPro Core
  • Azure Data Engineer Associate (DP-203)

Timeline

Senior Data Engineer

Deutsche Bank
07.2022 - Current

Senior Data Engineer

BDH, Société Générale
11.2021 - 07.2022

Data Engineer

OpenText Technologies
02.2019 - 04.2021

Technical Specialist

IBM
08.2016 - 02.2019

B.Tech. - Electrical & Electronics Engineering

Galgotias College of Engineering and Technology