Prasoon Kumar

Bengaluru

Summary

Seasoned IT professional with over 18 years of experience, specializing in Data Lake architectures using Databricks, Big Data ecosystems, DBMS, Data Warehousing, and ETL solutions. Demonstrated expertise across the full project lifecycle (business analysis, requirement gathering, architecture and design, development, testing, training, and implementation) in diverse industries such as digital marketing, telecommunications, banking, finance, and retail. Known for maintaining consistent direct client engagement and providing strategic and technical leadership throughout complex data-driven projects.

Possesses in-depth knowledge of AdTech and MarTech ecosystems, including campaign data pipelines, audience segmentation, and attribution/reattribution modelling. Experienced in integrating marketing automation systems, Customer Data Platforms (CDPs), Audience Data Management (ADP), and demand-side platforms (DSPs) with robust data pipelines, enabling data-driven decision-making across marketing and advertising ecosystems.

Maintains consistent direct engagement with product owners, providing strategic and technical leadership throughout complex data-driven projects.

Overview

18 years of professional experience
1 Certification

Work History

Director, Data Engineering

Epsilon
07.2025 - Current

Company Overview: Epsilon, a Publicis Groupe company

Responsible for:

  • Led the strategic design and development of scalable data engineering solutions across Databricks and Greenplum, integrating Notebooks and workflows for ETL orchestration, implementing CI/CD pipelines using Databricks Asset Bundles, and establishing robust data lineage, governance, and security through Unity Catalog.
  • Demonstrated deep technical expertise in ETL, SQL, PySpark, and modern data engineering practices, while serving as the primary point of contact for all data-related initiatives—including architecture design, solution development, and operational excellence.
  • Architected and managed high-performance ETL/ELT pipelines across Databricks and traditional databases, integrating diverse data sources to power business intelligence and operational reporting.
  • Built scalable data infrastructure, including Data Lakes, Lakehouses, and Data Warehouses, ensuring optimized data management, performance, and broad accessibility across the organization.
  • Enabled the team to deliver robust and scalable data pipelines by designing and deploying Apache Airflow DAGs, significantly improving the efficiency and reliability of ETL workflow orchestration.
  • Architected and developed an analytics platform (CIP) on AWS, enabling insights into consumer sentiment across social media and search engines using data from Netbase and Google AdWords.
  • Ingested and processed large-scale social and web content, applying machine learning models in AWS SageMaker to accurately group user-generated terms with spelling variations, paraphrasing, and rewording, using supervised and unsupervised learning techniques.
  • Applied K-means clustering and PCA for dimensionality reduction and high-variance identification, followed by cosine similarity scoring for cluster merging and normalized volume ratio calculation.
  • Performed automated cluster labeling using Gensim LDA and Markov Chains, then trained a Deep Learning Ensemble Model (Normalized CNN + BiGRU) on the resulting labeled dataset.
  • Utilized Docker images via AWS ECS to package ML models, generate artifacts, and deploy real-time endpoints.
  • Provided strategic leadership and mentorship to a high-performing team of data engineers, aligning their initiatives with organizational objectives and driving measurable business outcomes through innovative, data-driven solutions.

Manager, Data Engineering

Epsilon
09.2022 - 07.2025

Data Solution Owner

ABInBev
04.2018 - 05.2019

Company Overview: GCC Bangalore

Responsible for:

  • Building and enhancing EDH capability by architecting, designing, and developing reusable ETL frameworks in Talend.
  • Leading and governing the Finance, People, and Supply Chain streams end to end to ensure high-quality deliveries, providing technical mentorship, and driving process improvements, standardization, and standard methodologies.
  • Designing an Audit Balance Control framework to orchestrate the entire ETL pipeline.
  • Designing and creating ETL pipelines to ingest data into the Data Lake using Hive (on the Spark execution engine), ADLS, the ORC file format, and PySpark.
  • Creating ETL pipelines and using Python modules to transform large volumes of data.
  • Creating a translation framework in Python using pandas, googletrans, and other libraries.
  • Designing Hive tables to store very large volumes of historical data, consumed by data scientists for exploration, analytics, and ad-hoc queries.
  • Defining data quality rules, data lineage, and a data catalog for multiple systems.
  • Designing relational and dimensional data models in Erwin by integrating multiple systems such as Sharp, Navigate, and Click.
  • Identifying Critical Data Elements (CDEs) from the Enterprise Data Model.

Senior Data Engineer

GE Digital
01.2017 - 04.2018

Company Overview: GE Digital

Senior ETL Expert

Markem-Imaje
02.2016 - 01.2017

Company Overview: Markem-Imaje (Machinery)

ETL Lead

ITC Infotech India Pvt. Ltd.
10.2011 - 01.2016

Associate Technical Consultant

Teradata India Pvt. Ltd.
07.2010 - 10.2011

Lead Data Engineer

Epsilon
05.2019 - 09.2022

Software Engineer

HSBC Software Development
11.2009 - 07.2010

Associate System Engineer

IBM India Pvt. Ltd.
06.2007 - 11.2009

Education

Master of Science - Data Science

Deakin University
11.2024

Postgraduate - Data Science and Business Analytics

Great Lakes Executive Learning
03.2021

B.E. -

Manipal Institute of Technology
01.2007

Skills

ETL Tools:

  • IBM DataStage
  • Talend Data Integration
  • Databricks

RDBMS:

  • DB2
  • Oracle 11i
  • Teradata 12
  • SQL Server

Other Languages / Technologies / Platforms / DevOps:

  • SQL
  • Unix Shell Scripting
  • Teradata
  • Autosys
  • Tivoli (TWS)
  • Python 3.6
  • SQL DW
  • PL/SQL
  • Docker
  • Kubernetes
  • Airflow
  • AWS (ECR, EKS, SageMaker, Lambda, S3)
  • Azure (ADLS, Blob Storage)

Big Data Technologies:

  • Hive
  • HDFS
  • Spark

Statistics & Data Science:

  • Descriptive Statistics
  • Probability & Probability Distributions
  • Hypothesis Testing
  • ANOVA
  • Principal Component Analysis
  • Exploratory Data Analysis
  • Data Mining
  • Clustering
  • CART
  • Random Forest
  • Artificial Neural Networks
  • Predictive Analytics
  • Linear Regression
  • Logistic Regression
  • Linear Discriminant Analysis

Operating Systems:

  • Windows NT/200x Server
  • Unix (AIX)

Reporting Tools:

  • BusinessObjects XI
  • QlikView

Certification

  • Teradata 12 Certified Professional
  • 212 Degree Leadership program from Epsilon
  • Manager’s Development Program from Epsilon
  • IBM Certified Database Associate (Universal Database V8.1 Family)
  • IBM WebSphere Certified for DataStage V8.0
