Deepak Mishra

Noida

Summary

Experienced and results-driven Data Engineer with over 8 years of expertise in architecting and implementing robust data pipelines and scalable solutions on Azure and AWS platforms. Proven hands-on proficiency in Azure Databricks, ADF, Apache Airflow, PySpark, Scala, and SQL. Skilled in CI/CD practices using Azure DevOps and GitHub Actions, with a strong focus on ETL/ELT processing, data quality, and performance optimization. Adept at cross-functional collaboration, delivering modern data solutions that support insightful business decision-making. Beyond delivery, actively engaged in RFP responses, pre-sales solutioning, and client proposal support, partnering with sales and architecture teams to design secure, scalable, and cost-efficient data platforms.

Overview

9 years of professional experience
1 Certification

Work History

Data Engineer

Capgemini IT Services
Noida
03.2024 - Current
  • Designed and implemented a reusable data ingestion framework using Azure Databricks, ADF, and Apache Airflow.
  • Migrated on-premises Talend ETL processes to modern cloud-based pipelines.
  • Developed scalable PySpark modules for batch and streaming data ingestion.
  • Collaborated with cross-functional teams to deliver efficient, production-grade data workflows.
  • Built data pipelines to transform and load data into Delta Lake formats on Azure Data Lake Storage Gen2.
  • Tuned Apache Spark jobs for performance by optimizing partitioning, caching, and broadcast joins (see the sketch after this list).
  • Developed modular and reusable code in Scala and PySpark for data transformation.
  • Implemented SQL-based logic to support data quality rules, aggregations, and business logic.
  • Led the migration of legacy ETL workflows from Talend to AWS EMR using Apache Spark and PySpark, improving scalability and reducing processing time by over 40%.
  • Designed a reusable ETL framework to dynamically orchestrate jobs using Apache Airflow and EMR step functions.
  • Implemented robust data validation and logging mechanisms, ensuring traceability across S3-based ingestion pipelines.
  • Developed transformation scripts using PySpark, and optimized performance by leveraging Spark SQL and partitioning strategies.
  • Managed source code through GitHub, and implemented CI/CD pipelines using Azure DevOps for deployment.
  • Contributed to RFPs and client proposals by designing scalable cloud data engineering architectures, leveraging Databricks Lakehouse, Delta Live Tables, and Unity Catalog.
  • Collaborated with pre-sales teams to build solution accelerators and cost-optimized architecture diagrams for Azure and AWS data platforms.
  • Designed end-to-end architecture blueprints (ingestion, storage, processing, governance, visualization) for enterprise clients as part of pre-sales engagements.
  • Implemented Lakehouse architecture on Azure Databricks using Delta Lake and Delta Live Tables to support both batch and streaming use cases.
  • Built governance and access control frameworks with Unity Catalog, ensuring compliance and secure data sharing across domains.
  • Designed real-time data pipelines with Delta Live Tables, enabling near real-time insights for business-critical dashboards.
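A minimal PySpark sketch of the Spark tuning pattern referenced above: repartitioning the large side on the join key, broadcasting the small dimension, and caching a reused result. Table paths and column names are illustrative assumptions, not project specifics.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

    # Hypothetical inputs: a large fact table and a small dimension table.
    orders = spark.read.format("delta").load("/mnt/lake/silver/orders")
    regions = spark.read.format("delta").load("/mnt/lake/silver/regions")

    # Repartition the large side on the join key to spread the shuffle evenly.
    orders = orders.repartition(200, "region_id")

    # Broadcast the small dimension table to avoid a shuffle join.
    enriched = orders.join(F.broadcast(regions), on="region_id", how="left")

    # Cache only because the result is reused by multiple downstream actions.
    enriched.cache()

    daily = enriched.groupBy("order_date").agg(F.sum("amount").alias("revenue"))
    daily.write.format("delta").mode("overwrite").save("/mnt/lake/gold/daily_revenue")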

Azure Data Engineer

Tata Consultancy Services
Hyderabad
11.2016 - 02.2023
  • Built and maintained enterprise-grade ETL pipelines using ADF, Databricks, and SQL.
  • Orchestrated workflows using Apache Airflow, improving reliability and observability.
  • Developed and deployed CI/CD pipelines via Azure DevOps for automated integration and delivery.
  • Conducted extensive data profiling, validation, and visualization using Power BI.
  • Engineered and optimized Delta Lake tables and managed schema evolution for data lake storage.
  • Utilized PySpark and SQL to manage complex joins, deduplication, and incremental loads (see the sketch after this list).
  • Led efforts for historical data loads and full refresh scenarios with data archival strategies.
  • Applied advanced Python concepts hands-on to implement data-driven solutions.
  • Automated processing of millions of rows of data through Azure Data Factory, improving real-time reporting of product metrics.
  • Wrote and optimized SQL queries.
  • Applied RDBMS concepts including views, triggers, stored procedures, indexes, and constraints.
  • Designed and implemented data-driven BI dashboards and reporting solutions using Power BI backed by Azure data storage.
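A minimal sketch of the deduplication and incremental-load pattern referenced above, keeping the latest record per business key with a window function and upserting it into a Delta table with MERGE; paths, keys, and column names are hypothetical.

    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession, Window
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("incremental-load-sketch").getOrCreate()

    # Hypothetical incremental batch landed by an ADF copy activity.
    updates = spark.read.parquet("/mnt/lake/landing/customers/")

    # Deduplicate: keep only the most recent record per customer_id.
    w = Window.partitionBy("customer_id").orderBy(F.col("updated_at").desc())
    latest = (updates
              .withColumn("rn", F.row_number().over(w))
              .filter("rn = 1")
              .drop("rn"))

    # Incremental load: upsert the deduplicated batch into the Delta target.
    target = DeltaTable.forPath(spark, "/mnt/lake/silver/customers")
    (target.alias("t")
     .merge(latest.alias("s"), "t.customer_id = s.customer_id")
     .whenMatchedUpdateAll()
     .whenNotMatchedInsertAll()
     .execute())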

Education

B.E. - Computer Science Engineering

BIST
Bhopal
06.2016

Senior Secondary (12th) - CBSE

Satyam International School
05.2011

Secondary (10th) - CBSE

Loyola High School
06.2009

Skills

  • Python, PySpark, Scala
  • SQL
  • Azure Databricks
  • Apache Spark
  • Azure Data Factory
  • Apache Airflow
  • Amazon EMR
  • ETL/ELT
  • Data lakes, Delta tables
  • Unity Catalog/Data governance
  • Data profiling, data validation
  • Azure DevOps, GitHub
  • Power BI
  • Pandas, NumPy
  • Azure SQL DB, Azure Key Vault

Languages

  • English
  • Hindi

Project Highlights

  • Talend to Azure Migration: Re-engineered legacy Talend ETLs into ADF & Databricks solutions with PySpark logic and Airflow orchestration.
  • Ingestion Framework: Designed and implemented a unified ingestion framework for structured and semi-structured data using Databricks Notebooks, schema inference, and ADF pipelines.
  • Airflow Optimization: Migrated scheduling logic to modular Airflow DAGs with built-in alerting, retry logic, and SLA monitoring (sketched after this list).
  • Documentum to Azure ADLS Gen2 Migration: Executed a large-scale data migration from Documentum to Azure ADLS Gen2, implementing an intelligent storage solution utilizing hot, cool, and archive tiers to optimize cost and access patterns for diverse datasets.
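A minimal sketch of the modular Airflow DAG pattern described in the Airflow Optimization highlight, assuming Airflow 2.x; the DAG id, schedule, alert address, and callable are placeholders rather than project code.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Retries, failure alerting, and an SLA applied to every task via default_args.
    default_args = {
        "owner": "data-engineering",
        "retries": 2,
        "retry_delay": timedelta(minutes=10),
        "email_on_failure": True,
        "email": ["data-alerts@example.com"],
        "sla": timedelta(hours=2),
    }

    def run_ingestion(**context):
        # Placeholder for the real ingestion step (e.g. triggering a Databricks job).
        print("ingesting partition", context["ds"])

    with DAG(
        dag_id="daily_ingestion",
        start_date=datetime(2024, 1, 1),
        schedule_interval="0 2 * * *",
        catchup=False,
        default_args=default_args,
    ) as dag:
        ingest = PythonOperator(task_id="ingest", python_callable=run_ingestion)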

Certification

  • Azure Databricks Platform Architect – Expertise in designing, implementing, and securing enterprise-grade Databricks Lakehouse platforms, including governance with Unity Catalog, and scalable solutions with Delta Live Tables.
