Deepak Mishra

Noida

Summary

Experienced and results-driven Data Engineer with over 8 years of expertise in architecting and implementing robust data pipelines and scalable solutions on Azure and AWS platforms. Proven hands-on proficiency in Azure Databricks, ADF, Apache Airflow, PySpark, Scala, and SQL. Skilled in CI/CD practices using Azure DevOps and GitHub Actions, with a strong focus on ETL/ELT processing, data quality, and performance optimization. Adept at cross-functional collaboration, delivering modern data solutions that support insightful business decision-making. Beyond delivery, actively engaged in RFP responses, pre-sales solutioning, and client proposal support, partnering with sales and architecture teams to design secure, scalable, and cost-efficient data platforms.

Overview

9 years of professional experience
1 Certification

Work History

Data Engineer

Capgemini IT Services
Noida
03.2024 - Current
  • Designed and implemented a reusable data ingestion framework using Azure Databricks, ADF, and Apache Airflow.
  • Migrated on-premises Talend ETL processes to modern cloud-based pipelines.
  • Developed scalable PySpark modules for batch and streaming data ingestion.
  • Collaborated with cross-functional teams to deliver efficient, production-grade data workflows.
  • Built data pipelines to transform and load data into Delta Lake formats on Azure Data Lake Storage Gen2.
  • Tuned Apache Spark jobs for performance by optimizing partitioning, caching, and broadcast joins (see the sketch after this list).
  • Developed modular and reusable code in Scala and PySpark for data transformation.
  • Implemented SQL-based logic to support data quality rules, aggregations, and business logic.
  • Led the migration of legacy ETL workflows from Talend to AWS EMR using Apache Spark and PySpark, improving scalability and reducing processing time by over 40%.
  • Designed a reusable ETL framework to dynamically orchestrate jobs using Apache Airflow and EMR step functions.
  • Implemented robust data validation and logging mechanisms, ensuring traceability across S3-based ingestion pipelines.
  • Developed transformation scripts using PySpark, and optimized performance by leveraging Spark SQL and partitioning strategies.
  • Managed source code through GitHub, and implemented CI/CD pipelines using Azure DevOps for deployment.
  • Contributed to RFPs and client proposals by designing scalable cloud data engineering architectures, leveraging Databricks Lakehouse, Delta Live Tables, and Unity Catalog.
  • Collaborated with pre-sales teams to build solution accelerators and cost-optimized architecture diagrams for Azure and AWS data platforms.
  • Designed end-to-end architecture blueprints (ingestion, storage, processing, governance, visualization) for enterprise clients as part of pre-sales engagements.
  • Implemented Lakehouse architecture on Azure Databricks using Delta Lake and Delta Live Tables to support both batch and streaming use cases.
  • Built governance and access control frameworks with Unity Catalog, ensuring compliance and secure data sharing across domains.
  • Designed real-time data pipelines with Delta Live Tables, enabling near real-time insights for business-critical dashboards.
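A minimal PySpark sketch of the Spark tuning pattern referenced above: repartitioning the large side on the join key, broadcasting the small dimension, and caching a reused result. Table paths and column names are illustrative assumptions, not project specifics.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

    # Hypothetical inputs: a large fact table and a small dimension table.
    orders = spark.read.format("delta").load("/mnt/lake/silver/orders")
    regions = spark.read.format("delta").load("/mnt/lake/silver/regions")

    # Repartition the large side on the join key to spread the shuffle evenly.
    orders = orders.repartition(200, "region_id")

    # Broadcast the small dimension table to avoid a shuffle join.
    enriched = orders.join(F.broadcast(regions), on="region_id", how="left")

    # Cache only because the result is reused by multiple downstream actions.
    enriched.cache()

    daily = enriched.groupBy("order_date").agg(F.sum("amount").alias("revenue"))
    daily.write.format("delta").mode("overwrite").save("/mnt/lake/gold/daily_revenue")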

Azure Data Engineer

Tata Consultancy Services
Hyderabad
11.2016 - 02.2023
  • Built and maintained enterprise-grade ETL pipelines using ADF, Databricks, and SQL.
  • Orchestrated workflows using Apache Airflow, improving reliability and observability.
  • Developed and deployed CI/CD pipelines via Azure DevOps for automated integration and delivery.
  • Conducted extensive data profiling, validation, and visualization using Power BI.
  • Engineered and optimized Delta Lake tables and managed schema evolution for data lake storage.
  • Utilized PySpark and SQL to manage complex joins, deduplication, and incremental loads (see the sketch after this list).
  • Led efforts for historical data loads and full refresh scenarios with data archival strategies.
  • Applied advanced Python concepts hands-on to implement data-driven solutions.
  • Automated processing of millions of rows of data through Azure Data Factory, improving real-time reporting of product metrics.
  • Wrote and optimized SQL queries.
  • Applied RDBMS concepts including views, triggers, stored procedures, indexes, and constraints.
  • Designed and implemented data-driven BI dashboards and reporting solutions using Power BI backed by Azure data storage.
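A minimal sketch of the deduplication and incremental-load pattern referenced above, keeping the latest record per business key with a window function and upserting it into a Delta table with MERGE; paths, keys, and column names are hypothetical.

    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession, Window
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("incremental-load-sketch").getOrCreate()

    # Hypothetical incremental batch landed by an ADF copy activity.
    updates = spark.read.parquet("/mnt/lake/landing/customers/")

    # Deduplicate: keep only the most recent record per customer_id.
    w = Window.partitionBy("customer_id").orderBy(F.col("updated_at").desc())
    latest = (updates
              .withColumn("rn", F.row_number().over(w))
              .filter("rn = 1")
              .drop("rn"))

    # Incremental load: upsert the deduplicated batch into the Delta target.
    target = DeltaTable.forPath(spark, "/mnt/lake/silver/customers")
    (target.alias("t")
     .merge(latest.alias("s"), "t.customer_id = s.customer_id")
     .whenMatchedUpdateAll()
     .whenNotMatchedInsertAll()
     .execute())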

Education

B.E. - Computer Science Engineering

BIST
Bhopal
06.2016

Senior Secondary (12th) - CBSE

Satyam International School
05.2011

Secondary (10th) - CBSE

Loyola High School
06.2009

Skills

  • Python, PySpark, Scala
  • SQL
  • Azure Databricks
  • Apache Spark
  • Azure Data Factory
  • Apache Airflow
  • Amazon EMR
  • ETL/ELT
  • Data lakes, Delta tables
  • Unity Catalog/Data governance
  • Data profiling, data validation
  • Azure DevOps, GitHub
  • Power BI
  • Pandas, NumPy
  • Azure SQL DB, Azure Key Vault

Languages

  • English
  • Hindi

Project Highlights

  • Talend to Azure Migration: Re-engineered legacy Talend ETLs into ADF & Databricks solutions with PySpark logic and Airflow orchestration.
  • Ingestion Framework: Designed and implemented a unified ingestion framework for structured and semi-structured data using Databricks Notebooks, schema inference, and ADF pipelines.
  • Airflow Optimization: Migrated scheduling logic to modular Airflow DAGs with built-in alerting, retry logic, and SLA monitoring (sketched after this list).
  • Documentum to Azure ADLS Gen2 Migration: Executed a large-scale data migration from Documentum to Azure ADLS Gen2, implementing an intelligent storage solution utilizing hot, cool, and archive tiers to optimize cost and access patterns for diverse datasets.
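A minimal sketch of the modular Airflow DAG pattern described in the Airflow Optimization highlight, assuming Airflow 2.x; the DAG id, schedule, alert address, and callable are placeholders rather than project code.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Retries, failure alerting, and an SLA applied to every task via default_args.
    default_args = {
        "owner": "data-engineering",
        "retries": 2,
        "retry_delay": timedelta(minutes=10),
        "email_on_failure": True,
        "email": ["data-alerts@example.com"],
        "sla": timedelta(hours=2),
    }

    def run_ingestion(**context):
        # Placeholder for the real ingestion step (e.g. triggering a Databricks job).
        print("ingesting partition", context["ds"])

    with DAG(
        dag_id="daily_ingestion",
        start_date=datetime(2024, 1, 1),
        schedule_interval="0 2 * * *",
        catchup=False,
        default_args=default_args,
    ) as dag:
        ingest = PythonOperator(task_id="ingest", python_callable=run_ingestion)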

Certification

  • Azure Databricks Platform Architect – Expertise in designing, implementing, and securing enterprise-grade Databricks Lakehouse platforms, including governance with Unity Catalog, and scalable solutions with Delta Live Tables.
