Divyanshu Gupta

Gurugram

Summary

Data Engineer with 7 years of industry experience, including 3 years in core data engineering, specializing in designing, building, and optimizing scalable data solutions on Azure. Experienced in developing batch and real-time data pipelines using PySpark, SQL, Azure Data Factory, and Azure Databricks to improve data availability and enable enterprise-scale analytics in the investment banking domain. Skilled in implementing Medallion Architecture and data governance practices to ensure secure, reliable, and high-performance data processing.

Work History

Associate

NatWest Group

Gurgaon

09.2022 - Current

Designed and maintained scalable batch and real-time pipelines using Azure Databricks (PySpark) and Structured Streaming, improving end-to-end data availability.
Consumed high-volume streaming data from Apache Kafka and processed it using Medallion Architecture (Bronze, Silver, Gold) with Delta Lake, enabling reliable and structured downstream analytics.
Implemented Auto Loader to incrementally ingest high-volume files from ADLS Gen2 as they arrive, reducing ingestion latency by 35%.
Implemented incremental processing using Delta Lake MERGE, CDC patterns, and Time Travel, reducing data reconciliation effort by 40%.
Optimized Spark workloads using Liquid Clustering, Z-Ordering, partitioning strategies, broadcast joins, and file compaction, improving query performance by 30-40% and reducing compute costs by 25%.
Tuned complex Spark SQL queries and cluster configurations, cutting pipeline execution time by 30% and improving resource utilization.
Enforced data governance using Unity Catalog, implementing Row-Level Security (RLS) and Column-Level Security (CLS), ensuring secure and compliant data access across teams.
Built automated data quality validation and monitoring frameworks using Spark Declarative Pipelines, reducing production data defects by 35% and improving pipeline reliability.
Monitored and troubleshot production jobs using Databricks Workflows and Spark UI, minimizing job failures and reducing incident resolution time by 30%.

System Engineer

Tata Consultancy Services Limited

Gurgaon

10.2016 - 09.2020

Worked directly with clients to define and document test objectives, strategies, and plans for complex financial applications, including trading platforms, risk management systems, and reporting tools.
Developed and executed test cases, reported issues, and supported issue resolution by collaborating with development teams.
Coordinated with cross-functional teams to ensure timely delivery of projects while maintaining focus on quality and standards.
Identified, tracked, and managed software defects, ensuring that issues were resolved in a timely manner.
Analyzed software applications for bugs, inconsistencies, and performance issues, ensuring high-quality deliverables and improving overall system performance.
Assisted in the design, development, and execution of test cases for investment banking products.
Collaborated with developers and business analysts to ensure the correct implementation of functionalities and requirements.
Provided detailed reports on testing progress, issues, and results, contributing to team-based decision-making.
Supported the identification of defects, tracked their progress, and validated resolutions.

Education

B.Tech. - Environmental Engineering

A.P.J. Abdul Kalam Technical University

Lucknow

Skills

Cloud & Platforms:
Azure Databricks, ADLS Gen2

Programming & Query:
Python (PySpark), SQL, Spark SQL

Big Data & Streaming:
Apache Spark, Structured Streaming, Apache Kafka

Data Engineering:
ETL/ELT Pipelines, Medallion Architecture, Delta Lake, CDC, Data Modeling