Sahil Gupta

Noida

Summary

Azure Data Engineer with 3.5 years of hands-on experience in data engineering and 1 year of performance testing experience, specializing in building scalable, high-performance data pipelines for Healthcare and Life Sciences clients. Proven expertise in Azure Databricks, ADLS Gen2, ADF, PySpark, SQL, and Python. Experienced in ingesting and optimizing API- and Salesforce-based healthcare data (CEC, GPS, OneCM) using Delta Lake and Spark performance tuning techniques. Strong background in performance engineering, enabling efficient pipeline design, reduced latency, improved data quality, and reliable analytics delivery for business and compliance teams.

Overview

4 years of professional experience
1 Certification

Work History

Azure Data Engineer

Cognizant Technology Solutions
09.2022 - Current

Project 1: Salesforce Healthcare Data Ingestion & Analytics Platform

Source: Salesforce APIs (CEC, GPS, OneCM), REST APIs

Sink: ADLS Gen2, Delta Lake, Azure Databricks

Consumers: Healthcare Operations, Reporting & Compliance Teams

  • Designed scalable API-driven ingestion pipelines using Lakeflow Connect and ADF, landing raw Salesforce healthcare data into ADLS Gen2 (Bronze layer).
  • Implemented Lakeflow Declarative Pipelines with Bronze–Silver–Gold architecture, enabling modular transformations and improving pipeline maintainability.
  • Applied Delta Lake performance optimizations such as partitioning on business keys, Z-Ordering, and OPTIMIZE, improving query performance by ~40%.
  • Developed PySpark transformations with predicate pushdown, column pruning, and broadcast joins, reducing data processing time by 30–35%.
  • Enabled incremental ingestion using watermarking and CDC logic, minimizing reprocessing and reducing compute cost by ~25%.
  • Tuned Spark workloads using adaptive query execution (AQE), optimized shuffle partitions, and autoscaling clusters, resulting in more stable and cost-efficient jobs.
  • Delivered curated Delta tables consumed by BI tools and downstream analytics teams for healthcare insights and regulatory reporting.


Project 2: Healthcare API Performance & Data Reliability Engineering Platform

Source: Salesforce APIs, Internal Healthcare APIs

Sink: ADLS Gen2, Delta Tables, Databricks SQL

Consumers: Data Analysts, Product Owners, Performance Teams

  • Leveraged API performance testing expertise to design high-throughput ingestion pipelines, handling millions of healthcare records per day.
  • Optimized API ingestion by parallelizing API calls, controlling batch sizes, and implementing retry & backoff mechanisms, reducing ingestion latency by ~35%.
  • Built PySpark-based data validation frameworks with checksum and record count reconciliation, reducing data discrepancies by ~25%.
  • Applied Spark memory tuning (executor sizing, caching frequently used datasets) to improve job reliability and reduce failures.
  • Implemented file compaction strategies and small-file handling using Delta Lake, improving read performance and lowering storage overhead.
  • Developed SQL-based monitoring queries to track data freshness, SLA breaches, and pipeline failures, improving operational visibility.
  • Delivered analytics-ready datasets to reporting and operational teams, supporting faster healthcare application insights.

Performance Test Engineer

Cognizant Technology Solutions
08.2021 - 09.2022

Performance Testing Engineer with expertise in end-to-end performance testing of Salesforce-based applications. Skilled in defining NFRs, developing automated UI/API performance scripts, and analyzing system scalability using tools like LoadRunner, JMeter, Splunk, and Salesforce Scale Center.

Education

Bachelor of Technology - Computer Science Engineering

Jaypee Institute of Information Technology
Noida, India
07.2021

Skills

  • Cloud & Data Engineering: Microsoft Azure, Azure Databricks, Azure Data Factory (ADF), Azure Data Lake Storage Gen2 (ADLS Gen2)
  • Databricks & Lakehouse: Lakeflow Connect, Lakeflow Jobs, Lakeflow Declarative Pipelines, Delta Lake
  • Programming & Querying: PySpark, Python, SQL
  • Domain & Source Systems: Salesforce (CEC, GPS, OneCM), Healthcare & Life Sciences Data, HIPAA-aware Data Handling

Certification

  • DP-203 Azure Data Engineer
  • Oracle Data Platform 2025 Certified Foundation Associate

Accomplishments

  • Certificate of Appreciation: Recognized for a record of outstanding accomplishment within the account
