
Gaurav Singh

New Delhi

Summary

Data Engineer with over 8 years of experience, recognized for expertise in cloud migration and data pipeline design at KPMG India. Proficient in Azure Synapse and PySpark, achieving a 30% increase in processing efficiency through effective team collaboration. Committed to enhancing data governance and delivering solutions that significantly improve business performance.

Overview

9 years of professional experience
1 Certification

Work History

Data Engin

KPMG India
Gurgaon
12.2022 - Current
  • Spearheaded development of harmonized data platform for global finance client, integrating data from Germany, Switzerland, and Italy for senior management reporting.
  • Designed scalable data pipelines using Azure Synapse, Delta Tables, PySpark, and Gen2 Storage to manage terabytes of data daily.
  • Led migration from AWS to Azure, re-architecting pipelines and adapting data models to align with Azure-native services.
  • Built generic manual file ingestion pipeline to facilitate seamless onboarding and reduce manual intervention across teams.
  • Adopted Data Vault 2.0 modeling approach to address schema challenges and developed custom macro scripts in Erwin for automated DDL generation.
  • Collaborated with cross-functional teams to align data transformations with business logic, ensuring accurate reporting layers.
  • Migrated automobile data from IBM Cloud to AWS S3, enhancing governance and compliance processes.
  • Utilized Databricks and PySpark for IoT data processing, achieving 30% improvement in processing efficiency.

Sof

H1Lifesciences India Ltd
Hyderabad
10.2021 - 12.2022
  • Developed CNSP tools enhancing data quality across teams by standardizing names, addresses, and organizations using SpaCy NLP models.
  • Designed REST APIs for model outputs, facilitating smooth integration of CNSP tools into business workflows.
  • Created a Python data validation script to identify incorrect production data, saving analysts hours in review processes.
  • Migrated organizational data processes to a Big Data environment, centralizing diverse datasets in H1 Datalake.
  • Transformed legacy stored procedures into PySpark code to improve computational efficiency and scalability.
  • Implemented validations and utilized CNSP tools during data transitions, ensuring integrity across bronze-gold layers.
  • Contributed to Docker-based development environment setup, supporting containerization and deployment workflows.

Software Engineer

O9 Solutions Management India (P) Ltd
Bangalore
05.2021 - 10.2021
  • Designed and implemented an automated ETL pipeline to collect and clean product data from various online sources across the U.S. retail market.
  • Built a clustering model using K-Means++ to analyze collected data and optimize the scope of web crawling, significantly reducing redundant data collection.
  • Reduced the number of target stores from 700+ to 150 by identifying representative clusters, improving efficiency and reducing system load.
  • Leveraged Python, Selenium, and Azure Synapse to streamline data ingestion and processing for market intelligence insights.

Data Automation Engineer

Shore InfoTech India Private Limited
Hyderabad
02.2018 - 05.2021
  • Designed and implemented an end-to-end automated data pipeline to extract financial documents from 50+ client web portals and upload them to the in-house GP Workflow tool, replacing manual processes.
  • Developed 50+ Python-based web scrapers using techniques like API response parsing, Selenium automation, and RPA solutions to download documents based on metadata extracted from email bodies.
  • Built metadata extraction scripts using text mining techniques and REST API integrations to retrieve and process email content from the GP Workflow tool.
  • Engineered infrastructure for automation, including notification systems, monitoring dashboards, outage handling, and secure credential management via REST APIs.
  • Performed load balancing and optimization to ensure uninterrupted execution of Python and RPA Kapow-based workflows.
  • Developed a supervised machine learning model (94% accuracy) using Decision Trees to classify financial documents (e.g., distribution notices, capital calls, financial statements), significantly reducing manual QA effort.
  • Applied OpenCV and Tesseract-OCR for image preprocessing and text extraction, enhancing model accuracy and document classification reliability.
  • Solved complex table extraction challenges from diverse document formats (PDF, Word, Excel, Images) using PDFMiner and XML tree parsing, improving data accuracy and reducing processing complexity.
  • Built and deployed a custom pipeline for extracting and validating data from Schedule of Investment (SOI) documents and cash flow reports, including annotation workflows using Dataturks for 100% accuracy.
  • Standardized and normalized extracted data for QA validation and integration into downstream applications via REST APIs.
  • Created real-time dashboards and reports using Power BI and SQL Server to visualize process performance, backlog status, error trends, and fund health metrics across all automation workflows.

Analyst

HCL Technologies
Noida
11.2016 - 05.2017
  • Identified opportunities for process improvements across the organization.
  • Identified needs of customers promptly and efficiently.

Education

Bachelor of Technology - Electronics and Communication Engineering

Manav Rachna International University
Faridabad, India
05-2016

Skills

  • Data pipeline design
  • Cloud migration
  • Data modeling and validation
  • Data governance
  • Data extraction
  • Data analysis
  • Azure Synapse
  • Databricks
  • PySpark programming
  • Big data processing
  • Python
  • CI/CD

Certification

  • Microsoft Certified: Azure Data Engineer Associate
  • Post Graduate Program in Artificial Intelligence and Machine Learning

Languages

Hindi
First Language
English
Advanced (C1)
