Shivam Kumar

Summary

A results-driven data engineering and analytics leader with 6+ years of experience designing and scaling cloud-native audit analytics solutions. Demonstrates expertise in PySpark, Azure Databricks, SQL, and Tableau to deliver high-quality, compliant, and cost-effective data products that elevate audit quality and enable enterprise-wide, data-driven decisions.

Overview

6 years of professional experience

2 Certifications

3 Languages

Work History

Senior Data Engineer

Deloitte USI

09.2022 - Current

Led the development, optimization, and customization of the analytics product to elevate audit quality and enable enterprise-wide data-driven decisions, leveraging cost-effective models to maximize impact.
Delivered end-to-end, scalable data pipelines and workflows across batch and streaming use cases, improving execution times, resource consumption, and audit analytics turnaround.
Used PySpark, Azure Databricks, SQL, and Tableau to extract, transform, load, and visualize audit data from SAP, Oracle, and other enterprise systems through resilient data pipelines.
Implemented data governance with Unity Catalog in Azure Databricks to manage fine-grained access, lineage, and compliance across datasets and analytics assets.
Built and maintained production-grade pipelines using Azure Data Factory and Azure Databricks, ensuring reliable ingestion, transformation, and loading to Azure Data Lake storage.
Developed Databricks PySpark jobs to systematically clean and standardize data, including null checks, value normalization, and schema validation.
Implemented an automated data quality validation framework, executing 30+ rule-based checks with PySpark and in-platform DIC rulebooks to ensure accuracy and regulatory compliance.
Implemented a Delta Lake lakehouse architecture on Parquet to enable ACID transactions, enforce schema integrity, and deliver modern warehouse capabilities for reliable analytics and governance.
Conducted performance tuning and troubleshooting of Databricks jobs, improving efficiency, scalability, and cost-effectiveness.
Applied Spark optimization techniques such as caching, map, reduceByKey, and repartitioning to accelerate data processing and reduce compute overhead. Addressed data skew with key salting and minimized shuffle costs using broadcast joins.
Built and deployed an Azure OpenAI-powered RAG GenAI chatbot that streamlines review of 10,000+ audit documents while achieving 95% accuracy in anomaly detection and risk assessment.
Built interactive, user-friendly dashboards in Tableau and Power BI that translate complex data into actionable insights and support data-driven decisions.
Implemented CI/CD pipelines in Azure Databricks using Azure DevOps, integrating Git for version control, automated testing, and continuous deployment.
Worked within Agile delivery teams.
Mentored junior Spark developers and data engineers, fostering best practices in coding standards, performance optimization, and cloud data engineering.
Excellent team player with strong problem-solving, communication, and leadership skills; able to work in time-constrained, team-oriented environments or independently with minimal supervision to meet deadlines.

Data Engineer

Tata Consultancy Services

08.2019 - 09.2022

Built scalable data pipelines on AWS using Spark on EMR, moving and transforming data from multiple RDBMS sources into Amazon S3.
Cut processing times by up to 60% through Spark partitioning, Hive bucketing, and indexing, improving EMR performance and cost efficiency.
Created optimized internal/external Hive tables on S3/HDFS with partitioned schemas to enable fast analytical queries and downstream BI.
Standardized data formats (Parquet/ORC), implemented schema evolution, and enforced secure access with IAM roles and bucket policies.
Improved operations with EMR/S3 monitoring and tuning, documenting best practices to sustain reliability and performance at scale.

Education

Bachelor of Technology - EE

RCCIIT

Kolkata

06-2019

Intermediate -

BRNKS Intercollege

03-2014

Matriculation -

PC High School

Patsa, Samastipur

03-2012

Skills

Azure Databricks

Certification

Azure Data Engineer Associate (DP-203)

GenAI Project

Implemented a Retrieval-Augmented Generation (RAG) system with Azure OpenAI Service. Developed and deployed a comprehensive RAG system using Python and Azure cloud services.

Awards

Spot award for contribution to Standard analytics.
Spot award for special initiative and optimization in DIC.
Outstanding performance awards for the year 2024-2025 from the Deloitte A&A practice.

Accomplishments

Participated in the Google Cloud Hack2Skill Hackathon; selected among the top 100 teams.
Contributed to multiple community impact initiatives, including Deloitte Impact Day activities such as planting trees and supporting education for underprivileged children.
Captained Deloitte’s Audit cricket team, leading the squad across multiple tournaments and fostering teamwork, discipline, and performance.