Shivam Kumar

Senior Data Engineer
Hyderabad

Summary

A results-driven data engineering and analytics leader with 6+ years of experience designing and scaling cloud-native audit analytics solutions. Demonstrates expertise in PySpark, Azure Databricks, SQL, and Tableau to deliver high-quality, compliant, and cost-effective data products that elevate audit quality and enable enterprise-wide, data-driven decisions.

Overview

6 years of professional experience
2 Certifications
3 Languages

Work History

Senior Data Engineer

Deloitte USI
09.2022 - Current
  • Led the development, optimization, and customization of the analytics product to elevate audit quality and enable enterprise-wide data-driven decisions, leveraging cost-effective models to maximize impact.
  • Delivered end-to-end, scalable data pipelines and workflows across batch and streaming use cases, improving execution times, resource consumption, and audit analytics turnaround.
  • Used PySpark, Azure Databricks, SQL, and Tableau to extract, transform, load, and visualize audit data from SAP, Oracle, and other enterprise systems through resilient data pipelines.
  • Implemented data governance with Unity Catalog in Azure Databricks to manage fine-grained access, lineage, and compliance across datasets and analytics assets.
  • Built and maintained production-grade pipelines using Azure Data Factory and Azure Databricks, ensuring reliable ingestion, transformation, and loading to Azure Data Lake storage.
  • Developed Databricks PySpark jobs to systematically clean and standardize data, including null checks, value normalization, and schema validation.
  • Implemented an automated data quality validation framework, executing 30+ rule-based checks with PySpark and in-platform DIC rulebooks to ensure accuracy and regulatory compliance.
  • Implemented a Delta Lake-based lakehouse architecture on Parquet to enable ACID transactions, enforce schema integrity, and deliver modern warehouse capabilities for reliable analytics and governance.
  • Conducted performance tuning and troubleshooting of Databricks jobs, improving efficiency, scalability, and cost-effectiveness.
  • Applied Spark optimization techniques such as caching, map, reduceByKey, and repartitioning to accelerate data processing and reduce compute overhead. Addressed data skew with the salting technique and minimized shuffle costs using broadcast joins.
  • Built and deployed an Azure OpenAI-powered RAG GenAI chatbot that streamlines review of 10,000+ audit documents while achieving 95% accuracy in anomaly detection and risk assessment.
  • Built interactive, user-friendly dashboards in Tableau and Power BI that translate complex data into actionable insights and support data-driven decisions.
  • Implemented CI/CD pipelines in Azure Databricks using Azure DevOps, integrating Git for version control, automated testing, and continuous deployment.
  • Worked within Agile delivery methodologies.
  • Mentored junior Spark developers and data engineers, fostering best practices in coding standards, performance optimization, and cloud data engineering.
  • An excellent team player with strong problem-solving, communication, and leadership skills, able to meet deadlines both in time-constrained, team-oriented environments and independently with minimal supervision.

Data Engineer

Tata Consultancy Services
08.2019 - 09.2022
  • Built scalable data pipelines on AWS using Spark on EMR, moving and transforming data from multiple RDBMS sources into Amazon S3.
  • Cut processing times by up to 60% through Spark partitioning, Hive bucketing, and indexing, improving EMR performance and cost efficiency.
  • Created optimized internal/external Hive tables on S3/HDFS with partitioned schemas to enable fast analytical queries and downstream BI.
  • Standardized data formats (Parquet/ORC), implemented schema evolution, and enforced secure access with IAM roles and bucket policies.
  • Improved operations with EMR/S3 monitoring and tuning, documenting best practices to sustain reliability and performance at scale.

Education

Bachelor of Technology - EE

RCCIIT
Kolkata
06-2019

Intermediate

BRNKS Intercollege
03-2014

Matriculation

PC High School
Patsa, Samastipur
03-2012

Skills

    Azure Databricks

Certification

Azure Data Engineer Associate (DP-203)

GenAI Project

Implemented a Retrieval-Augmented Generation (RAG) system with Azure OpenAI Service. Developed and deployed a comprehensive RAG system using Python and Azure cloud services.

Awards

  • Spot award for contributions to Standard Analytics.
  • Spot award for special initiative and optimization in DIC.
  • Outstanding performance awards for the year 2024-2025 from the Deloitte A&A practice.

Accomplishments

  • Participated in the Google Cloud Hack2Skill Hackathon; selected among the top 100 teams.
  • Contributed to multiple community impact initiatives, including Deloitte Impact Day activities—planting trees and supporting education for underprivileged children.
  • Captained Deloitte’s Audit cricket team, leading the squad across multiple tournaments and fostering teamwork, discipline, and performance.

Timeline

Senior Data Engineer

Deloitte USI
09.2022 - Current

Data Engineer

Tata Consultancy Services
08.2019 - 09.2022

Bachelor of Technology - EE

RCCIIT

Intermediate

BRNKS Intercollege

Matriculation

PC High School