Siddhant Bhatt

Chennai

Summary

Data Engineer with 2 years of experience specializing in Databricks, PySpark, and large-scale data pipeline development and optimization. Proven expertise in building lineage frameworks, automating incident management, developing LLM-driven applications, and ensuring production stability. Strong at backend data engineering with exposure to business-facing reporting and AI integrations.

Overview

3 years of professional experience

Work History

Data Engineer

Comcast
Chennai
05.2023 - Current
  Key Projects & Contributions
  • Production Migration – Data Validation Framework & RCA Support: Developed and managed the data validation framework during production migration. Collaborated with Data Science and engineering teams to conduct RCA for failures and ensured a stable migration through structured validation.
  • Automated JIRA Ticketing for Pipeline Failures: Built an automated JIRA ticketing system for source, silver, and gold job failures, significantly reducing manual intervention and accelerating resolution times.
  • Model Impact Dashboard for Leadership: Created a leadership-facing dashboard to monitor downstream model risks caused by source failures, enabling proactive issue handling.
  • Column-Level Data Lineage Framework Development: Built a scalable column-level lineage system using PySpark, with recursive tracing optimized through dynamic programming.
  • Survey Analyzer Platform for CR&I Team: Delivered a survey analysis platform integrating LLMs for topic classification and sentiment analysis, with a RAG chatbot for insights and automated PowerPoint generation.
  • EFS Architecture Enhancements – Resilient Pipeline Execution: Improved pipeline resiliency by enabling conditional downstream job execution even when upstream failures occurred, reducing downtime.
  • EFS Job Optimization: Reduced the runtime of a critical job (the gold SHAP explainer job) from 8 hours to 1.5 hours via UDF optimization, native PySpark refactoring, and parallelization.

Data Engineering Intern

OfBusiness
Gurugram
12.2022 - 05.2023
  • Developed a scalable Python-based web crawler for large-scale tender data ingestion with failure handling and file storage in AWS S3.
  • Automated reporting of crawl outcomes to Google Chat via webhooks using Redis queues.
  • Built ETL pipelines using Apache Airflow to sync data from MongoDB and MariaDB into Google BigQuery.
  • Automated ~80% of business reporting using BigQuery and Google Data Studio, replacing manual Excel-based processes.

Education

Bachelor of Technology - Electrical and Electronics Degree

BITS Hyderabad
Hyderabad, India
01.2023

Master of Science - Physics

BITS Hyderabad
Hyderabad, India
01.2023

Skills

  • Big Data & Processing: Databricks, PySpark, Spark SQL, SQL, pandas, Delta Lake
  • Programming: Python, PySpark, APIs
  • Cloud & Tools: Azure Databricks, AWS S3, Redis, Git, JIRA, Figma, Streamlit
  • Pipeline Orchestration: Databricks Workflows, Apache Airflow, REST APIs
  • Data Governance: Column-level lineage, data validation frameworks, dashboarding
  • Optimization: UDF tuning, parallel processing, broadcast joins
  • LLMs & AI: Retrieval-Augmented Generation (RAG), OpenAI APIs, sentiment analysis
  • Visualization: Google Data Studio, Databricks dashboards

Timeline

Data Engineer

Comcast
05.2023 - Current

Data Engineering Intern

OfBusiness
12.2022 - 05.2023

Bachelor of Technology - Electrical and Electronics Degree

BITS Hyderabad

Master of Science - Physics

BITS Hyderabad