Siddharth S

Overview

10 years of professional experience

Work History

Senior Data Engineer

Happiest Minds (Disney Streaming)
Noida
04.2025 - Current
  • Maintain existing SQL pipelines that process daily data loads from Databricks to Snowflake.
  • Support the Analytics team's data analysis by creating ad hoc analysis tables as required.
  • Python, PySpark, Snowflake, Databricks, Airflow

Senior Data Engineer

Annalect
Gurugram
01.2023 - 04.2025
  • Developed and contributed Python-based data connectors to Airbyte’s open-source framework, enabling automated extraction of 50GB+ daily data from multiple social media platforms to S3 with 99.9% reliability.
  • Implemented stateful data ingestion using DynamoDB, reducing EC2 costs by 30% while processing multiple asynchronous API requests per hour.
  • Created a Dockerized development environment for AWS Glue and EMR services, cutting local testing costs by 60% and reducing deployment time from 3-4 hours to approximately 1 hour.
  • Architected a data lakehouse using Apache Iceberg, achieving 40% better query performance.
  • Built an automated data quality framework using Great Expectations, detecting 70% of anomalies pre-ingestion and reducing data incidents by 40%.
  • Python, PySpark, AWS Glue, Lambda, S3, DynamoDB, EC2, SQS, SNS, AWS CDK, AWS Athena, Apache Iceberg, Airbyte, Airflow, Great Expectations, EMR Serverless, Docker

Lead Data Scientist

Dunnhumby
Gurugram
12.2021 - 01.2023
  • Architected scalable ETL pipelines processing 15GB of data daily using AWS Glue, reducing monthly costs by 45% while maintaining performance.
  • Built fault-tolerant pipelines with S3 and Athena, handling 200M+ records daily with 99.8% uptime.
  • Optimized data transformations and warehouse queries, reducing processing time from 6 hours to 2 hours with 60% better performance.
  • Led a team of 4 implementing end-to-end data solutions, achieving a 75% reduction in compute costs through efficient resource utilization.
  • Orchestrated workflows using Apache Airflow, ensuring 24/7 reliability, with optimized job scheduling reducing concurrent usage by 50%.
  • Python, PySpark, Lambda, S3, DynamoDB, EC2, SQS, SNS, AWS CDK, AWS Athena, AWS Glue, Airflow, Cron jobs

SDET2

F5 Networks
Hyderabad
07.2020 - 12.2021
  • Built automated data pipelines using Azure Data Factory, ingesting 1TB+ of data daily from multiple databases into Snowflake with 99.9% reliability.
  • Led a POC migrating ETL transformations to AWS EMR with PySpark, achieving 60% faster processing on 1TB+ of data and a 40% cost reduction as the initial phase of a big data migration.
  • Python, Snowflake, Azure Data Factory, Oracle, MySQL, Informatica, PySpark

Software Engineer

Accenture Services Ltd
Gurugram
10.2015 - 05.2020
  • Developed Hive queries and HBase integration for analytical data processing.
  • Engineered Spark-Hive integration pipelines, reducing data retrieval latency.
  • Python, Hive, HBase, Unix

Education

B.E. - Civil Engineering

KIIT University, Bhubaneswar
04.2015

Skills

  • Languages: Python, Scala, PySpark, SQL
  • Technologies & Tools: AWS, EMR, DynamoDB, S3, SQS, Lambda, Athena, Glue, Spark, Hive, HBase, Apache Iceberg, Airbyte, Great Expectations, Snowflake, Databricks, Unity Catalog, Azure Cloud, Blob containers, Azure Data Factory, Azure Synapse Analytics, Apache NiFi, Docker, Unix, Datorama, Big Data, ETL, Informatica, Data Lake, Lakehouse, Delta tables

Projects

  • Stack Overflow tag predictor (2020): Developed a multi-label classification model to predict Stack Overflow tags from question titles and content using NLP techniques and scikit-learn.
  • Lakehouse architecture with Delta tables (2023): Developed a proof of concept implementing a lakehouse architecture with Delta tables on AWS/Azure using Databricks.

Awards

  • “Rise and Shine Award” for contributing social media data connectors to the Airbyte open-source project, enabling enterprise-wide data ingestion.
  • “Coach Award” for designing a Dockerized local development framework, accelerating AWS Glue and EMR Serverless job testing across teams.
