Sandeep Chauhan

Software Engineer
Alandi

Summary

Experienced software engineer with a proven track record of designing and building robust big data pipelines. Proficient in Apache Flink and Spark, writing efficient code in Java and Python, and quick to pick up new languages as the job requires. Strong foundation in data structures and algorithms for optimizing system performance and delivering high-quality solutions.

Overview

5 years of professional experience
9 years of post-secondary education
2 certifications
3 languages

Work History

Data Engineer

Velotio Technologies
12.2021 - Current


Project: Services, Interaction, and Subscriptions Data Models
Role: Big Data Engineer (2024 - Present)

About Chamberlain Inc.:
Chamberlain is a B2B company providing IoT-based services to clients across diverse industries such as deliveries (Amazon, Walmart, etc.), automotive (Tesla, Hyundai, Honda, etc.), and smart home solutions. The IoT devices collect vast amounts of data, which are analyzed to generate actionable business insights.

Roles and Responsibilities:

  • Data Modeling: Implement data models for the Bronze, Silver, Gold, and Platinum layers based on data architect requirements.
    Develop metrics derived from existing data to support business intelligence needs.
  • DBT Query Development: Write and maintain DBT (Data Build Tool) queries to extract required data from various sources.
    Develop and test DBT models to ensure robust implementation of data models.
  • ETL Orchestration: Build and manage Azure Data Factory (ADF) pipelines for orchestrating ETL workflows efficiently.
  • Data Transformation and Analysis: Create Databricks scripts using PySpark for analyzing and transforming data stored in the Bronze layer.

Technologies Used:
Databricks, Data Build Tool (DBT), Azure Data Factory, Azure DevOps, PySpark

Project: Providence
Role: Flink Developer (2023 - Present)

Project Overview:
The Providence project focuses on collecting and analyzing data from Warner Bros. Discovery’s client-facing applications. The data captures user interactions with the apps, along with any errors encountered during normal operation, and is used to improve app performance and enhance the user experience.

Roles and Responsibilities:

  • Streaming Analytics Pipeline Development: Built a streaming analytics pipeline from scratch to monitor the real-time performance of Warner Bros. Discovery client applications.
    Tracked critical real-time metrics such as the health of active sessions, error-free session rates, crash-free session rates, and error types.
    Analyzed the collected data to address issues in new app releases, ensuring an optimal client experience.
  • Event Filtering Pipeline: Developed a streaming pipeline to filter malformed events in real time, aiding developers in improving client logger SDK event generation.
  • Alerting and Monitoring: Configured alerts using Prometheus and PagerDuty for monitoring and notifying about the Flink application’s health and performance.
  • Automation: Automated Flink application deployments using Terraform and GitHub Actions for streamlined and efficient deployment processes.
  • Load Testing Microservice: Designed and implemented a microservice for load testing the Flink application, leveraging Kubernetes jobs to simulate millions of events per minute in a production-like environment.
  • Data Partitioning Automation: Developed stored procedures to automate the attachment and detachment of PostgreSQL partitions, enabling seamless data migration between hot, warm, and cold storage tiers for Providence’s data.
  • Pipeline Optimization: Provided ongoing support and implemented enhancements to improve the robustness and efficiency of the Flink pipeline.

Technologies Used:
Apache Flink, Kubernetes, Terraform, GitHub Actions, JFrog, Maven, AWS Managed Flink, AWS Timestream, AWS RDS, Grafana (for visualization), Prometheus (for monitoring), Kafka (for streaming).

Programming Language:
Java

Deployment Details:
The Flink application is deployed on AWS Managed Flink, Amazon’s managed service for Apache Flink, ensuring scalability and reliability.

Project: OpenSearch Analytics
Role: Data Engineer (2022 - 2023)

Project Overview:
This project focused on identifying and addressing suboptimal OpenSearch query practices that impacted performance and response times. Additionally, it analyzed heavily used indexes to optimize scaling of OpenSearch clusters based on usage patterns.

Roles and Responsibilities:

  • Designed and developed a data pipeline to analyze OpenSearch’s slow index and search query logs.
  • Visualized analysis results using Grafana for better insights into query performance issues.
  • Utilized AWS Lambda for data extraction and transformation, storing results in S3 for querying with Athena.
  • Automated metadata updates with AWS Glue Crawler.
  • Configured S3 object creation notifications with AWS SQS to trigger Lambda functions.
  • Deployed the pipeline using AWS CloudFormation and Jenkins for reliable and repeatable deployments.

Technologies Used:
AWS Lambda, AWS Athena, AWS SQS, AWS S3, AWS Glue Crawler, CloudFormation, Jenkins, Grafana

Programming Language:
Python

Project: Observability
Role: Data Engineer (2022 - 2023)

Project Overview:
This project aimed to enhance monitoring, incident reporting, and metering of Discovery’s infrastructure and services. By tagging resources (e.g., code repositories, AWS services, alerts) with operational metadata, the project facilitated quick resolution of failures and accurate cost metering for individual teams.

Roles and Responsibilities:

  • Built data pipelines that collect and refresh operational metadata for infrastructure resources daily in a PostgreSQL database.
  • Tagged resources to enable faster failure resolution and provide cost visibility per team.
  • Leveraged PySpark for data processing and GitHub Actions for CI/CD automation.
  • Integrated Grafana dashboards for visualizing metadata and operational metrics.

Technologies Used:
PySpark, GitHub Actions, PostgreSQL, Grafana, AWS Athena

Programming Languages:
Python, Java

Project: Stream Metrics as a Service
Role: Data Engineer (2022)

Project Overview:
The project involved real-time analysis of data stream events and visualization of metrics on Grafana dashboards. These analytics supported critical business decisions for Discovery.

Roles and Responsibilities:

  • Designed and implemented a real-time data pipeline from scratch using Apache Flink.
  • Conducted event stream analysis and provided real-time insights visualized on Grafana dashboards.
  • Deployed the Flink application using AWS Kinesis Data Analytics service for scalable and managed streaming analytics.

Technologies Used:
Apache Flink, AWS Kinesis Data Analytics, Grafana

Programming Language:
Java

Data Engineer

Infosys LTD
08.2019 - 12.2021

Project:

Role: Data Engineer (Jan 2020 - Dec 2021)

Project Overview:
The project focused on the Extraction, Loading, and Transformation (ELT) of data from diverse sources, including Oracle databases, flat files, JSON, CSV, Avro, and Parquet files, into a Data Warehouse. The processed data was then populated into Hive tables for generating reports and supporting business intelligence activities.

Roles and Responsibilities:

  • Data Extraction and Ingestion: Extracted data from Oracle databases and various file formats (CSV, Avro, Parquet, etc.) provided by the data ingestion team.
    Ingested the extracted data into EMR clusters using Sqoop and AWS S3.
  • Data Transformation and Loading: Transformed data in the staging layer using PySpark and loaded it into external Hive tables stored on AWS S3.
  • Workflow Design and Development: Designed and implemented production workflows using Apache Airflow with technologies such as Python, Unix, Hive, PySpark, and Sqoop.
  • AWS Automation and API Development: Developed Python APIs for AWS authentication (Cerebrus and Boto), EMR cluster management (spin-up and termination), and AWS S3 storage operations.
  • Requirement Gathering and Technical Solutions: Collaborated with clients to gather application requirements and provided tailored technical solutions.
  • Production Support: Provided production support for Nike Digital Sales and Analytics, including resolving critical P1 incidents and conducting root cause analysis for failures.
    Addressed job failures in production and QA environments, ensuring operational issues were resolved promptly.

Technologies Used:
Spark, Hive, Python, Unix, Sqoop, MySQL

Tools:
GitHub, PyCharm, Eclipse, MobaXterm, PuTTY

Data Engineer

Infosys LTD
08.2019 - 12.2021

Project:

Role: Data Engineer (Aug 2019 - Jan 2020)

Project Overview:
The project involved analyzing IPL datasets to extract valuable insights and generate reports on player and team performance. Key analyses included identifying the best batsman, bowler, and fielder for a season and predicting future events.

Roles and Responsibilities:

  • Data Extraction and Transformation: Extracted, transformed, and loaded data from IPL datasets using source queries and Apache Pig.
    Conducted analyses to determine overall best players based on historical performance.
  • Reporting and Dashboard Creation: Developed detailed reports on season and team performance, highlighting key players.
    Created interactive dashboards using Power BI to visualize insights effectively.

Technologies Used:
Power BI, Apache Pig

Tools:
Power BI, PuTTY

Education

Bachelor of Engineering Technology - Information Technology

MIT AOE PUNE
Pune, India
01.2015 - 06.2019

Intermediate

Priyadarshani English Medium School And Junior College
Pune, India
01.2012 - 01.2015

High School

Priyadarshani English Medium School And Junior College
Pune, India
01.2010 - 01.2012

Skills

Apache Flink

Apache Spark

AWS

Python

Java

C

MySQL

Extra-Curricular Activities

  • Participated in the Nakshatra singing competition (2015 and 2019)
  • Volunteered for EXCEED events (2016-2017)
  • Won first prize in the Skill Bugs sketching competition

Personal Information

  • Date of Birth: 09/08/96
  • Gender: Male
  • Marital Status: Unmarried

Accomplishments

  • Awarded 'Rockstar Rookie' in 2022 at Velotio Technologies, recognizing outstanding contributions and impact within my first year
  • Secured second prize for the best project, 'Smart System to Guide Patients,' in an inter-department competition, showcasing innovative problem-solving and teamwork

Software

Java

Python

Maven

Kubernetes

Docker

Git

Github

Databricks

DBT

Grafana

Interests

Singing

Sketching

Painting

Fitness

Certification

Core Java

Python Developer

Singing

I have a deep passion for singing, which allows me to express myself creatively and unwind after a busy day. Whether it’s practicing classical pieces, exploring contemporary genres, or simply humming my favorite tunes, singing has always been a source of joy and inspiration. It also serves as a great way to connect with others who share a love for music.

Timeline

Python Developer

03-2024

Core Java

02-2024

Data Engineer

Velotio Technologies
12.2021 - Current

Data Engineer

Infosys LTD
08.2019 - 12.2021

Data Engineer

Infosys LTD
08.2019 - 12.2021

Bachelor of Engineering Technology - Information Technology

MIT AOE PUNE
01.2015 - 06.2019

Intermediate

Priyadarshani English Medium School And Junior College
01.2012 - 01.2015

High School

Priyadarshani English Medium School And Junior College
01.2010 - 01.2012