Sachin Soni

Mandi

Summary

Senior Data Engineer with over 5 years of experience designing and maintaining scalable data platforms. Expertise in Google Cloud Platform (GCP) and AWS, with proficiency in Kafka, Spark, Python, and SQL. GCP-certified Professional Data Engineer known for delivering high-performance data pipelines that enhance analytics and business intelligence capabilities.

Overview

5 years of professional experience
1 Certification

Work History

Senior Data Engineer

Landis+Gyr
Noida
06.2024 - Current

Grid Apps – Real-Time Streaming Pipeline

  • Independently designed and developed a real-time streaming pipeline to ingest smart meter data using Google Cloud Pub/Sub, processing 10M+ events per day.
  • Implemented robust data transformation logic to parse, unpack, and normalize JSON payloads, improving data quality and consistency by 30%.
  • Loaded curated, analytics-ready datasets into BigQuery, supporting near real-time analytics and reporting use cases.
  • Orchestrated and automated pipeline execution using Cloud Composer (Apache Airflow), achieving 99.9% pipeline reliability and low-latency processing.
  • Enabled seamless integration with downstream analytics and reporting systems, reducing reporting latency from hours to minutes.
    Tools: Google Cloud Pub/Sub, BigQuery, Cloud Composer (Airflow), SQL, GCP Routines.

Data Pipelines – GCP Analytics Enablement

  • Designed and maintained scalable batch and streaming data pipelines using Dataflow and Dataform, aligned with evolving business requirements.
  • Orchestrated and scheduled containerized workloads (Docker) using Cloud Composer, ensuring dependable and timely execution across environments.
  • Collaborated with cross-functional analytics and product teams to deliver high-quality datasets powering real-time dashboards and business insights.
  • Optimized SQL transformations and data models, improving query performance by 25–30% and enhancing maintainability of analytics workflows.
  • Improved pipeline stability and monitoring, contributing to consistent SLA adherence for analytics consumers.
    Tools: Dataflow, Dataform, BigQuery, Cloud Composer (Airflow), SQL, Docker.

Data Engineer

Purchasing Power
Chennai
01.2021 - 06.2024

Pentaho Deduction File Framework (FDRE)

  • Spearheaded the design and development of the FDRE framework, automating deduction file generation across 10+ client-specific formats.
  • Reduced file creation time by 40–50% and manual effort by 60%, significantly improving delivery speed and scalability of client services.

Scala to PySpark Job Migration

  • Led the migration of a legacy Scala-based batch job to a more efficient and maintainable PySpark implementation.
  • Improved job performance by 35%, reduced runtime failures by 30%, and increased overall data processing capacity.
  • Enhanced debuggability and execution control, resulting in faster issue resolution and improved operational stability.
    Tools: PySpark, Elasticsearch, DSL Queries, Greenplum.

Enterprise Data Pipelines

  • Designed, developed, and maintained end-to-end data pipelines ingesting and transforming multi-terabyte datasets daily from multiple sources into a centralized data warehouse.
  • Ensured 99.9% pipeline reliability, scalability, and high data quality to support downstream analytics and enterprise reporting.
    Tools: Kafka, PySpark, Elasticsearch, Greenplum.

Data Dictionary Automation

  • Developed an automated data dictionary framework using Sphinx, sourcing metadata from Greenplum for 100+ tables.
  • Integrated the solution with EMR and orchestrated deployments using Airflow, delivering a dynamic HTML documentation portal.
  • Reduced analyst dependency and data discovery time by 25–30% for data consumers.

Adobe Data Delta Lake Pipeline

  • Built a scalable ingestion pipeline to load raw Adobe datasets into a Delta Lake architecture.
  • Enabled actionable insights into client visits, login behavior, and order activity, supporting analytics use cases across multiple business teams.

Campaign Data Load Optimization

  • Optimized ingestion of a 3.2 billion-record Responsys campaign dataset across multiple source tables.
  • Leveraged PySpark parallel processing to reduce pipeline load time by 50%, ensuring timely availability of business-critical insights.

Airflow Platform Migration

  • Led the upgrade and migration of Apache Airflow to the latest version.
  • Successfully migrated 100% of production DAGs, resolving dependency and compatibility issues with zero downtime and no disruption to business workflows.

SurveyMonkey API Data Integration

  • Developed a PySpark-based ingestion framework to extract and process data from SurveyMonkey APIs.
  • Applied complex transformations to convert raw API responses into analytics-ready datasets, enabling scalable reporting and analysis.
  • Automated ingestion pipelines to improve data freshness and reduce manual intervention.

Education

Bachelor of Technology - Computer Science and Engineering

Lovely Professional University
Phagwara, Punjab
06.2021

Skills

  • Python and PySpark
  • Data pipeline orchestration (Airflow)
  • Cloud services: AWS (EMR, EC2, Lambda) and GCP (Dataform, Dataflow, Composer, Artifact Registry)
  • Database management systems (Postgres, Greenplum, Oracle)
  • Big data technologies (Spark, Hadoop, Kafka)
  • Search and indexing (Elasticsearch)
  • Version control systems (Bitbucket, GitLab)
  • Scripting proficiency (Shell Script, JavaScript)
  • Programming languages (C, Scala)
  • Data architecture expertise
  • Data transformation
  • Data modeling
  • Analytical skills

Accomplishments

• Kudos Recognition, Landis+Gyr, 2025: awarded for high-impact technical delivery and ownership of complex work.
• Reimagine Award, Purchasing Power, 2023: awarded for contribution to innovation and creativity.
• Star Award, Purchasing Power, 2022: awarded for outstanding performance.
• Spot Award, Purchasing Power, 2021: awarded for delivery of the FDRE project and performance in Q4 2021.
• Student Placement Coordinator, 2019-2020: managed hospitality for visiting companies, helped run placement drives smoothly, and invigilated placement examinations.

Certification

• Google Generative AI Certification — Google Cloud Platform (GCP), Jan. 2026.
• Google Professional Data Engineer — Google Cloud Platform (GCP), May 2025.
• Associate Data Practitioner — Google Cloud Platform (GCP), Mar. 2025.
• Apache Kafka — Intermediate, July 2022.
• Apache Kafka Connect — Beginner, July 2022.
• Taming Big Data with Apache Spark and Python, 2022.
• Crash Course on Python — Google, July 2020.
• Managing Big Data with SQL — Duke University, July 2020.

Languages

  • English, Professional Working Proficiency
  • Hindi, Full Professional Proficiency
