Sachin Soni

Mandi

Summary

Senior Data Engineer with over 5 years of experience designing and maintaining scalable data platforms. Expertise in Google Cloud Platform (GCP) and AWS, with proficiency in Kafka, Spark, Python, and SQL. GCP-certified Professional Data Engineer known for delivering high-performance data pipelines that enhance analytics and business intelligence capabilities.

Work History

Senior Data Engineer

Landis+Gyr

Noida

06.2024 - Current

Grid Apps – Real-Time Streaming Pipeline

Independently designed and developed a real-time streaming pipeline to ingest smart meter data using Google Cloud Pub/Sub, processing 10M+ events per day.
Implemented robust data transformation logic to parse, unpack, and normalize JSON payloads, improving data quality and consistency by 30%.
Loaded curated, analytics-ready datasets into BigQuery, supporting near real-time analytics and reporting use cases.
Orchestrated and automated pipeline execution using Cloud Composer (Apache Airflow), achieving 99.9% pipeline reliability and low-latency processing.
Enabled seamless integration with downstream analytics and reporting systems, reducing reporting latency from hours to minutes.
Tools: Google Cloud Pub/Sub, BigQuery, Cloud Composer (Airflow), SQL, GCP Routines.

Data Pipelines – GCP Analytics Enablement

Designed and maintained scalable batch and streaming data pipelines using Dataflow and Dataform, aligned with evolving business requirements.
Orchestrated and scheduled containerized workloads (Docker) using Cloud Composer, ensuring dependable and timely execution across environments.
Collaborated with cross-functional analytics and product teams to deliver high-quality datasets powering real-time dashboards and business insights.
Optimized SQL transformations and data models, improving query performance by 25–30% and enhancing maintainability of analytics workflows.
Improved pipeline stability and monitoring, contributing to consistent SLA adherence for analytics consumers.
Tools: Dataflow, Dataform, BigQuery, Cloud Composer (Airflow), SQL, Docker.

Data Engineer

Purchasing Power

Chennai

01.2021 - 06.2024

Pentaho Deduction File Framework (FDRE)

Spearheaded the design and development of the FDRE framework, automating deduction file generation across 10+ client-specific formats.
Reduced file creation time by 40–50% and manual effort by 60%, significantly improving delivery speed and scalability of client services.

Scala to PySpark Job Migration

Led the migration of a legacy Scala-based batch job to a more efficient and maintainable PySpark implementation.
Improved job performance by 35%, reduced runtime failures by 30%, and increased overall data processing capacity.
Enhanced debuggability and execution control, resulting in faster issue resolution and improved operational stability.
Tools: PySpark, Elasticsearch, DSL Queries, Greenplum.

Enterprise Data Pipelines

Designed, developed, and maintained end-to-end data pipelines ingesting and transforming multi-terabyte datasets daily from multiple sources into a centralized data warehouse.
Ensured 99.9% pipeline reliability, scalability, and high data quality to support downstream analytics and enterprise reporting.
Tools: Kafka, PySpark, Elasticsearch, Greenplum.

Data Dictionary Automation

Developed an automated data dictionary framework using Sphinx, sourcing metadata from Greenplum for 100+ tables.
Integrated the solution with EMR and orchestrated deployments using Airflow, delivering a dynamic HTML documentation portal.
Reduced analyst dependency and data discovery time by 25–30% for data consumers.

Adobe Data Delta Lake Pipeline

Built a scalable ingestion pipeline to load raw Adobe datasets into a Delta Lake architecture.
Enabled actionable insights into client visits, login behavior, and order activity, supporting analytics use cases across multiple business teams.

Campaign Data Load Optimization

Optimized ingestion of a 3.2 billion-record Responsys campaign dataset across multiple source tables.
Leveraged PySpark parallel processing to reduce pipeline load time by 50%, ensuring timely availability of business-critical insights.

Airflow Platform Migration

Led the upgrade and migration of Apache Airflow to the latest version.
Successfully migrated 100% of production DAGs, resolving dependency and compatibility issues with zero downtime and no disruption to business workflows.

SurveyMonkey API Data Integration

Developed a PySpark-based ingestion framework to extract and process data from SurveyMonkey APIs.
Applied complex transformations to convert raw API responses into analytics-ready datasets, enabling scalable reporting and analysis.
Automated ingestion pipelines to improve data freshness and reduce manual intervention.

Education

Bachelor of Technology - Computer Science and Engineering

Lovely Professional University

Phagwara, Punjab

06.2021

Skills

Python and PySpark
Data pipeline orchestration (Airflow)
Cloud services: AWS (EMR, EC2, Lambda) and GCP (Dataform, Dataflow, Composer, Artifact Registry)
Database management systems (Postgres, Greenplum, Oracle)
Big data technologies (Spark, Hadoop, Kafka)
Search and indexing (Elasticsearch)
Version control systems (Bitbucket, GitLab)

Scripting proficiency (Shell Script, JavaScript)
Programming languages (C, Scala)
Data architecture expertise
Data transformation
Data modeling
Analytical skills

Websites

linkedin.com/in/-sachinsoni

Accomplishments

• Kudos Recognition, Landis+Gyr, 2025, Awarded for high-impact technical delivery and ownership of complex work.
• Reimagine Award, Purchasing Power, 2023, Awarded for contribution to innovation and creativity.
• Star Award, Purchasing Power, 2022, Awarded for outstanding performance.
• Spot Award, Purchasing Power, 2021, Awarded for delivery of the FDRE project and performance in Q4 2021.
• Student Placement Coordinator, 2019-2020, Hosted visiting companies, helped run placement drives smoothly, and invigilated placement examinations.

Certification

• Google Generative AI Certification — Google Cloud Platform (GCP), Jan. 2026.
• Google Professional Data Engineer — Google Cloud Platform (GCP), May 2025.
• Associate Data Practitioner — Google Cloud Platform (GCP), Mar. 2025.
• Apache Kafka — Intermediate, July 2022.
• Apache Kafka Connect — Beginner, July 2022.
• Taming Big Data with Apache Spark and Python, 2022.
• Crash Course on Python — Google, July 2020.
• Managing Big Data with SQL — Duke University, July 2020.

Languages

English, Professional Working Proficiency
Hindi, Full Professional Proficiency
