Data Engineer at Walmart specializing in designing and building end-to-end, large-scale ETL pipelines and data workflows with a strong focus on scalability and reliability. Expert in Apache Spark and Airflow on Google Cloud Platform (GCP), with the ability to quickly adapt to AWS and Azure environments. Recognized for reducing infrastructure costs and improving pipeline throughput through Spark performance tuning and resource right-sizing. Passionate about transforming complex datasets into analytics-ready assets that accelerate data-driven decision-making.
Overview
3 years of professional experience
Work History
Data Engineer
Walmart Global Tech, India
07.2022 - Current
Designed and optimized end-to-end, large-scale ETL pipelines to deliver real-time inventory, order, and transportation data to Walmart’s Order Sourcing Engine, enabling faster and more accurate fulfillment decisions across thousands of stores and fulfillment centers.
Implemented ETL pipelines and data workflows supporting Walmart’s Replenishment System, which generates daily plans specifying product quantities and restocking locations. Processed large-scale inventory and sales data to enable accurate, timely replenishment decisions across stores and fulfillment centers.
Developed high-throughput ingestion jobs in Spark, managing terabyte-scale datasets with optimized partitioning techniques.
Increased data processing efficiency by 30% through automated ETL workflows orchestrated by Airflow.
Enhanced resource utilization of Dataproc cluster by 25% with fine-tuned Spark job configurations in distributed environments.
Optimized SQL queries and database schemas for performance improvements in data retrieval operations.
Collaborated with data scientists and analysts to understand data needs and implement appropriate data models and structures.
Refactored a Java-based ETL job into a pure Spark-native pipeline by removing intermediate Java object layers and leveraging Spark transformations together with advanced memory tuning and GC settings, reducing job runtime by 28% and improving resource utilization.
Delivered scalable Java UDFs that plugged seamlessly into Spark jobs, handling logic that native functions couldn’t support.
Automated data quality checks and error-handling processes to ensure the integrity and reliability of datasets.
Developed data pipelines to streamline data collection processes.
Wrote logical and physical database descriptions, specifying database identifiers for management systems.
Configured and maintained cloud-based data infrastructure on Google Cloud Platform (GCP) to enhance data storage and computation capabilities, with transferable experience applicable to AWS and Azure.
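The Spark configuration tuning described above (executor sizing, memory fraction, GC settings) typically takes the form of `spark-submit` flags. The fragment below is a hedged sketch with placeholder values, not the production settings; the script name `ingest_job.py` is hypothetical.

```shell
# Illustrative spark-submit flags for executor right-sizing and G1GC tuning;
# all values are placeholders, not actual production configuration.
spark-submit \
  --master yarn \
  --conf spark.executor.memory=8g \
  --conf spark.executor.cores=4 \
  --conf spark.sql.shuffle.partitions=800 \
  --conf spark.memory.fraction=0.6 \
  --conf spark.executor.extraJavaOptions="-XX:+UseG1GC" \
  ingest_job.py
```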
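The partition right-sizing mentioned above can be illustrated with a minimal sketch. This helper (hypothetical; the 128 MB target is a common Spark/HDFS heuristic, not a figure from this resume) estimates a partition count from total input size, the kind of calculation used before calling `df.repartition(n)` on a terabyte-scale ingestion job.

```python
def num_partitions(dataset_bytes: int, target_partition_mb: int = 128) -> int:
    """Estimate a Spark partition count from total input size.

    Aims for ~128 MB per partition (a common Spark heuristic; the
    target value here is an illustrative assumption).
    """
    target_bytes = target_partition_mb * 1024 * 1024
    # At least one partition, rounding up so no partition exceeds the target.
    return max(1, -(-dataset_bytes // target_bytes))

# A 1 TiB dataset at 128 MB per partition:
print(num_partitions(1 * 1024**4))  # 8192
```

In a Spark job this estimate would typically feed a call such as `df.repartition(num_partitions(input_size))` to keep shuffle tasks evenly sized.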
Education
B.Tech - CSE
Chandigarh Group of Colleges
Chandigarh
06-2022
Skills
Apache Spark
Apache Airflow
SQL
PySpark
Java
Python
Jupyter Notebook
Google Cloud Platform
BigQuery
Google Cloud Storage
Dataproc
Big Data
Additional Information - Awards
Received the Bravo Award (certificate of appreciation with monetary reward) three times from manager and director for driving cost-cutting initiatives through Spark job optimization and efficient resource allocation, resulting in significant operational savings.