Tanzeel Khan

Bangalore

Summary

Data Engineer at Walmart specializing in designing and building end-to-end, large-scale ETL pipelines and data workflows with a strong focus on scalability and reliability. Expert in Apache Spark and Airflow on Google Cloud Platform (GCP), with the ability to quickly adapt to AWS and Azure environments. Recognized for reducing infrastructure costs and improving pipeline throughput through Spark performance tuning and resource right-sizing. Passionate about transforming complex datasets into analytics-ready assets that accelerate data-driven decision-making.

Overview

3 years of professional experience

Work History

Data Engineer

Walmart Global Tech, India
07.2022 - Current
  • Designed and optimized end-to-end, large-scale ETL pipelines to deliver real-time inventory, order, and transportation data to Walmart’s Order Sourcing Engine, enabling faster and more accurate fulfillment decisions across thousands of stores and fulfillment centers.
  • Implemented ETL pipelines and data workflows to support Walmart’s Replenishment System, which generates daily plans indicating product quantities and locations for restocking. Processed large-scale inventory and sales data to enable accurate, timely replenishment decisions across stores and fulfillment centers.
  • Developed high-throughput ingestion jobs in Spark, managing terabyte-scale datasets with optimized partitioning techniques.
  • Increased data processing efficiency by 30% through automated ETL workflows orchestrated by Airflow.
  • Enhanced resource utilization of Dataproc cluster by 25% with fine-tuned Spark job configurations in distributed environments.
  • Optimized SQL queries and database schemas for performance improvements in data retrieval operations.
  • Collaborated with data scientists and analysts to understand data needs and implement appropriate data models and structures.
  • Refactored a Java-based ETL job into a pure Spark-native pipeline by removing Java object layers, leveraging Spark transformations, and applying advanced memory tuning and GC settings, resulting in a 28% reduction in job runtime and improved resource utilization.
  • Delivered scalable Java UDFs that plugged seamlessly into Spark jobs, handling logic that native functions couldn’t support.
  • Automated data quality checks and error handling processes to ensure the integrity and reliability of datasets.
  • Developed data pipelines to streamline data collection processes.
  • Wrote logical and physical database descriptions, specifying database identifiers for management systems.
  • Configured and maintained cloud-based data infrastructure on Google Cloud Platform (GCP) to enhance data storage and computation capabilities, with the ability to quickly adapt to AWS and Azure.

Education

B.Tech - CSE

Chandigarh Group of Colleges
Chandigarh
06.2022

Skills

  • Apache Spark
  • Apache Airflow
  • SQL
  • PySpark
  • Java
  • Python
  • Jupyter Notebook
  • Google Cloud Platform
  • BigQuery
  • Google Cloud Storage
  • Dataproc
  • Big data

Additional Information - Certificates

Received the Bravo Award (certificate of appreciation) three times, along with a monetary reward, from manager and director for driving cost-cutting initiatives through Spark job optimization and efficient resource allocation, resulting in significant operational savings.
