Data Engineer with 3.5 years of experience building scalable batch and real-time data pipelines across AWS and Azure Databricks. Skilled in PySpark, Apache Spark, Delta Lake, and Kafka, with hands-on experience integrating data from MongoDB, PostgreSQL, Salesforce, and SOAP APIs into modern lakehouse architectures. Experienced in automating deployments with Terraform and Databricks Asset Bundles and in optimizing pipelines to significantly reduce runtime and compute cost. Proficient with MLflow for model training, hyperparameter tuning, and production deployment.
Overview
3 years of professional experience
4 Certifications
Work History
Data Engineer
ZF Digital Solutions India Private Limited
Bangalore
08.2022 - Current
Contributed to the design and development of real-time streaming pipelines using Azure Databricks, PySpark, Delta Lake, and MLflow to process 54M+ records/day, improving pipeline runtime from 8 hours to 35 minutes and reducing compute costs by 93%.
Developed and optimized warehouse-to-lake ingestion workflows with incremental processing, improving execution time from 4 hours to 20 minutes and reducing compute usage by approximately 92% (an incremental-merge sketch appears after this list).
Built and maintained AWS S3–based ingestion and Salesforce Bulk API integration pipelines, improving enrichment and synchronization performance and reducing job duration from 9 hours to 5 minutes (approximately 99% improvement).
Implemented a VIN-based enrichment feature by grouping vehicles using the first 8 VIN characters and performing lookup-based model assignment, enriching over 10,000 vehicles while eliminating dependency on external real-time APIs (see the prefix-lookup sketch after this list).
Developed ML model training and deployment workflows using MLflow, including automated hyperparameter tuning, experiment tracking, model registry integration, and production deployment (an MLflow sketch appears after this list).
Built ingestion pipelines for MongoDB, PostgreSQL, Salesforce, and SOAP APIs, ensuring reliable schema management and high-quality incremental ingestion into the lakehouse.
Developed Apache Kafka streaming pipelines using PySpark to ingest and process real-time events and make them available to downstream analytics and reporting teams (see the streaming sketch after this list).
Worked with Unity Catalog to manage permissions, lineage, auditing, and centralized governance across the lakehouse environment.
Automated infrastructure deployments using Terraform for AWS resources and implemented CI/CD workflows with Databricks Asset Bundles to standardize environment configuration and deployment processes.
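A minimal sketch of the incremental warehouse-to-lake pattern referenced above, assuming a hypothetical `orders` source table with an `updated_at` change-tracking column and a Delta target named `lakehouse.bronze_orders`; the JDBC connection details, table names, and key column are illustrative, not the actual project configuration.

```python
# Incremental warehouse-to-lake ingestion sketch (all names are illustrative).
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

TARGET = "lakehouse.bronze_orders"   # hypothetical Delta target table
WATERMARK_COL = "updated_at"         # change-tracking column in the source

# Highest watermark already loaded; fall back to epoch on the first run.
last_loaded = (
    spark.table(TARGET).agg(F.max(WATERMARK_COL)).first()[0]
    if spark.catalog.tableExists(TARGET)
    else "1970-01-01 00:00:00"
)

# Pull only rows changed since the last load (JDBC options are placeholders).
incremental = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://warehouse-host:5432/analytics")
    .option("dbtable", f"(SELECT * FROM orders WHERE {WATERMARK_COL} > '{last_loaded}') src")
    .option("user", "<user>")
    .option("password", "<password>")
    .load()
)

# Upsert into the Delta table so reruns stay idempotent.
if spark.catalog.tableExists(TARGET):
    (DeltaTable.forName(spark, TARGET).alias("t")
        .merge(incremental.alias("s"), "t.order_id = s.order_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())
else:
    incremental.write.format("delta").saveAsTable(TARGET)
```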
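The VIN enrichment described above is, in outline, a prefix-based lookup join; the table and column names below are assumptions for illustration only.

```python
# VIN-prefix enrichment sketch: group vehicles by the first 8 VIN characters
# and assign a model via a lookup table (table/column names are illustrative).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

vehicles = spark.table("lakehouse.silver_vehicles")           # assumed to have a `vin` column
model_lookup = spark.table("lakehouse.ref_vin_prefix_model")  # vin_prefix -> model mapping

enriched = (
    vehicles
    .withColumn("vin_prefix", F.substring("vin", 1, 8))       # first 8 VIN characters
    .join(F.broadcast(model_lookup), on="vin_prefix", how="left")
)

# Persist the enriched set; no external real-time API call is needed.
enriched.write.format("delta").mode("overwrite").saveAsTable("lakehouse.gold_vehicles_enriched")
```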
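A sketch of an MLflow training-and-registration workflow of the kind listed above, assuming a scikit-learn classifier, a synthetic dataset, and a hypothetical registered model name; none of these reflect the actual production models.

```python
# MLflow tuning / tracking / registry sketch (dataset, params, and model name are illustrative).
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=2_000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

param_grid = {"n_estimators": [100, 200], "max_depth": [5, 10]}

with mlflow.start_run(run_name="rf-tuning"):
    # Automated hyperparameter search over the grid.
    search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=3)
    search.fit(X_train, y_train)

    # Track the winning hyperparameters and hold-out accuracy.
    mlflow.log_params(search.best_params_)
    mlflow.log_metric("test_accuracy", accuracy_score(y_test, search.best_estimator_.predict(X_test)))

    # Log the model and register it so it can be promoted to production.
    mlflow.sklearn.log_model(
        search.best_estimator_,
        artifact_path="model",
        registered_model_name="vehicle_model_classifier",  # hypothetical registry name
    )
```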
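A minimal Kafka-to-Delta structured streaming sketch in the spirit of the streaming pipelines above; the broker address, topic, event schema, checkpoint path, and target table are assumptions, not the project's actual configuration.

```python
# Kafka -> Delta structured streaming sketch (broker, topic, schema, and paths are illustrative).
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.getOrCreate()

event_schema = StructType([
    StructField("vehicle_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")  # placeholder broker
    .option("subscribe", "vehicle-events")               # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers bytes; decode the value and parse the JSON payload.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
       .select("e.*")
)

# Stream into a Delta table for downstream analytics, with checkpointing for recovery.
query = (
    events.writeStream.format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/vehicle_events")  # placeholder path
    .outputMode("append")
    .toTable("lakehouse.bronze_vehicle_events")
)
query.awaitTermination()
```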
Education
M.Tech - Data Science
Amrita Vishwa Vidyapeetham
Bangalore, India
07.2022
B.Tech - Computer Science
Ace Engineering College
Hyderabad, India
05.2019
Skills
AWS
Hadoop
Apache Spark
SQL
Azure Databricks
Kafka
PySpark
Python
Git
MLflow
Databricks Asset Bundles
MongoDB
Matplotlib
scikit-learn
Certification
Databricks Certified Data Engineer Associate - Databricks