Chaitanya Kumar Katepalli

Hyderabad

Summary

Data Engineer with 3.5 years of experience building scalable batch and real-time data pipelines across AWS and Azure Databricks. Skilled in PySpark, Apache Spark, Delta Lake, and Kafka, with hands-on experience integrating data from MongoDB, PostgreSQL, Salesforce, and SOAP APIs into modern lakehouse architectures. Experienced in automating deployments using Terraform and Databricks Asset Bundles, and in optimizing pipelines to significantly reduce runtime and compute cost. Proficient with MLflow for model training, hyperparameter tuning, and production deployment.

Overview

3 years of professional experience
4 Certifications

Work History

Data Engineer

ZF Digital Solutions India Private Limited
Bangalore
08.2022 - Current
  • Contributed to the design and development of real-time streaming pipelines using Azure Databricks, PySpark, Delta Lake, and MLflow to process 54M+ records/day, improving pipeline runtime from 8 hours to 35 minutes and reducing compute costs by 93%.
  • Developed and optimized warehouse-to-lake ingestion workflows with incremental processing, improving execution time from 4 hours to 20 minutes and reducing compute usage by approximately 92%.
  • Built and maintained AWS S3–based ingestion and Salesforce Bulk API integration pipelines, improving enrichment and synchronization performance and reducing job duration from 9 hours to 5 minutes (approximately 99% improvement).
  • Implemented a VIN-based enrichment feature by grouping vehicles using the first 8 VIN characters and performing lookup-based model assignment, enriching over 10,000 vehicles while eliminating dependency on external real-time APIs.
  • Developed ML model training and deployment workflows using MLflow, including automated hyperparameter tuning, experiment tracking, model registry integration, and production deployment.
  • Built ingestion pipelines for MongoDB, PostgreSQL, Salesforce, and SOAP APIs, ensuring reliable schema management and high-quality incremental ingestion into the lakehouse.
  • Developed Apache Kafka streaming pipelines using PySpark to ingest and process real-time events, making them available to downstream analytics and reporting teams.
  • Worked with Unity Catalog to manage permissions, lineage, auditing, and centralized governance across the lakehouse environment.
  • Automated infrastructure deployments using Terraform for AWS resources and implemented CI/CD workflows with Databricks Asset Bundles to standardize environment configuration and deployment processes.

Education

M.Tech - Data Science

Amrita Vishwa Vidyapeetham
Bangalore, India
07-2022

B.Tech - Computer Science

Ace Engineering College
Hyderabad, India
05-2019

Skills

  • AWS
  • Hadoop
  • Apache Spark
  • SQL
  • Azure Databricks
  • Kafka
  • PySpark
  • Python
  • Git
  • MLflow
  • Databricks Asset Bundles
  • MongoDB
  • Matplotlib
  • scikit-learn

Certification

Databricks Certified Data Engineer Associate - Databricks
