Data Engineer with 8.5 years of experience building scalable data pipelines and lakehouse architectures using Spark, Databricks, and Azure. Expert in ETL development, performance tuning, and cost optimization. Strong focus on data quality, security, and delivering business-ready solutions.
Overview
3 years of professional experience
Work History
Senior Engineer
Mercedes-Benz Research and Development India
Bangalore
01.2023 - Current
Project 1: PartlistDB
Developed scalable ingestion frameworks for high-frequency API sources with built-in fault tolerance, retry logic, and performance optimization.
Implemented GDPR-compliant data pipelines, securing PII and ensuring adherence to global data privacy regulations.
Redesigned data pipelines to process only relevant data and run tasks in parallel, cutting job runtime by 65%.
Project 2: Dynamic Sales Steering
Designed and developed a data application projected to save €2M annually by optimizing business workflows and reducing manual effort.
Optimized PySpark workflows, reducing notebook execution time from 2 hours to 20 minutes through code refactoring and better resource handling.
Reduced infrastructure costs by over 40% (from €17,000 to €10,000/month) through efficient cluster sizing and auto-scaling strategies.
Developed a framework called FMEA to enhance data stability, enforce coding standards, and standardize failure handling across pipelines.
Accelerated the scenario creation lifecycle from 1 month to 1 day by automating configuration and enabling dynamic scenario generation.
Advisory Technical Services Specialist
IBM
Bangalore
03.2022 - 12.2022
Developed and orchestrated ETL workflows in Azure Data Factory, integrating multiple on-prem and cloud sources with robust error handling and retries.
Implemented secure data ingestion via a jump server, enabling access to restricted enterprise networks while maintaining compliance and data integrity.
Designed reusable Spark-based processing templates for high-volume ETL pipelines, improving delivery speed for new data sources by 50%.
Skills
Big Data: Apache Spark, Databricks, Hive
Cloud Platforms: Microsoft Azure, AWS
Data Lakehouse: Delta Lake, Unity Catalog
Programming: Python, Scala, SQL, PySpark
Orchestration: Azure Data Factory
Databases: PostgreSQL, MySQL, Cosmos DB
Streaming: Kafka, Azure Event Hubs
AI & ML: Basic experience with Large Language Models (LLMs) and foundational AI/ML workflows
Certifications & AI Projects
Certified in Azure Fundamentals, Data Engineering, and AI Fundamentals.