Results-driven Data Engineer at Tata Consultancy Services with expertise in Apache Spark and Azure Data Factory. Designed scalable ETL pipelines that improved data quality and cut runtimes by 20%. Proficient in dimensional data modeling and committed to data integrity through effective troubleshooting and collaboration.
Experienced in designing and optimizing data pipelines for seamless data flow. Applies advanced SQL and Python skills to build and maintain robust data architectures, with a track record of delivering scalable solutions that strengthen data integrity and support informed decision-making.
Overview
6 years of professional experience
Work History
Data Engineer
Tata Consultancy Services
11.2021 - Current
Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
Conducted extensive troubleshooting to identify root causes of issues and implement timely, effective resolutions.
Enhanced data quality by performing thorough cleaning, validation, and transformation tasks.
Designed and deployed scalable ETL pipelines using Azure Data Factory, Databricks, and PySpark for batch and real-time processing.
Delivered data to downstream BI layers following Medallion Architecture (Bronze, Silver, Gold).
Optimized data processing by implementing efficient ETL pipelines and streamlining database design.
Implemented incremental data ingestion with Databricks Autoloader and Spark Structured Streaming, supporting real-time analytics use cases.
Achieved optimized performance and checkpointing for fault tolerance.
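The core idea behind the incremental ingestion above is that already-processed files are recorded in a checkpoint, so each run picks up only new arrivals. A minimal pure-Python sketch of that pattern (the production version used Databricks Autoloader and Spark Structured Streaming; the function and file names here are illustrative):

```python
import json
from pathlib import Path

def ingest_incremental(landing_files, checkpoint_path, process):
    """Process only files not yet recorded in the checkpoint,
    mimicking Autoloader's exactly-once file tracking."""
    cp = Path(checkpoint_path)
    seen = set(json.loads(cp.read_text())) if cp.exists() else set()
    new_files = [f for f in landing_files if f not in seen]
    for f in new_files:
        process(f)  # e.g. parse and land the file in the Bronze layer
        seen.add(f)
        # Persist the checkpoint after each file so a crash mid-run
        # never reprocesses completed files (fault tolerance).
        cp.write_text(json.dumps(sorted(seen)))
    return new_files
```

Rerunning with the same landing list is a no-op; only files absent from the checkpoint are processed.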
Built and maintained Slowly Changing Dimensions (SCD Type 1 & 2) using PySpark, automating change tracking for critical dimension tables.
Integrated into the ETL pipeline as a reusable SCD framework.
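The SCD Type 2 framework above expires the current version of a changed row and appends a new version with fresh effective dates. A minimal pure-Python sketch of that merge logic (the production version used PySpark merges; the column names `is_current`, `effective_from`, and `effective_to` are illustrative):

```python
from datetime import date

def scd2_merge(dimension, incoming, key, tracked, today=None):
    """SCD Type 2: close out changed rows and append new current versions.

    dimension: list of dicts with is_current/effective_from/effective_to
    incoming:  list of dicts carrying the latest attribute values
    key:       natural-key column name
    tracked:   attribute columns whose changes trigger a new version
    """
    today = today or date.today().isoformat()
    current = {r[key]: r for r in dimension if r["is_current"]}
    for row in incoming:
        old = current.get(row[key])
        if old and all(old[c] == row[c] for c in tracked):
            continue  # no change: keep the existing current version
        if old:  # change detected: expire the old version
            old["is_current"] = False
            old["effective_to"] = today
        dimension.append({  # append the new current version
            **{key: row[key]},
            **{c: row[c] for c in tracked},
            "is_current": True,
            "effective_from": today,
            "effective_to": None,
        })
    return dimension
```

A changed attribute thus yields two rows for the same key: the expired history row and the new current row, preserving full change history for the dimension.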
Ingested structured and semi-structured data (CSV, JSON, Parquet) from Azure Data Lake Storage Gen2 using Service Principal Authentication.
Ensured secure access using Key Vault-integrated secrets.
Developed dbt (data build tool) models for SQL-based transformation and lineage tracking in the lakehouse environment.
Implemented model dependencies and documented data flows.
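Model dependencies in dbt form a DAG (each model's `ref()` calls determine what must run before it), and a valid run order is a topological sort of that graph. A small sketch of the idea using Python's standard-library `graphlib`; the model names are illustrative, not from the actual project:

```python
from graphlib import TopologicalSorter

# Illustrative dependency graph: each model maps to the models it ref()s,
# mirroring how dbt builds its DAG from {{ ref('...') }} calls.
models = {
    "stg_orders": set(),              # staging models read raw sources
    "stg_customers": set(),
    "dim_customers": {"stg_customers"},
    "fct_orders": {"stg_orders", "dim_customers"},
}

def build_order(graph):
    """Return an execution order where every model runs after its dependencies."""
    return list(TopologicalSorter(graph).static_order())
```

`TopologicalSorter` also raises `CycleError` on circular references, which is exactly the failure mode dbt guards against when compiling a project.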
Created and maintained Databricks Unity Catalog for centralized governance of datasets, tables, and users.
Enabled fine-grained access control and audit tracking.
Leveraged Apache Spark architecture understanding (Driver, Executors, DAG) to debug and optimize long-running jobs.
Achieved 20–30% runtime improvements through tuning.
Configured and managed Databricks Jobs and Workflows for production pipelines, using dbutils, widgets, and dynamic parameters.
Enabled conditional logic and scheduling across pipelines.
Associate Consultant
Tata Consultancy Services
09.2019 - 10.2021
Maintained existing applications and provided 24/7 L1 production support.
Resolved issues within SLA targets and monitored the internal tracker for new issues.
Provided deployment and migration support.
Created SQL queries per client requirements and coordinated with L2 and L3 teams.
Maintained data pipelines for processing and transforming large-scale datasets.
Managed deployments using Jenkins and handled code changes in Git.
Assistant Delivery Manager at Tata Consultancy Services, Global Shared Services