Experienced Databricks Engineer with strong proficiency in PySpark, Python, Spark SQL, and Delta Lake, specializing in building scalable data pipelines and data products on Azure. Hands-on experience integrating SAP source systems, developing Azure Data Factory pipelines, and implementing efficient ETL and CDC frameworks to deliver reliable, business-ready datasets. Adept at optimizing data workflows, supporting large-scale data migrations, and collaborating with cross-functional teams to translate business requirements into robust data solutions.
Overview
4 years of professional experience
3 Certificates
Work History
Azure Databricks Engineer
BECTON, DICKINSON AND COMPANY (INC.)
05.2025 - 01.2026
Served as Databricks Engineer on a customer-facing GA Customer Portal project.
Designed and developed end-to-end data pipelines in Databricks to integrate data from multiple source systems including SAP ECC (Everest) and JDE Oracle (JDEINIT).
Extracted and transformed data from Sales Order, Billing, and Delivery tables, implementing complex joins to build accurate and meaningful business datasets.
Built 5 data products to support customer-facing business use cases, including 2 standard (vanilla) pipelines and 3 complex pipelines involving multi-table joins, advanced filtering, and derived field calculations.
Implemented business-driven transformation logic using PySpark and Spark SQL to calculate multiple metrics and fields required for downstream analytics consumption.
Designed and scheduled 4-hourly incremental data refresh pipelines to ensure timely and reliable data delivery aligned with business SLAs.
Implemented Change Data Capture (CDC) using the Delta Lake framework to capture and process only the latest data changes, reducing data movement and improving processing efficiency (a representative merge pattern is sketched at the end of this project description).
Developed reusable PySpark scripts to publish curated datasets to ADLS Gen2 containers, enabling consumption by business users and downstream applications.
Ensured data accuracy, consistency, and reliability across all pipelines by handling dependencies, validations, and schema alignment.
Independently owned the design, development, testing, and deployment of all Databricks jobs within the project lifecycle.
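A minimal sketch of the Delta Lake merge pattern behind the CDC refresh described above. The table names (curated.sales_orders, bronze.sales_order_changes), the order_id key, and the extracted_at watermark column are illustrative assumptions, not the project's actual schema.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # provided automatically in Databricks notebooks

# Hypothetical watermark from the previous 4-hourly run; in practice this would be persisted.
last_watermark = "2025-06-01T00:00:00"

# Incoming batch: only rows extracted since the last run.
changes = (
    spark.read.table("bronze.sales_order_changes")
    .filter(F.col("extracted_at") > F.lit(last_watermark))
)

# Upsert changed rows into the curated Delta table instead of reloading it in full.
target = DeltaTable.forName(spark, "curated.sales_orders")
(
    target.alias("t")
    .merge(changes.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```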
Served as Databricks Engineer on the Apollo project, which involved data processing, transformations, and the creation of a new data product.
Engineered ingestion pipelines to load structured files from Azure Data Lake Storage (ADLS) into Delta Lake Bronze layer using Apache Spark and Databricks Notebooks.
Enabled bidirectional data movement between relational tables and ADLS containers for seamless integration.
Designed and implemented ETL workflows to cleanse, normalize, and enrich raw data in the Silver layer using PySpark and Delta Live Tables (DLT); a representative DLT definition is sketched at the end of this project description.
Developed Gold layer datasets optimized for Power BI consumption, supporting advanced analytics and forecasting.
Published curated Gold datasets back to ADLS Gen2 for downstream consumption by the Power BI business intelligence team.
Supported predictive modeling and reporting use cases by ensuring high data fidelity and availability.
Conducted comprehensive unit testing and data validation, including row-count reconciliation, data quality checks, duplicate detection, and schema validation.
Documented all test scenarios and edge cases in a structured format for audit and reproducibility.
Authored detailed technical documentation covering pipeline logic, data lineage, and test cases.
Collaborated with cross-functional teams including data engineers, BI analysts, and forecasting teams to align on data requirements and delivery timelines.
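A minimal Delta Live Tables sketch of the Bronze-to-Silver cleansing step described above; the bronze_orders source table, the column names, and the expectation rule are hypothetical placeholders rather than the actual Apollo definitions.

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Silver: cleansed and deduplicated orders")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
def silver_orders():
    # Read the Bronze table ingested from ADLS, drop duplicates, and normalize types.
    return (
        dlt.read("bronze_orders")
        .dropDuplicates(["order_id"])
        .withColumn("order_date", F.to_date("order_date"))
    )
```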
Applied skills: Azure Databricks, Azure Storage
Jr Data Engineer
Colas Group, France
07.2022 - 02.2024
Served as Data Engineer on a migration project for Colas Group that involved processing and transforming complex datasets.
Designed a comprehensive data pipeline to transfer data seamlessly between sources including Blob Storage, Azure SQL Managed Instance tables, ADLS, and the file system, leveraging Azure Data Factory.
Created Databricks notebooks by analyzing stored procedures in SQL Server Management Studio and converting T-SQL logic to PySpark for loading data into Delta Lake and Azure SQL Managed Instance tables (a representative conversion is sketched at the end of this role).
Developed an SSDT project for seamless database migration between environments.
Performed post-load data testing on target tables, comparing results with the higher environment, conducting quality checks, and establishing a Databricks framework for streamlined testing processes.
Enhanced performance and incorporated new features into an existing SSRS report, overseeing backend deployment and rigorously testing workflows and report functionality.
Migrated the SSDT project between servers and created Databricks notebooks, ADF pipelines, and reports on SSRS servers.
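An illustrative sketch of the T-SQL-to-PySpark conversion pattern referenced above, assuming a hypothetical stored procedure that aggregates daily project costs; the JDBC placeholders, table names, and columns are not the actual Colas objects.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # provided automatically in Databricks notebooks

# Read the source table from Azure SQL Managed Instance over JDBC (placeholders only).
source = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://<managed-instance>;databaseName=<db>")
    .option("dbtable", "dbo.cost_lines")
    .option("user", "<user>")
    .option("password", "<password>")
    .load()
)

# Equivalent of the procedure's GROUP BY: total cost per project per posting day.
daily_costs = (
    source.groupBy("project_id", F.to_date("posting_date").alias("posting_day"))
    .agg(F.sum("amount").alias("total_amount"))
)

# Overwrite the Delta Lake table that mirrors the procedure's output table.
daily_costs.write.format("delta").mode("overwrite").saveAsTable("silver.daily_project_costs")
```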
Applied skills: Microsoft Azure Data Factory, Microsoft Azure DevOps, Microsoft Azure Databricks, PySpark, SQL, Microsoft SQL Server Management Studio, Microsoft SQL Server Reporting Services, Microsoft Visual Studio
ETL Developer
The Walt Disney Company, USA
11.2021 - 05.2022
Served as ETL Developer on a migration project for Disney that involved file transformations.
Collaborated with Disney to streamline CSV file ingestion into S3 buckets, categorized by data types, facilitating efficient data sourcing.
Created SQL scripts for DDL creation, data loading, and extracting data logic from JSON files, with documentation of test cases, issues encountered, and solutions provided.
Created a YAML-driven framework to load files into different staging buckets in AWS S3 (a simplified sketch appears at the end of this role).
Implemented robust data transformation and mapping strategies, ensuring optimal alignment between source CSV files and target Snowflake tables.
Orchestrated seamless data migration from Teradata to Snowflake via ETL processes, Glue jobs, and batch jobs, preserving data integrity and accuracy.
Enforced data quality measures, including deduplication and versioning techniques, to maintain a consistent and reliable dataset in Snowflake.
Established meticulous monitoring, testing, and documentation protocols to validate the migration's success and facilitate future troubleshooting.
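A simplified sketch of the YAML-driven staging loader mentioned above, assuming a hypothetical configuration that maps file-name patterns to S3 staging buckets; the bucket names, prefixes, and route_file helper are illustrative only.

```python
import boto3
import yaml

# Hypothetical mapping of file-name patterns to staging buckets; not the actual project layout.
CONFIG = """
datasets:
  - pattern: customers
    bucket: stg-customers
    prefix: incoming/
  - pattern: orders
    bucket: stg-orders
    prefix: incoming/
"""

def route_file(local_path: str, file_name: str) -> None:
    """Upload a CSV to the staging bucket whose pattern matches the file name."""
    config = yaml.safe_load(CONFIG)
    s3 = boto3.client("s3")
    for dataset in config["datasets"]:
        if dataset["pattern"] in file_name:
            s3.upload_file(local_path, dataset["bucket"], dataset["prefix"] + file_name)
            return
    raise ValueError(f"No staging bucket configured for {file_name}")

# Example: route_file("/tmp/orders_2022_01.csv", "orders_2022_01.csv")
```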