NEERAJ YADAV

Data Engineer
Pune, MH

Summary

Experienced Databricks Engineer with strong proficiency in PySpark, Python, Spark SQL, and Delta Lake, specializing in building scalable data pipelines and data products on Azure. Hands-on experience integrating SAP source systems, developing Azure Data Factory pipelines, and implementing efficient ETL and CDC frameworks to deliver reliable, business-ready datasets. Adept at optimizing data workflows, supporting large-scale data migrations, and collaborating with cross-functional teams to translate business requirements into robust data solutions.

Overview

4 years of professional experience
3 certificates

Work History

Azure Databricks Engineer

BECTON, DICKINSON AND COMPANY (INC.)
05.2025 - 01.2026
  • Served as the Databricks engineer on a GA customer-portal-facing project.
  • Designed and developed end-to-end data pipelines in Databricks to integrate data from multiple source systems including SAP ECC (Everest) and JDE Oracle (JDEINIT).
  • Extracted and transformed data from Sales Order, Billing, and Delivery tables, implementing complex joins to build accurate and meaningful business datasets.
  • Built 5 data products to support customer-facing business use cases, including 2 standard (vanilla) pipelines and 3 complex pipelines involving multi-table joins, advanced filtering, and derived field calculations.
  • Implemented business-driven transformation logic using PySpark and Spark SQL to calculate multiple metrics and fields required for downstream analytics consumption.
  • Designed and scheduled 4-hourly incremental data refresh pipelines to ensure timely and reliable data delivery aligned with business SLAs.
  • Implemented Change Data Capture (CDC) using the Delta Lake framework to capture and process only the latest data changes, reducing data movement and improving processing efficiency (see the PySpark sketch after this list).
  • Developed reusable PySpark scripts to publish curated datasets to ADLS Gen2 containers, enabling consumption by business users and downstream applications.
  • Ensured data accuracy, consistency, and reliability across all pipelines by handling dependencies, validations, and schema alignment.
  • Independently owned the design, development, testing, and deployment of all Databricks jobs within the project lifecycle.
  • Applied skills: Azure Databricks, Azure Storage, GoAnywhere
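
A minimal PySpark sketch of the watermark-based CDC and publish pattern described above. The table names, watermark column, business key, and storage path are illustrative assumptions, not taken from the actual project.

```python
# Sketch of the Delta Lake CDC pattern: read only rows changed since the last
# 4-hourly run, MERGE them into the curated table, then publish to ADLS Gen2.
# All names and paths below are hypothetical.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("cdc_refresh").getOrCreate()

# 1. Fetch the last-processed watermark and read only newer source rows.
last_ts = spark.sql(
    "SELECT MAX(loaded_ts) AS ts FROM audit.watermarks "
    "WHERE pipeline = 'sales_orders'").first()["ts"]
changes = (spark.read.table("bronze.sap_sales_orders")
           .where(F.col("last_changed_ts") > F.lit(last_ts)))

# 2. Upsert the changes into the curated Delta table.
target = DeltaTable.forName(spark, "silver.sales_orders")
(target.alias("t")
 .merge(changes.alias("s"), "t.order_id = s.order_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())

# 3. Publish the curated dataset to an ADLS Gen2 container for consumers.
(spark.read.table("silver.sales_orders")
 .write.format("delta").mode("overwrite")
 .save("abfss://curated@<storage-account>.dfs.core.windows.net/sales_orders"))
```

Processing only rows newer than the stored watermark is what keeps a 4-hourly refresh cheap: the MERGE touches changed keys instead of rewriting the full table.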

Azure Databricks Engineer

BECTON, DICKINSON AND COMPANY (INC.)
03.2024 - 05.2025
  • Served as the Databricks engineer on the Apollo project, which involved data processing and transformations and the creation of a new data product.
  • Engineered ingestion pipelines to load structured files from Azure Data Lake Storage (ADLS) into Delta Lake Bronze layer using Apache Spark and Databricks Notebooks.
  • Enabled bidirectional data movement between relational tables and ADLS containers for seamless integration.
  • Designed and implemented ETL workflows to cleanse, normalize, and enrich raw data in the Silver layer using PySpark and Delta Live Tables (DLT), as sketched after this list.
  • Developed Gold layer datasets optimized for Power BI consumption, supporting advanced analytics and forecasting.
  • Published curated Gold datasets back to ADLS Gen2 for downstream consumption by the Power BI business intelligence team.
  • Supported predictive modeling and reporting use cases by ensuring high data fidelity and availability.
  • Conducted comprehensive unit testing and data validation, including row count reconciliation, data quality checks, duplicate detection, and schema validation (see the validation sketch after this list).
  • Documented all test scenarios and edge cases in a structured format for audit and reproducibility.
  • Authored detailed technical documentation covering pipeline logic, data lineage, and test cases.
  • Collaborated with cross-functional teams including data engineers, BI analysts, and forecasting teams to align on data requirements and delivery timelines.
  • Applied skills: Azure Databricks, Azure Storage
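
A minimal Delta Live Tables sketch of the Bronze-to-Silver flow described above, assuming Auto Loader ingestion from ADLS; the source path, table names, and cleansing rules are illustrative, not the actual pipeline code.

```python
# DLT sketch: land raw files into Bronze, then cleanse into Silver.
# Paths, table names, and expectations are hypothetical examples.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw files landed from ADLS into the Bronze layer.")
def bronze_orders():
    return (spark.readStream.format("cloudFiles")          # Auto Loader
            .option("cloudFiles.format", "parquet")
            .load("abfss://landing@<storage-account>.dfs.core.windows.net/orders/"))

@dlt.table(comment="Cleansed, normalized Silver layer feeding the Gold datasets.")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
def silver_orders():
    return (dlt.read_stream("bronze_orders")
            .withColumn("order_date", F.to_date("order_date"))
            .dropDuplicates(["order_id"]))
```

And a hedged sketch of the validation checks listed above (row counts, duplicates, schema); the tables and expected columns are hypothetical:

```python
# Post-load validation: row count reconciliation, duplicate detection,
# and schema validation. Names are illustrative assumptions.
from pyspark.sql import functions as F

source = spark.read.table("silver.orders")
target = spark.read.table("gold.orders")

# Row count reconciliation between layers.
assert source.count() == target.count(), "Row counts do not reconcile"

# Duplicate detection on the business key.
dupes = target.groupBy("order_id").count().where(F.col("count") > 1)
assert dupes.count() == 0, "Duplicate order_id values found"

# Schema validation against the expected contract.
expected = {"order_id", "order_date", "amount"}
assert expected.issubset(set(target.columns)), "Schema drift detected"
```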

Jr Data Engineer

Colas Group, France
07.2022 - 02.2024
  • Served as a data engineer on a migration project for Colas Group involving the processing and transformation of complex datasets.
  • Designed comprehensive data pipelines in Azure Data Factory to transfer data between sources, including Blob Storage, Azure SQL Managed Instance tables, ADLS, and file systems.
  • Created Databricks notebooks by analyzing stored procedures in SQL Server Management Studio and converting T-SQL to PySpark for loading data into Delta Lake and Azure SQL Managed Instance tables (a conversion example is sketched after this list).
  • Developed an SSDT project for seamless database migration between environments.
  • Performed post-load data testing, comparing results against higher environments, conducting quality checks, and establishing a Databricks framework for streamlined testing.
  • Enhanced performance and incorporated new features into an existing SSRS report, overseeing backend deployment, and rigorously testing workflows and report functionality.
  • Migrated SSDT projects between servers and created Databricks notebooks, ADF pipelines, and SSRS reports.
  • Applied skills: Microsoft Azure Data Factory, Microsoft Azure DevOps, Microsoft Azure Databricks, PySpark, SQL, Microsoft SQL Server Management Studio, Microsoft SQL Server Reporting Services, Microsoft Visual Studio
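
A hypothetical illustration of the T-SQL-to-PySpark conversion pattern mentioned above; the stored-procedure logic and table names are invented for the example, not taken from the Colas codebase.

```python
# Original T-SQL (illustrative):
#   SELECT customer_id, SUM(amount) AS total_amount
#   FROM dbo.invoices
#   WHERE invoice_date >= '2023-01-01'
#   GROUP BY customer_id;
#
# Equivalent PySpark, writing the result to a Delta Lake table:
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("tsql_conversion").getOrCreate()

totals = (spark.read.table("bronze.invoices")
          .where(F.col("invoice_date") >= "2023-01-01")
          .groupBy("customer_id")
          .agg(F.sum("amount").alias("total_amount")))

totals.write.format("delta").mode("overwrite").saveAsTable("silver.invoice_totals")
```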

ETL Developer

The Walt Disney Company, USA
11.2021 - 05.2022
  • Served as an ETL developer on a migration project for Disney involving file transformations.
  • Collaborated with Disney to streamline CSV file ingestion into S3 buckets, categorized by data types, facilitating efficient data sourcing.
  • Created SQL scripts for DDL creation, data loading, and extracting data logic from JSON files, with documentation of test cases, issues encountered, and solutions provided.
  • Built a YAML-driven framework to load files into different staging buckets in AWS S3.
  • Implemented robust data transformation and mapping strategies, ensuring optimal alignment between source CSV files and target Snowflake tables.
  • Orchestrated seamless data migration from Teradata to Snowflake via ETL processes, Glue jobs, and batch jobs, preserving data integrity and accuracy (a loading sketch follows this list).
  • Enforced data quality measures, including deduplication and versioning techniques, to maintain a consistent and reliable dataset in Snowflake.
  • Established meticulous monitoring, testing, and documentation protocols to validate the migration's success and facilitate future troubleshooting.
  • Applied skills: AWS Glue, Amazon S3, GitHub, Snowflake, SQL, UC4
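
A minimal sketch, assuming the snowflake-connector-python client, of the S3-to-Snowflake loading and deduplication flow described above; the stage, table, and column names are illustrative assumptions.

```python
# Load categorized CSVs from an external S3 stage into Snowflake, then
# deduplicate on the business key. All names are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="ETL_WH", database="ANALYTICS", schema="STAGING")
cur = conn.cursor()

# COPY INTO pulls the staged CSV files into the staging table.
cur.execute("""
    COPY INTO staging.orders
    FROM @s3_landing_stage/orders/
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""")

# Keep only the latest version of each row (deduplication + versioning).
cur.execute("""
    CREATE OR REPLACE TABLE curated.orders AS
    SELECT * FROM staging.orders
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY order_id ORDER BY load_ts DESC) = 1
""")
conn.close()
```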

Education

Bachelor's - Computer Engineering

Pimpri Chinchwad College of Engineering And Research
Pune
05.2021

Skills

PROGRAMMING LANGUAGES: PySpark, SQL, Python

Certifications

Databricks Certified Data Engineer Professional (01.2026)
Databricks Certified Data Engineer Associate (12.2025)
Microsoft Certified: Azure Data Fundamentals (12.2023)

Timeline

01.2026: Databricks Certified Data Engineer Professional
12.2025: Databricks Certified Data Engineer Associate
05.2025 - 01.2026: Azure Databricks Engineer, BECTON, DICKINSON AND COMPANY (INC.)
03.2024 - 05.2025: Azure Databricks Engineer, BECTON, DICKINSON AND COMPANY (INC.)
12.2023: Microsoft Certified: Azure Data Fundamentals
07.2022 - 02.2024: Jr Data Engineer, Colas Group, France
11.2021 - 05.2022: ETL Developer, The Walt Disney Company, USA
05.2021: Bachelor's - Computer Engineering, Pimpri Chinchwad College of Engineering And Research, Pune