Bhakti Padwal

Pune

Summary

Accomplished Azure Data Engineer with 7.5 years of experience designing and optimizing scalable Big Data solutions. Expertise in Azure Data Factory, Azure Databricks, and Azure Data Lake Storage Gen2, along with hands-on skills in Python, PySpark, and MySQL. Developed ETL pipelines and enterprise-grade data warehousing solutions that enhance data-driven decision-making and operational efficiency.

Overview

8 years of professional experience
1 Certification

Work History

Senior Software Engineer

Tech Mahindra
Pune
07.2023 - Current
  • Designed and developed ETL pipelines in Azure Data Factory (ADF) for PII data masking, reducing manual effort by 90%.
  • Optimized ETL pipelines in Azure Databricks to enhance data transformation efficiency.
  • Developed PySpark notebooks for streamlined data transformation and analysis, improving processing time.
  • Processed large datasets efficiently using PySpark, Delta Lake, and SQL to ensure high performance and data accuracy.
  • Integrated Databricks with the PIM (Product Information Management) system using MuleSoft APIs to update and synchronize product data.
  • Monitored and troubleshot API requests and responses, reducing error rates and improving data synchronization speed.
  • Validated data quality before sending data to PIM, reducing failures in downstream systems.
  • Collaborated with cross-functional teams including PIM specialists, API developers, and business analysts to ensure seamless data flow.
  • Implemented password management using Azure Key Vault.
  • Automated data handling workflows to continuously mask new records and maintain compliance.
  • Secured and anonymized sensitive customer data to support GDPR compliance.

Software Engineer

Cybage Software
Pune
07.2022 - 07.2023
  • Enhanced a data pipeline for analyzing user engagement and content performance on a media streaming platform.
  • Executed data ingestion and transformation with Azure Data Factory and PySpark on Databricks, streamlining daily log and metadata processing.
  • Cleaned, deduplicated, and joined datasets to derive metrics such as watch time and content completion rate.
  • Validated data accuracy and enhanced pipeline efficiency through collaboration with senior data engineers and analysts.
  • Developed and maintained technical documentation for data workflows, table structures, and business logic to facilitate project clarity and handover readiness.

Lead Software Engineer

Persistent Systems
Pune
04.2022 - 07.2022
  • Completed internal training on Databricks, Python, and Delta Lake, enhancing readiness for client projects.
  • Completed training on Azure Data Factory, Azure Synapse, Data Warehouse, and GitHub, strengthening cloud data pipeline skills.
  • Led software development projects using Agile methodologies and best practices.
  • Collaborated with cross-functional teams to define project requirements and deliver effective solutions.
  • Mentored junior engineers, strengthening their technical skills and encouraging knowledge sharing.

Big Data Developer

Infosys Limited
Hyderabad
03.2020 - 03.2022
  • Led migration of legacy mainframe batch processing to scalable big data solutions using Apache Spark and Hadoop HDFS, enhancing processing capabilities.
  • Reduced batch processing duration from over 6 hours on mainframe to under 1 hour, significantly improving data availability.
  • Ingested mainframe output files into HDFS, enabling parallel data processing in Spark.
  • Migrated and validated data schemas using Avro/Parquet formats, optimizing storage and ensuring compatibility across Spark jobs.
  • Used Hive external tables to provide a SQL interface for legacy teams while the data remained on HDFS.
  • Ensured data consistency and lineage tracking across mainframe files and Spark output using hashing and audit columns, enhancing data integrity.

Testing Executive

Infosys Limited
Hyderabad
06.2018 - 02.2020
  • Executed database testing and schema verification using pgAdmin, ensuring data integrity and consistency across tables.
  • Created and executed SQL queries to verify test data, check joins, and validate the relational schema.
  • Validated frontend data changes triggered by back-end actions, confirming alignment between user interface and underlying database.
  • Wrote DML queries for test data creation and cleanup during functional testing cycles.
  • Partnered with developers to identify and report database-related defects through Jira, improving defect tracking and facilitating faster resolutions.

Education

B.Sc. Computer Science

Fergusson College
Pune
12.2018

Skills

  • Python
  • PySpark programming
  • Azure Databricks
  • Azure Data Factory (ADF)
  • Azure Data Lake Storage
  • Delta Lake
  • Azure SQL Database
  • MySQL
  • Data Warehousing
  • Azure DevOps
  • Version Control (Git)
  • Azure Key Vault

Technical Experience

  • Experienced in working with PySpark using the Spark Structured APIs (DataFrames and Spark SQL).
  • Good understanding of Spark architecture including Spark Core, Spark SQL, Spark DataFrames, driver and worker nodes, executors, jobs, stages, and tasks.
  • Understanding of the Hadoop ecosystem including HDFS, NameNode, DataNode, and the MapReduce programming paradigm.
  • Good understanding of various Spark optimization techniques.
  • Experienced in working with Azure Data Factory pipelines, including monitoring runs, triaging failures, and configuring triggers.
  • Experienced in developing PySpark notebooks for data transformation and analysis.
  • Experienced in scheduling and monitoring Databricks workflows for batch data.
  • Experienced in working with Delta Lake, Delta Live Tables, and Unity Catalog in Azure Databricks.
  • Experienced in working with Azure Databricks using PySpark with different Databricks utilities (file system, notebook, widgets, etc.).
  • Understanding of Databricks Unity Catalog.
  • Hands-on experience sending data from Databricks to target systems via MuleSoft REST APIs.
  • Experienced in working with Azure Data Lake Storage Gen2 and Azure Blob Storage.
  • Good understanding of Data Warehousing concepts.
  • Hands-on experience writing optimized SQL queries and stored procedures for retrieving and analyzing data.
  • Hands-on experience with Hive: creating and managing Hive tables in Hadoop and writing Hive queries for ad hoc data analysis.
  • Good understanding of Hive optimization techniques such as partitioning, bucketing, and query-level optimization.
  • Hands-on experience with data transformations and end-to-end data validation for ETL using complex SQL.
  • Worked with a variety of big data file formats such as CSV, JSON, XML, and Parquet.
  • Good understanding of CI/CD processes.

Certification

  • Databricks Certified Data Engineer Associate
  • Microsoft DP-203 – Data Engineer Associate
  • Microsoft DP-900 – Data Fundamentals
  • Microsoft AZ-900 – Azure Fundamentals
  • Microsoft AI-900 – AI Fundamentals
  • AWS Cloud Practitioner

Timeline

Senior Software Engineer

Tech Mahindra
07.2023 - Current

Software Engineer

Cybage Software
07.2022 - 07.2023

Lead Software Engineer

Persistent Systems
04.2022 - 07.2022

Big Data Developer

Infosys Limited
03.2020 - 03.2022

Testing Executive

Infosys Limited
06.2018 - 02.2020

B.Sc. Computer Science

Fergusson College