Paurnima Gadhave

Pune, India

Summary

  • Lead Data Engineer with 9+ years of experience building scalable data platforms, enterprise data warehouses, and cloud-based analytics solutions.
  • Expert in designing and implementing end-to-end data architectures on Microsoft Azure using Azure Data Factory, Azure Databricks, Microsoft Fabric, and ADLS Gen2 for enterprise-grade analytics and reporting.
  • Proficient in building Lakehouse architectures using Delta Tables and Medallion Architecture (Bronze, Silver, Gold) for efficient and scalable data processing.
  • Advanced expertise in big data technologies including Apache Spark, PySpark, and Spark SQL for large-scale distributed data processing.
  • Hands-on experience developing batch and real-time data pipelines and orchestrating workflows with Azure Data Factory.
  • Strong focus on performance optimization, data quality, and building reliable, scalable, and secure data solutions.
  • Proven ability to collaborate with business stakeholders and cross-functional teams to deliver high-impact data-driven solutions.

Overview

9 years of professional experience

Work history

Lead Analyst

CGI
PUNE, INDIA
2020.10 - Current

Michelin – AIM Customer360:

  • The project involves building an end-to-end data ingestion, processing, and analytics solution using Azure cloud services.
  • Designed and implemented Lakehouse architecture using Medallion Framework on Azure Databricks and Azure Data Lake Gen2.
  • Developed data ingestion workflows using Azure Data Factory and Logic Apps to automate file movement (CSV/Excel) into the Raw folder of the Data Lake. Data was sourced from multiple platforms, including SharePoint, REST APIs, Salesforce (SFDC), MySQL, SFTP, Dremio, Blob Storage and shared Mailbox.
  • Built scalable data transformation processes using PySpark and Spark SQL in Azure Databricks to cleanse, standardize, and model data into curated layers of structured and semi-structured data.
  • Utilized Azure DevOps for CI/CD and automated deployment of notebooks and pipelines.
  • Improved data pipeline performance through Spark optimization and query tuning; also created Dremio views to improve Power BI dashboard efficiency and ensure smooth data consumption for business stakeholders.
  • Generated datasets for the Power Apps team based on their requirements and loaded them into Dataverse for access.
  • Optimized production data processing jobs in Databricks, reducing execution time by ~30%.
  • Automated many reports, reducing manual effort by 70%.
  • Led migration from Azure Data Lake Gen1 to Azure Data Lake Gen2 by updating notebooks and Data Factory pipelines to work with ADLS Gen2. Also supported Power BI developers in repointing dashboard data sources to ADLS Gen2.

Michelin – DMINT:

  • Gathered and analyzed business requirements and translated them into scalable data engineering solutions.
  • Designed and implemented Lakehouse architecture using Medallion Framework on Azure Databricks and Azure Data Lake Gen2.
  • Developed ingestion workflows using Azure Data Factory to load data from SharePoint, REST APIs, and Salesforce (SFDC), and using Logic Apps to load data from SharePoint.
  • Implemented data transformation logic using PySpark and Spark SQL in Azure Databricks and stored the data in Parquet, CSV, Excel, and Delta formats.
  • Developed interactive Excel reports enabling users to validate data and instantly assess the impact of manually entered inputs by integrating data from Azure Data Lake and multiple sources, ensuring accurate, consolidated reporting and improved data visibility.
  • Created Power BI dashboards on curated data, delivering actionable business insights with improved performance, and managed CI/CD deployments using Azure DevOps pipelines.
  • Improved dashboard performance by migrating some transformations from Power BI to Databricks, which reduced dashboard refresh time by ~30-40% and model size by ~45-55%.

Software Engineer

MINDTRAIL TECHNOLOGY
Pune
2017.01 - 2020.10

Menards

  • Designed and developed daily and monthly pacing and spend reporting solutions on the Microsoft Azure platform.
  • Built scalable ETL pipelines using Azure Data Factory to orchestrate Spark jobs and trigger workflows based on new data availability in ADLS Gen2.
  • Leveraged Azure Databricks (serverless) to develop and execute PySpark/Spark SQL jobs for large-scale data processing.
  • Implemented Delta Lake for intermediate data storage, enabling efficient data flow between jobs and supporting debugging and data validation.
  • Integrated vendor-specific data from Azure MySQL DB with input datasets containing impressions, clicks, and spend metrics to generate enriched datasets.
  • Developed interactive Power BI dashboards on top of curated data to provide actionable insights into campaign performance and spending trends.

Digital Twin

  • Designed and developed a predictive analytics solution to forecast the probability distribution of wind power generation for wind turbines.
  • Built end-to-end data pipelines using Azure Data Factory and Azure Databricks to process both historical and real-time data.
  • Implemented machine learning models in Azure Databricks using PySpark to predict power generation based on sensor data, applying multiple algorithms to improve model accuracy.
  • Performed data preparation, feature engineering, and model training for scalable and reliable predictions.
  • Integrated historical data from MSSQL and real-time streaming data via Kafka (RESCA server), storing and processing data in Azure Cosmos DB.

Wendy’s – Customer Review Analytics & NLP

  • Developed an NLP-based analytics solution to process and analyze ~20 million unstructured customer reviews across 6,650+ restaurant locations.
  • Built scalable data processing pipelines using Azure Databricks and PySpark to perform large-scale text analysis including frequency analysis, n-grams (bi/tri-grams), sentiment analysis, and word associations/clustering.
  • Performed multi-dimensional slice analysis based on Month, SKU, Complaint Code, Region, and Province to identify key business trends and customer issues.
  • Implemented aspect-level and phrase-level sentiment analysis to extract granular insights from reviews and classify customer feedback as positive or negative.
  • Developed machine learning models to predict user ratings by correlating sentiment scores, extracted phrases, and historical 5-star ratings.
  • Enabled data-driven decision-making by providing actionable insights into customer satisfaction, product performance, and regional trends.

Education

Bachelor of Computer Science & Engineering - Computer Science

Pune University
2001.04 - 2016

Skills

  • Cloud Platform: Microsoft Azure
  • Data Engineering Tools: Azure Data Factory, Azure Databricks (Unity Catalog), Microsoft Fabric, Azure Logic Apps
  • Big Data Processing: Apache Spark, PySpark, Spark SQL
  • Workflow Orchestration: Azure Data Factory (ADF)
  • Programming Languages: Python, SQL
  • Data Storage & Warehousing: Azure Data Lake Storage Gen2 (ADLS Gen2), Azure Synapse Analytics, Delta Lake
  • Data Processing & Transformation: ETL/ELT Pipelines, Data Modeling, Data Quality, Performance Optimization
  • Big Data Ecosystem: Distributed Data Processing, Lakehouse Architecture, Data Warehousing
  • Reporting & Visualization: Power BI
  • Version Control & CI/CD: Git, CI/CD Pipelines, Azure DevOps

Certification

Databricks Certified Data Engineer Professional (Expires on January 10, 2028)
Microsoft Certified: Fabric Data Engineer Associate - DP-700 (Expires on April 4, 2027)

Awards, Accomplishments, and Honors

Star Award – Recognized for taking initiative and successfully contributing to large-scale cloud migration projects at CGI.
Strive for Excellence & Customer Focus Awards – Honored at Michelin for outstanding performance in optimizing critical data processes and delivering high-quality solutions.
You’re a Gem Award – Awarded for designing and delivering a complex data model and Power BI dashboard for a key client at Mindtrail.