Deepak Ghule

Pune

Summary

Experienced IT professional with 14+ years in IT, specializing in Azure Data Engineering and Analytics and in building scalable data platforms with Azure Databricks, Delta Lake, and native Azure services. Proficient in data ingestion, transformation, and analytics using PySpark, Azure Data Factory (ADF), and Azure Synapse Analytics. Skilled in ETL and ELT architecture, the Delta Lake Medallion Architecture (Bronze, Silver, Gold layers), incremental loading strategies, and performance optimization with advanced PySpark transformations. Adept at developing interactive Power BI reports and at securing data processing with Azure Key Vault, Managed Identities, and Azure AD. Strong expertise in database design, dimensional data modelling, and workflow orchestration, enabling actionable insights through secure, reliable, and efficient data solutions.

Overview

15 years of professional experience
3 certifications

Work History

Data Engineer

Mesmerise Group
12.2022 - 09.2024

Data Engineer at Mesmerise Group UK through Globalization HR Solutions India Private Limited

System Analyst

Cybage Software Pvt. Ltd.
11.2009 - 12.2022

Education

Master of Computer Science

Pune University
06-2005

Skills

  • Microsoft Cloud: Azure Databricks with PySpark, Synapse Analytics, Azure SQL Database, Azure Data Factory, Azure Data Lake and Delta Lake, Event Hub, Stream Analytics, Key Vault, Managed Identities, Azure Active Directory, Azure Virtual Machines
  • Artificial Intelligence: LLM basics, Knowledge Graphs, graph databases (Neo4j), Azure AI Document Intelligence, Azure AI Search
  • Business Intelligence: MSBI – SSIS (ETL), SSRS, Dimensional Data Modelling, Incremental loading, SCD Types implementations
  • Data Visualization: Power BI, DAX
  • Database: MS-SQL, T-SQL
  • Programming Languages: PySpark, Python, .NET Core & .NET Framework (.NET 4.5/4.0/3.5)
  • Tools: Jira, Azure DevOps, Miro, Slack, Teams, Confluence, Git, Jupyter notebooks, TFS, SVN
  • Methodologies: Agile, Scrum


Certification

  • Databricks Certified Data Engineer Associate
  • DP-900 - Azure Data Fundamentals, https://learn.microsoft.com/en-us/users/deepakghule-8270/credentials/bb5123ee953c9cb
  • AZ-900 - Azure Fundamentals, https://learn.microsoft.com/en-us/users/deepakghule-8270/credentials/3b301dc4732e5a0b

Professional Experience

Azure Data Engineering with Databricks Experience:

  • Proficient in building data pipelines and workflows in Azure Databricks with PySpark for large-scale data processing, transformation, and analytics.
  • Designed and implemented Delta Lakehouse solutions (Bronze, Silver, Gold layers) for raw, cleansed, and curated data, providing efficient storage, versioning, and ACID transactions in Azure, following the Medallion Architecture.
  • Developed PySpark scripts to ingest data from multiple sources, such as external APIs, on-premises servers, and cloud storage (ADLS).
  • Applied advanced PySpark transformations for cleansing, standardization, and enrichment, including handling nulls and converting currencies and timestamps.

  • Performed complex joins, aggregations, and group-by operations to combine datasets and derive meaningful insights.
  • Implemented efficient Incremental loading strategies in Delta Lakehouse using Change Data Capture (CDC) technique to identify changes in data sources (e.g., new or modified records) and apply incremental updates to target datasets, reducing processing time.
  • Utilized Delta Lake's MERGE INTO statement for update, insert, or delete operations based on primary key matching and change detection, and added audit columns (e.g., created_at, updated_at, record_status) to track changes and ensure data integrity throughout the incremental load process (see the sketch after this list).
  • Designed reusable PySpark functions for data validation, converting currencies, timestamp conversions, de-duplication, schema enforcement, and error handling.
  • Orchestrated workflows using Azure Data Factory, integrating Databricks notebooks for end-to-end data processing.
  • Automated job execution and implemented monitoring and alerting for job failures and data inconsistencies.
  • Optimized storage and compute costs by choosing appropriate cluster configurations (e.g., auto-scaling, spot instances).
  • Debugged and optimized PySpark jobs by applying techniques such as predicate pushdown, broadcast joins, caching, and efficient shuffling strategies.
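
A minimal PySpark sketch of the incremental-load pattern described above, upserting a CDC batch from the Bronze layer into a Silver Delta table. The Delta paths, key column (order_id), and audit column are illustrative assumptions, not the actual project schema:

    from pyspark.sql import SparkSession, functions as F
    from delta.tables import DeltaTable

    spark = SparkSession.builder.getOrCreate()

    # Incoming change records (new or modified rows) landed in Bronze.
    updates = (
        spark.read.format("delta").load("/mnt/bronze/orders_changes")
        .dropDuplicates(["order_id"])                     # de-duplication
        .withColumn("updated_at", F.current_timestamp())  # audit column
    )

    # MERGE INTO the Silver table on the primary key:
    # update matched rows, insert new ones.
    target = DeltaTable.forPath(spark, "/mnt/silver/orders")
    (
        target.alias("t")
        .merge(updates.alias("s"), "t.order_id = s.order_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )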


Azure Data Engineering with Native Services Experience:

  • Designed and implemented ETL architectures to handle data ingestion, transformation, storage, and analytics using Azure native services, ensuring seamless integration with downstream systems.
  • Built end-to-end data platform solutions using Azure services including Azure Event Hub and Stream Analytics, Data Factory, Synapse Analytics, Data Lake Storage (ADLS), Azure Key Vault, and Azure Active Directory (Azure AD).
  • Integrated Unity Application APIs (handled by another team) for event data ingestion, enabling real-time data streaming through Azure Event Hub and processing via Azure Stream Analytics.
  • Utilized Azure Data Lake Storage (ADLS) for structured data storage, facilitating efficient downstream data processing.
  • Developed Azure Data Factory (ADF) pipelines to extract data from ADLS, load it into staging tables, apply business transformations, and implement incremental loading strategies using Slowly Changing Dimensions by comparing source and target data.
  • Transformed and loaded data into Azure Synapse Analytics (Data Warehouse) for analytics and reporting.
  • Configured Azure Active Directory (Azure AD) by creating and managing security groups, adding users, assigning role-based access control (RBAC) permissions, and creating service principals to enable secure and scalable identity and access management.
  • Integrated Azure Key Vault and Managed Identity to securely manage sensitive data, such as credentials and access keys, ensuring compliance and data protection (see the sketch after this list).
  • Automated monitoring and alerting for data pipelines to promptly address failures and maintain system reliability.
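
A minimal sketch of the Key Vault integration described above, assuming the compute runs with a Managed Identity that has read access to the vault; the vault URL and secret name are placeholders:

    # Fetch a credential from Azure Key Vault using the ambient Managed Identity.
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    credential = DefaultAzureCredential()  # resolves to the Managed Identity at runtime
    client = SecretClient(vault_url="https://<vault-name>.vault.azure.net",
                          credential=credential)

    sql_password = client.get_secret("sql-admin-password").value  # placeholder secret name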



Database Design and Modelling Experience:

  • Developed and implemented dimensional modelling techniques, designing fact and dimension tables to optimize query performance and adhere to data warehousing best practices.
  • Designed database objects including schemas, tables, stored procedures, views, and indexes to meet analytical and operational data requirements.
  • Ensured data availability and consistency by implementing robust incremental load strategies and managing Slowly Changing Dimensions (SCD Type 2) to track historical changes (see the sketch after this list).
  • Created optimized database views to support analytical reporting requirements in Power BI.
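
A hedged PySpark sketch of the SCD Type 2 pattern described above, using Delta Lake MERGE. The dim_customer table, customer_id key, and address attribute are illustrative assumptions. Changed rows are staged twice: once keyed (to expire the current version) and once with a null merge key (to insert the new version):

    from pyspark.sql import SparkSession, functions as F
    from delta.tables import DeltaTable

    spark = SparkSession.builder.getOrCreate()
    source = spark.read.format("delta").load("/mnt/silver/customers_changes")
    dim = DeltaTable.forPath(spark, "/mnt/gold/dim_customer")

    # Rows whose tracked attribute changed versus the current dimension version.
    current = dim.toDF().filter("is_current = true")
    changed = (
        source.alias("s")
        .join(current.alias("t"), "customer_id")
        .filter("s.address <> t.address")
        .select("s.*")
    )

    staged = (
        changed.withColumn("merge_key", F.lit(None).cast("long"))           # new versions to insert
        .unionByName(source.withColumn("merge_key", F.col("customer_id")))  # keys to match/expire
    )

    (
        dim.alias("t")
        .merge(staged.alias("s"), "t.customer_id = s.merge_key AND t.is_current = true")
        .whenMatchedUpdate(                       # expire the current version on change
            condition="t.address <> s.address",
            set={"is_current": "false", "end_date": "current_date()"},
        )
        .whenNotMatchedInsert(                    # insert new versions and brand-new keys
            values={
                "customer_id": "s.customer_id",
                "address": "s.address",
                "start_date": "current_date()",
                "end_date": "null",
                "is_current": "true",
            }
        )
        .execute()
    )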



Power BI Development and Analytics Experience:

  • Designed and developed interactive Power BI dashboards and reports, optimizing performance with DirectQuery and Import modes to meet client needs.
  • Integrated Power BI with Azure Synapse and Azure SQL Database to streamline reporting capabilities.
  • Developed measures and calculated columns using DAX for business logic, time intelligence, and conditional calculations.
  • Published reports and dashboards to Power BI Service, ensuring timely updates with scheduled data refreshes (see the sketch after this list).
  • Delivered Power BI embedding solutions to integrate reports and dashboards into custom applications, improving accessibility.
  • Designed Data Validation Reports to track anomalies and maintain data accuracy with control flow charts and real-time tracking.
  • Conducted end-user training on Power BI features, report customization, and best practices for self-service analytics.
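
A hedged sketch of triggering a dataset refresh through the Power BI REST API, assuming a service principal with the appropriate Power BI API permissions; the tenant, workspace, and dataset IDs are placeholders:

    import requests
    from azure.identity import ClientSecretCredential

    # Acquire an Azure AD token for the Power BI API as a service principal.
    credential = ClientSecretCredential(tenant_id="<tenant-id>",
                                        client_id="<client-id>",
                                        client_secret="<client-secret>")
    token = credential.get_token("https://analysis.windows.net/powerbi/api/.default").token

    # Queue a dataset refresh (HTTP 202 means the refresh was accepted).
    url = ("https://api.powerbi.com/v1.0/myorg/groups/<workspace-id>"
           "/datasets/<dataset-id>/refreshes")
    resp = requests.post(url, headers={"Authorization": f"Bearer {token}"})
    resp.raise_for_status()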



Full Stack Development Experience:

  • Developed user-friendly and responsive web interfaces using HTML, CSS, JavaScript.
  • Built robust APIs and web services using ASP.NET or ASP.NET Core.
  • Designed and implemented server-side logic, including business workflows and application integrations.
  • Optimized application performance, security, and scalability.
  • Implemented authentication and authorization mechanisms to secure the backend.
  • Designed SQL Server databases and wrote optimized SQL queries, stored procedures, and triggers to support application functionality.
  • Ensured data integrity and managed database performance tuning.
  • Troubleshot and resolved database-related issues.


Collaboration and Documentation:

  • Collaborated with data architects, analysts, business stakeholders, and cross-functional teams to understand requirements and deliver tailored solutions.
  • Documented pipelines, data workflows, and transformation logic for knowledge sharing and reproducibility.
  • Provided mentorship and technical guidance to junior engineers, fostering their development and increasing team productivity.
  • Stayed updated with emerging Azure technologies, evaluating and integrating new features to improve the data ecosystem.




Project Experience

  • PROJECT #1:

Client Name: Carbonaires

Project Name: Carbonaires Data Platform

Duration: Aug 2023 – Sept 2024

Role: Data Engineer

Technologies Used: Azure Databricks, PySpark, Delta Lake, Azure SQL, Azure Data Factory (ADF), ETL, Data Lake (ADLS), Azure Key Vault, Managed Identity, Azure AD, Azure DevOps (Git & CI/CD), Power BI.

Description: Carbonaires is an ESG-driven carbon asset management company that offers investors exposure to carbon credits, essential for achieving carbon neutrality and net-zero goals. Its business model revolves around partnering with carbon credit project developers, providing upfront funding in exchange for a share of future carbon credits produced. Carbonaires sells these carbon credits to corporations seeking to offset their emissions, optimizing value by purchasing high-integrity credits at low prices and selling them at higher rates. To enhance transparency and project valuation, Carbonaires collaborated with Mesmerise to build a data platform for improved valuation and risk management.


  • PROJECT #2:

Client Name: Mesmerise Group

Project Name: Gatherings Data Platform & Analytics

Duration: Nov 2022 – Aug 2023

Role: Data Engineer and Analyst

Technologies Used: Azure Event Hub, Stream Analytics, Azure SQL, Azure Data Factory (ADF), ETL, Azure Databricks, PySpark, Data Lake (ADLS), Azure Key Vault, Managed Identity, Azure AD, Azure DevOps (Git & CI/CD), Power BI

Description: The Gatherings VR Application, developed by Mesmerise Group, is a virtual reality (VR) platform that enables people to meet and interact in an immersive virtual environment. It serves as a digital space where users can gather for virtual meetings, conferences, or social events, providing an engaging alternative to traditional video calls. Through this platform, participants experience a sense of presence and connection, enhancing collaboration and interaction in a way that mimics real-life gatherings. To support this application, the Gatherings VR Data Platform project focused on data ingestion, transformation, and analytics, enabling insights into user engagement, platform performance, and application usage patterns. As a Data Engineer, my role involved building a scalable data infrastructure that captures and processes both real-time and historical data generated within the VR environment.


  • PROJECT #3:

Client Name: MaritzCX

Project Name: Customer Experience Platform

Duration: May 2018 – Nov 2022

Technologies Used: SQL Server, Data Lake (ADLS), Azure Key Vault, Azure Data Factory (ADF), Azure Databricks, PySpark, Power BI

Description: This CX platform is a customer feedback management system: customers create surveys and send invitations, end users take the surveys, and feedback is collected. The platform lets customers generate and schedule various pre-defined reports based on the collected data. Some customers have custom requirements around the platform features, which called for developing custom applications such as custom invitations, custom report exports, and survey extensibility.


  • PROJECT #4:

Project Name: Data Feeds

Duration: June 2014 – April 2018

Technologies Used: C#, ASP.NET, SQL, SSIS

Description: GroupM developed a tool called Data Feeds: a collection of feeds that consume data from different vendors (Facebook, Twitter, Google Analytics, and some internal GroupM data). This data is then used by other GroupM tools such as Datamart and Data Marketplace. Each feed is a separate implementation and a separate library.


  • PROJECT #5:

Project Name: iSERAS (iProspect Search Engine Result Analysis System)

Duration: Jan 2013 – May 2014

Technologies Used: C#, ASP.NET MVC 3.0 (Razor), jQuery, SQL Server

Description: iSERAS is a search engine result analysis system used by the SEO analyst team to help end clients improve their rankings and set goals for internet advertising. The application consists of two main parts: a web reporting UI and a background data collection application. The data collector gathers ranking data from sources such as the Google and Bing APIs, and collects PPC data from the Google AdWords and MSN adCenter APIs. The reporting UI provides a number of reports for ranking and PPC data; these advanced reports (pie charts, bar charts, multiple-axis graphs, etc.) are generated using Telerik/Kendo controls.


  • PROJECT #6:

Project Name: IRC (iProspect Ranking Collector)

Duration: Jan 2010 – Dec 2012

Technologies Used: C#, ASP.NET MVC 3.0 (Razor), jQuery, SQL Server

Description: The iRC system collects rankings from the Google, Yahoo, Bing, AOL, Ask, and YouTube search engines for various markets. It records organic and sponsored rankings for given keywords against specified domain URLs, collecting the top 30 rankings per keyword for each client-specified search engine. The system also provides ranking reports, sponsored analysis reports, competitor ranking reports, and non-defined competitor reports. iRC is divided into three main modules: Admin Interface, Data Collector, and UI Reporting.

Accomplishments

  • Runner-up in an organisation-level SQL challenge at Cybage Software Pvt. Ltd., Pune
  • Received customer appreciation for the quick implementation of SSO
