
Pradyumn Joshi

Bengaluru

Summary

As a Data Engineer with 4 years of experience, I specialize in big data and Azure data engineering. My technical skills include Python, PySpark, SQL, Spark Scala, Azure Data Factory, Azure Databricks, Azure Synapse, Azure Data Lake Storage, Azure Logic Apps, and MS SQL Server, including stored procedures. Throughout my career, I have successfully contributed to projects across diverse sectors such as banking, renewable energy, procurement, and ore mining.

Overview

  • 4 years of professional experience
  • 1 certification

Work History

Consultant - Data Engineer

KPMG INDIA
Bengaluru
03.2022 - Current

I) Measurement 360 (Current project)

  • Domain – Banking
  • Functional objective – A Tech Risk Engineering project to maintain compliance for internal banking controls. Metrics (Spark Scala scripts) are developed to auto-measure controls (the rules a control must satisfy to be compliant) for programs such as CCM, Control Adoption, RAS, QSAT, RCSA, SOD, VM, and Control Performance.
  • Skills/Tools – Spark Scala, PySpark, Python, SQL, Snowflake, JupyterLab, GitLab, Jira.
  • Responsibilities –
  • Developing Spark Scala metric scripts, optimizing code, and fixing bugs (an illustrative sketch follows this list).
  • Developing SQL prototypes and doing POCs in PySpark/Pandas for new client requirements.
  • Recertifying metric script logic.
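
For illustration only, a minimal PySpark sketch of the kind of auto-measured control metric described above. The table, column names, and compliance threshold are hypothetical, and the production scripts are written in Spark Scala against Snowflake data rather than this in-memory sample.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("control_metric_sketch").getOrCreate()

# Hypothetical control evidence; the real source is Snowflake tables.
evidence = spark.createDataFrame(
    [("CTRL-001", "CCM", 98), ("CTRL-002", "RCSA", 72), ("CTRL-003", "SOD", 100)],
    ["control_id", "program", "score"],
)

# Auto-measure the control: flag it compliant when its score meets a
# hypothetical threshold, then report a compliance rate per program.
metric = (
    evidence
    .withColumn("is_compliant", (F.col("score") >= 90).cast("int"))
    .groupBy("program")
    .agg(F.avg("is_compliant").alias("compliance_rate"))
)
metric.show()
```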

II) ReD - Research and Development

  • Domain – Renewable Energy
  • Functional objective – Data engineering on data from solar plants, hydroelectric power plants, and windmills, providing clean data to the data science and analytics teams for their ML models and dashboards, respectively.
  • Skills/Tools – Azure Data Factory, Databricks (PySpark/Python), ADLS, Logic Apps.
  • Responsibilities –
  • Performing the full ETL/ELT process to extract data from sources such as SharePoint, FTP, and email attachments, and providing it to the various teams as required.
  • Developing ADF pipelines and Databricks (Python/PySpark) notebooks for transformations, Logic Apps workflows to fetch data from email attachments and trigger pipelines, and ADLS Gen2 for storage (an illustrative notebook sketch follows this list).
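
A minimal sketch, assuming hypothetical ADLS Gen2 paths and column names, of the kind of Databricks (PySpark) transformation notebook described above; the real notebooks are parameterised and triggered from ADF/Logic Apps.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("red_transform_sketch").getOrCreate()

# Hypothetical ADLS Gen2 paths and column names.
raw_path = "abfss://raw@<storage-account>.dfs.core.windows.net/solar/readings/"
curated_path = "abfss://curated@<storage-account>.dfs.core.windows.net/solar/daily/"

# Raw plant telemetry landed by the ingestion pipelines.
readings = spark.read.option("header", True).csv(raw_path)

# Daily generation per plant, served to the data science and analytics teams.
daily = (
    readings
    .withColumn("reading_date", F.to_date("timestamp"))
    .groupBy("plant_id", "reading_date")
    .agg(F.sum(F.col("output_kwh").cast("double")).alias("total_kwh"))
)
daily.write.mode("overwrite").partitionBy("reading_date").parquet(curated_path)
```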

III) Qlik to Azure Lake Implementation

  • Domain – Healthcare (pathology)
  • Functional objective – The goal was to migrate to Azure to make efficient use of the client's data for dashboarding (reporting) and ML models, and to utilize scalable, on-demand Azure resources. I worked on the Quality Check and Quality Audit dashboard modules.
  • Skills/Tools – Azure Data Factory, Azure Synapse Analytics, SQL, stored procedures, ADLS, Databricks (PySpark).
  • Responsibilities –
  • Developing ADF pipelines to extract data from on-premises SQL Servers, with ADLS Gen2 storing data in Raw and Curated (cleaned/partitioned) layers (an illustrative sketch follows this list).
  • Understanding Qlik code and converting it into SQL stored procedures for transformation; transformations were developed as stored procedures in Azure Synapse Analytics to create fact and dimension tables, with dimension tables created as external tables and fact tables as database tables.
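
A minimal PySpark sketch of the Raw-to-Curated step described above, with hypothetical paths and column names; the fact and dimension tables themselves were built by Synapse stored procedures, which are not shown here.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("raw_to_curated_sketch").getOrCreate()

# Hypothetical container layout and column names for the Raw and Curated layers.
raw_path = "abfss://raw@<storage-account>.dfs.core.windows.net/quality_check/"
curated_path = "abfss://curated@<storage-account>.dfs.core.windows.net/quality_check/"

# Raw extract landed by the ADF copy from the on-premises SQL Server.
raw = spark.read.parquet(raw_path)

# Light cleaning and partitioning before the Synapse stored procedures
# build the fact and dimension tables.
curated = (
    raw.dropDuplicates(["check_id"])
       .filter(F.col("check_date").isNotNull())
       .withColumn("load_date", F.current_date())
)
curated.write.mode("overwrite").partitionBy("load_date").parquet(curated_path)
```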

IV) PR to Payments Procurement Report

  • Domain – Supply Chain Management
  • Functional objective – To create a procurement report covering the complete supply chain process: PR, PO, RFQ, ASN, GRN, VIM, and Payments. RFQ data comes from SAP Ariba open APIs, and the rest from SAP ECC/SAP HANA.
  • Skills/Tools – Azure Data Factory, Databricks (Python), Azure Synapse Analytics, ADLS, Logic Apps.
  • Responsibilities –
  • Performing full ETL on Ariba open APIs via Azure Databricks notebooks: around 11 APIs belonging to 3 different API families, each secured by a bearer token that expires every 20 minutes. The incoming data was non-relational and was loaded to ADLS (an illustrative token-refresh sketch follows this list).
  • Performing ETL on S/4HANA (erstwhile SAP ECC), extracting through ADF via the Azure Table connector and loading the data to ADLS.
  • Transforming both datasets together in Synapse staging tables; a final stored procedure produces the consolidated report table, which is then used in Power BI.
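
A minimal Python sketch of the bearer-token refresh pattern this ingestion required. The endpoint URLs, credentials, and field names are placeholders, not the actual SAP Ariba APIs.

```python
import time
import requests

# Illustrative endpoints and credentials only; the real API families,
# URLs, and auth details are not reproduced here.
TOKEN_URL = "https://example-ariba/oauth/token"
API_URL = "https://example-ariba/api/rfq-reporting/v1/records"
CLIENT_ID, CLIENT_SECRET = "<client_id>", "<client_secret>"

_token, _token_ts = None, 0.0

def get_token():
    """Return a bearer token, refreshing it before the ~20 minute expiry."""
    global _token, _token_ts
    if _token is None or time.time() - _token_ts > 19 * 60:
        resp = requests.post(
            TOKEN_URL,
            data={"grant_type": "client_credentials"},
            auth=(CLIENT_ID, CLIENT_SECRET),
            timeout=30,
        )
        resp.raise_for_status()
        _token, _token_ts = resp.json()["access_token"], time.time()
    return _token

def fetch_page(offset=0, limit=100):
    """Pull one page of non-relational (JSON) records from the API."""
    resp = requests.get(
        API_URL,
        headers={"Authorization": f"Bearer {get_token()}"},
        params={"offset": offset, "limit": limit},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()
```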

Trainee Programmer - Data Engineer

YASH Technologies
Indore
10.2020 - 03.2022

I) Data Engineering for Mining Clients

  • Domain – Ore Mining.
  • Functional objective – To analyze datasets from various coal, diamond, iron, aluminum, and platinum mines.
  • Skills/Tools – Azure Data Factory, Synapse, Azure Databricks notebooks (Python, PySpark, and SQL).
  • Responsibilities –
  • Developing ADF pipelines and Databricks notebooks to create staging tables, and stored procedures in Azure Synapse Analytics to create fact and dimension tables.
  • Using ADLS Gen2 to store raw data from SharePoint in Avro format in the Raw layer and in Parquet format in the Intermediate layer (an illustrative sketch follows this list).
  • Writing Logic Apps workflows to run ADF pipelines, send alert emails, and refresh AAS.
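
A minimal PySpark sketch of the Avro-to-Parquet layering described above; the paths are hypothetical, and reading Avro assumes the spark-avro package is available on the cluster.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("avro_to_parquet_sketch").getOrCreate()

# Hypothetical ADLS Gen2 paths for the Raw and Intermediate layers.
raw_path = "abfss://raw@<storage-account>.dfs.core.windows.net/mining/production/"
intermediate_path = "abfss://intermediate@<storage-account>.dfs.core.windows.net/mining/production/"

raw_df = spark.read.format("avro").load(raw_path)          # Raw layer: Avro
raw_df.write.mode("overwrite").parquet(intermediate_path)  # Intermediate layer: Parquet
```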

Education

Bachelor of Engineering - Computer Science

Swami Vivekanand College of Engineering (RGPV)
Indore, India
09.2020

Skills

  • Databases: SQL Server, MySQL, Snowflake
  • ETL: Azure Data Factory, Databricks, Synapse, Logic Apps, ADLS, JupyterLab
  • Programming languages: Python, Spark Scala, PySpark, SQL

Languages

  • Hindi – First language
  • English – Proficient (C2)

Certification

  • Microsoft Certified Azure Data Engineer Associate (Dec 2021)
  • Databricks Certified Data Engineer Associate (June 2023)
  • Databricks Certified Spark Developer Associate (Aug 2023)
  • Microsoft Certified Azure Fundamentals (Oct 2021)
  • Microsoft Certified Azure Fundamentals (Aug 2021)

Accomplishments

  • Kudos Award from KPMG India
