
Pradyumn Joshi

Bengaluru

Summary

As a Data Engineer with 4 years of experience, I specialize in big data and Azure data engineering. My technical skills include Python, PySpark, SQL, Spark Scala, Azure Data Factory, Azure Databricks, Azure Synapse, Azure Data Lake Storage, Azure Logic Apps, and MS SQL Server, including stored procedures. Throughout my career, I have successfully contributed to projects across diverse sectors such as banking, renewable energy, procurement, and ore mining.

Overview

  • 4 years of professional experience
  • 1 certification

Work History

Consultant - Data Engineer

KPMG INDIA
Bengaluru
03.2022 - Current

I) Measurement 360 (Current project)

  • Domain – Banking
  • Functional objective – A Tech Risk Engineering project to maintain compliance for internal banking controls. Metrics (Spark Scala scripts) are developed to auto-measure controls (the rules a control must satisfy to be compliant) for programs such as CCM, Control Adoption, RAS, QSAT, RCSA, SOD, VM, and Control Performance.
  • Skills/Tools – Spark Scala, PySpark, Python, SQL, Snowflake, JupyterLab, GitLab, Jira.
  • Responsibilities –
  • Developing Spark Scala metric scripts, optimizing code, and fixing bugs (an illustrative sketch follows this list).
  • Developing SQL prototypes and doing POCs in PySpark/Pandas for new client requirements.
  • Recertifying metric script logic.
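
For illustration only, a minimal PySpark sketch of the kind of auto-measured control metric described above. The table, column names, and compliance threshold are hypothetical, and the production scripts are written in Spark Scala against Snowflake data rather than this in-memory sample.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("control_metric_sketch").getOrCreate()

# Hypothetical control evidence; the real source is Snowflake tables.
evidence = spark.createDataFrame(
    [("CTRL-001", "CCM", 98), ("CTRL-002", "RCSA", 72), ("CTRL-003", "SOD", 100)],
    ["control_id", "program", "score"],
)

# Auto-measure the control: flag it compliant when its score meets a
# hypothetical threshold, then report a compliance rate per program.
metric = (
    evidence
    .withColumn("is_compliant", (F.col("score") >= 90).cast("int"))
    .groupBy("program")
    .agg(F.avg("is_compliant").alias("compliance_rate"))
)
metric.show()
```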

II) ReD - Research and Development

  • Domain – Renewable Energy
  • Functional objective – Data engineering on data from solar plants, hydroelectric power plants, and windmills, providing clean data to the data science and analytics teams for their ML models and dashboards, respectively.
  • Skills/Tools – Azure Data Factory, Databricks (PySpark/Python), ADLS, Logic Apps.
  • Responsibilities –
  • Performing the full ETL/ELT process to extract data from sources such as SharePoint, FTP, and email attachments, and providing it to the various teams as required.
  • Developing ADF pipelines and Databricks (Python/PySpark) notebooks for transformations, Logic Apps workflows to fetch data from email attachments and trigger pipelines, and ADLS Gen2 for storage (an illustrative notebook sketch follows this list).
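
A minimal sketch, assuming hypothetical ADLS Gen2 paths and column names, of the kind of Databricks (PySpark) transformation notebook described above; the real notebooks are parameterised and triggered from ADF/Logic Apps.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("red_transform_sketch").getOrCreate()

# Hypothetical ADLS Gen2 paths and column names.
raw_path = "abfss://raw@<storage-account>.dfs.core.windows.net/solar/readings/"
curated_path = "abfss://curated@<storage-account>.dfs.core.windows.net/solar/daily/"

# Raw plant telemetry landed by the ingestion pipelines.
readings = spark.read.option("header", True).csv(raw_path)

# Daily generation per plant, served to the data science and analytics teams.
daily = (
    readings
    .withColumn("reading_date", F.to_date("timestamp"))
    .groupBy("plant_id", "reading_date")
    .agg(F.sum(F.col("output_kwh").cast("double")).alias("total_kwh"))
)
daily.write.mode("overwrite").partitionBy("reading_date").parquet(curated_path)
```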

III) Qlik to Azure Lake Implementation

  • Domain – Healthcare (pathology)
  • Functional objective – The goal was to migrate to Azure to make efficient use of the client's data for dashboarding (reporting) and ML models, and to utilize scalable, on-demand Azure resources. I worked on the Quality Check and Quality Audit dashboard modules.
  • Skills/Tools – Azure Data Factory, Azure Synapse Analytics, SQL, stored procedures, ADLS, Databricks (PySpark).
  • Responsibilities –
  • Developing ADF pipelines to extract data from on-premises SQL Servers, with ADLS Gen2 storing data in Raw and Curated (cleaned/partitioned) layers (an illustrative sketch follows this list).
  • Understanding Qlik code and converting it into SQL stored procedures for transformation; transformations were developed as stored procedures in Azure Synapse Analytics to create fact and dimension tables, with dimension tables created as external tables and fact tables as database tables.
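
A minimal PySpark sketch of the Raw-to-Curated step described above, with hypothetical paths and column names; the fact and dimension tables themselves were built by Synapse stored procedures, which are not shown here.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("raw_to_curated_sketch").getOrCreate()

# Hypothetical container layout and column names for the Raw and Curated layers.
raw_path = "abfss://raw@<storage-account>.dfs.core.windows.net/quality_check/"
curated_path = "abfss://curated@<storage-account>.dfs.core.windows.net/quality_check/"

# Raw extract landed by the ADF copy from the on-premises SQL Server.
raw = spark.read.parquet(raw_path)

# Light cleaning and partitioning before the Synapse stored procedures
# build the fact and dimension tables.
curated = (
    raw.dropDuplicates(["check_id"])
       .filter(F.col("check_date").isNotNull())
       .withColumn("load_date", F.current_date())
)
curated.write.mode("overwrite").partitionBy("load_date").parquet(curated_path)
```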

IV) PR to Payments Procurement Report

  • Domain – Supply Chain Management
  • Functional objective – To create a procurement report covering the complete supply chain process: PR, PO, RFQ, ASN, GRN, VIM, and Payments. RFQ data comes from SAP Ariba open APIs, and the rest from SAP ECC/SAP HANA.
  • Skills/Tools – Azure Data Factory, Databricks (Python), Azure Synapse Analytics, ADLS, Logic Apps.
  • Responsibilities –
  • Performing full ETL on Ariba open APIs via Azure Databricks notebooks: around 11 APIs belonging to 3 different API families, each secured by a bearer token that expires every 20 minutes. The incoming data was non-relational and was loaded to ADLS (an illustrative token-refresh sketch follows this list).
  • Performing ETL on S/4HANA (erstwhile SAP ECC), extracting through ADF via the Azure Table connector and loading the data to ADLS.
  • Transforming both datasets together in Synapse staging tables; a final stored procedure produces the consolidated report table, which is then used in Power BI.
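
A minimal Python sketch of the bearer-token refresh pattern this ingestion required. The endpoint URLs, credentials, and field names are placeholders, not the actual SAP Ariba APIs.

```python
import time
import requests

# Illustrative endpoints and credentials only; the real API families,
# URLs, and auth details are not reproduced here.
TOKEN_URL = "https://example-ariba/oauth/token"
API_URL = "https://example-ariba/api/rfq-reporting/v1/records"
CLIENT_ID, CLIENT_SECRET = "<client_id>", "<client_secret>"

_token, _token_ts = None, 0.0

def get_token():
    """Return a bearer token, refreshing it before the ~20 minute expiry."""
    global _token, _token_ts
    if _token is None or time.time() - _token_ts > 19 * 60:
        resp = requests.post(
            TOKEN_URL,
            data={"grant_type": "client_credentials"},
            auth=(CLIENT_ID, CLIENT_SECRET),
            timeout=30,
        )
        resp.raise_for_status()
        _token, _token_ts = resp.json()["access_token"], time.time()
    return _token

def fetch_page(offset=0, limit=100):
    """Pull one page of non-relational (JSON) records from the API."""
    resp = requests.get(
        API_URL,
        headers={"Authorization": f"Bearer {get_token()}"},
        params={"offset": offset, "limit": limit},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()
```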

Trainee Programmer - Data Engineer

YASH Technologies
Indore
10.2020 - 03.2022

I) Data Engineering for Mining Clients

  • Domain – Ore Mining.
  • Functional objective – To analyze datasets from various coal, diamond, iron, aluminum, and platinum mines.
  • Skills/Tools – Azure Data Factory, Synapse, Azure Databricks notebooks (Python, PySpark, and SQL).
  • Responsibilities –
  • Developing ADF pipelines and Databricks notebooks to create staging tables, and stored procedures in Azure Synapse Analytics to create fact and dimension tables.
  • Using ADLS Gen2 to store raw data from SharePoint in Avro format in the Raw layer and in Parquet format in the Intermediate layer (an illustrative sketch follows this list).
  • Writing Logic Apps workflows to run ADF pipelines, send alert emails, and refresh AAS.
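
A minimal PySpark sketch of the Avro-to-Parquet layering described above; the paths are hypothetical, and reading Avro assumes the spark-avro package is available on the cluster.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("avro_to_parquet_sketch").getOrCreate()

# Hypothetical ADLS Gen2 paths for the Raw and Intermediate layers.
raw_path = "abfss://raw@<storage-account>.dfs.core.windows.net/mining/production/"
intermediate_path = "abfss://intermediate@<storage-account>.dfs.core.windows.net/mining/production/"

raw_df = spark.read.format("avro").load(raw_path)          # Raw layer: Avro
raw_df.write.mode("overwrite").parquet(intermediate_path)  # Intermediate layer: Parquet
```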

Education

Bachelor of Engineering - Computer Science

Swami Vivekanand College of Engineering (RGPV)
Indore, India
09.2020

Skills

  • Databases: SQL Server, MySQL, Snowflake
  • ETL: Azure Data Factory, Databricks, Synapse, Logic Apps, ADLS, JupyterLab
  • Programming languages: Python, Spark Scala, PySpark, SQL

Languages

  • Hindi – First language
  • English – Proficient (C2)

Certification

  • Microsoft Certified Azure Data Engineer Associate (Dec 2021)
  • Databricks Certified Data Engineer Associate (June 2023)
  • Databricks Certified Spark Developer Associate (Aug 2023)
  • Microsoft Certified Azure Fundamentals (Oct 2021)
  • Microsoft Certified Azure Fundamentals (Aug 2021)

Accomplishments

  • Kudos Award from KPMG India
