Naga Jyothi Yetukoori

Data Engineer
Hyderabad

Summary

Data Engineer with 5.4 years of experience in Azure technologies (Azure Data Factory, Azure Databricks, PySpark, Spark SQL), specializing in data quality enhancement and redundancy reduction (25% improvement).

Implemented Spark optimizations and built 20+ pipelines.

Created documentation for best practices at HCL Technologies.

Overview

5 years of professional experience
3 years of post-secondary education

Work History

Data Engineer

HCL Technologies
Hyderabad
10.2023 - Current

RCM Project:
Designed and implemented a scalable data engineering pipeline that ingests, processes, and transforms Electronic Medical Records (EMR) and insurance claims data from various healthcare providers, individual subscribers, and groups. The pipeline integrates clinical and financial data into a Common Data Model (CDM), enabling healthcare providers and insurers to conduct advanced analytics and optimize financial reporting for claims processing.

Data sources:
- Source 1 - Electronic Medical Records (EMR): an Azure SQL DB containing objects such as Patients, Providers (Doctors), Departments, Encounters, and EMR transactions.
- Source 2 - Claims data: a CSV file delivered by the insurance company as a daily feed.
- Sources 3, 4, 5 - API calls that fetch diagnosis codes (ICD-10), NPI (National Provider Identifier) codes for providers, and CPT (Current Procedural Terminology) codes.

Data from all sources is ingested daily using Azure Data Factory (ADF) into a bronze layer in Azure Data Lake Storage (ADLS). Azure Databricks then handles processing; the data pipeline is divided into three key layers:
1) Bronze Layer (Raw Data): data is ingested into this layer without transformations, and Delta tables are created on top of it.
2) Silver Layer (Cleaned and Standardized Data): key data quality checks are applied. Invalid or missing fields are flagged and rejected (we generally say this data is quarantined), deduplication removes duplicate records, Patients data is transformed to include SCD Type 2 fields, and a Common Data Model (CDM) standardizes the schema and structure (a quarantine/deduplication sketch follows below).
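As a hedged illustration of the silver-layer quality checks, the PySpark sketch below splits invalid rows into a quarantine table and deduplicates the valid ones. The paths, column names, and validity rules are illustrative assumptions, not the project's actual schema.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

# Hypothetical bronze Delta table of patient records (illustrative schema).
bronze = spark.read.format("delta").load("/mnt/bronze/patients")

# Flag rows with invalid or missing key fields
# (assumed rule: patient_id and date_of_birth must be present).
checked = bronze.withColumn(
    "is_valid",
    F.col("patient_id").isNotNull() & F.col("date_of_birth").isNotNull(),
)

# Quarantine invalid rows in a separate table for later inspection.
(
    checked.filter(~F.col("is_valid"))
    .write.format("delta").mode("append").save("/mnt/quarantine/patients")
)

# Deduplicate valid rows, keeping the most recent record per patient
# (assumed ordering column: updated_at).
w = Window.partitionBy("patient_id").orderBy(F.col("updated_at").desc())
silver = (
    checked.filter(F.col("is_valid"))
    .withColumn("rn", F.row_number().over(w))
    .filter(F.col("rn") == 1)
    .drop("rn", "is_valid")
)

silver.write.format("delta").mode("overwrite").save("/mnt/silver/patients")
```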
3) Gold Layer (Fact and Dimension Tables): the cleaned data is further transformed into fact and dimension tables, on which advanced analytics can be performed.

To summarize, there are three different kinds of sources (Azure SQL DB, flat CSV files, and APIs), and data is ingested from all of them incrementally using Azure Data Factory. The data then moves through the medallion architecture layers (Bronze, Silver, Gold) in Azure Databricks:
- SCD Type 2 can be applied to Providers and Patients (see the merge sketch below).
- Pipelines can run daily during non-peak hours so that day-to-day transactions are not slowed down; certain batch pipelines run weekly or monthly.
- During ingestion, the target data can be loaded in Parquet format.

Technologies used: Azure Databricks, Azure Data Factory, PySpark, Spark SQL, SQL.
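A minimal sketch of how the SCD Type 2 step on a dimension such as Patients might look with Delta Lake's MERGE in PySpark. The table paths, the business key (patient_id), and the tracked attribute (address) are assumptions for illustration, not the project's actual implementation.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical dimension and staging paths; the real schema may differ.
dim_path = "/mnt/silver/dim_patients"
target = spark.read.format("delta").load(dim_path)
updates = spark.read.format("delta").load("/mnt/silver/patients_staged")

# Identify rows that are new or whose tracked attribute has changed.
current = target.filter("is_current = true").select("patient_id", "address")
changed = (
    updates.alias("s")
    .join(current.alias("t"), "patient_id", "left")
    .filter(
        F.col("t.address").isNull()
        | (F.col("s.address") != F.col("t.address"))
    )
    .select("s.*")
)

# Close out the superseded current versions of changed patients.
(
    DeltaTable.forPath(spark, dim_path).alias("t")
    .merge(changed.alias("c"),
           "t.patient_id = c.patient_id AND t.is_current = true")
    .whenMatchedUpdate(set={"is_current": "false",
                            "end_date": "current_date()"})
    .execute()
)

# Append the new versions as current records.
(
    changed
    .withColumn("is_current", F.lit(True))
    .withColumn("start_date", F.current_date())
    .withColumn("end_date", F.lit(None).cast("date"))
    .write.format("delta").mode("append").save(dim_path)
)
```

The two-step pattern (close out, then append) keeps full history in the dimension: old versions stay queryable via start_date/end_date, and only rows flagged is_current = true feed downstream joins.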

Data Engineer

HCL Technologies
Hyderabad
10.2022 - 10.2023

Lending Club Project:

1. Orchestrated a Finance Domain project, employing PySpark and Azure Cloud, optimizing lending processes for new banks and achieving a 20% reduction in processing time.

2. Engineered high-performance Spark code for ETL, adhering to Agile methodologies, resulting in a 30% improvement in project efficiency.

3. Applied data cleaning techniques, resolving 95% of duplicates, null values, and datatype issues, ensuring data integrity for downstream teams (see the sketch after this list).

4. Implemented Slowly Changing Dimensions (SCD) strategies, handling updates seamlessly and enhancing data accuracy by 25%.

5. Directed deployment across environments with a 40% reduction in deployment time, contributing to a successful project outcome.
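A minimal sketch of the kind of cleaning described in point 3, assuming a hypothetical loans CSV; the column names, formats, and rules are illustrative, not the project's actual code.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical raw loans feed (illustrative columns).
raw = spark.read.option("header", True).csv("/mnt/raw/lendingclub/loans.csv")

cleaned = (
    raw
    # Remove exact duplicate records.
    .dropDuplicates()
    # Drop rows missing the business key.
    .dropna(subset=["loan_id"])
    # Fix datatypes: CSV columns are read as strings by default.
    .withColumn("loan_amount", F.col("loan_amount").cast("double"))
    .withColumn("issue_date", F.to_date("issue_date", "yyyy-MM-dd"))
    # Replace remaining nulls in numeric columns with a default.
    .fillna({"loan_amount": 0.0})
)

cleaned.write.format("delta").mode("overwrite").save("/mnt/clean/loans")
```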

Associate Technical Member Staff

HCL Technologies
Hyderabad
10.2019 - 10.2022

Product Lifecycle Management Project:


1. Analyzed customer ECOs/drawings closely and coordinated with the Site Process/Product Engineering team to evaluate and process changes for an end product.
2. Managed the engineering change activity from product design through implementation.
3. Reviewed, scrubbed, and compared the Bills of Material (BOMs) of new designs or design changes that needed to be incorporated in ERP.
4. Controlled and maintained the engineering change and documentation tracking process, with emphasis on Bills of Material (BOMs), specifications, drawings, and documentation, to ensure appropriate changes were documented.
5. Maintained the Approved Manufacturer List (AML) for parts from the customer BOM.
6. Analyzed various BOM attributes (item type/item group, customer part number, reference location, quantity, warehouse) against the customer BOMs. Communicated with internal/external clients to determine specific requirements and expectations, managing client expectations as an indicator of quality.
7. Created and maintained master data for Variant Configuration product structures, depending on customer requests.
8. Identified and implemented efficiencies to drive continuous improvement in process execution.

Education

High School Diploma

Nettur Technical Training Foundation (NTTF)
Bengaluru, India
07.2016 - 10.2019

Skills

Azure Data Factory

Azure Data Lake

Azure Databricks

Azure Key vault

Big Data Technologies

GitHub

Memory optimization

Performance optimization

SQL

PySpark

Data Engineering

Data Transformation

Azure Synapse Analytics

Unity Catalog

Structured Streaming

Auto Loader

Delta Live Tables

Timeline

Data Engineer

HCL Technologies
10.2023 - Current

Data Engineer

HCL Technologies
10.2022 - 10.2023

Associate Technical Member Staff

HCL Technologies
10.2019 - 10.2022

High School Diploma

Nettur Technical Training Foundation (NTTF)
07.2016 - 10.2019