Shivangi Srivastava

Bengaluru

Summary

Data Engineer at HashedIn By Deloitte specializing in ETL processes, data modeling, and optimization using Python, Spark, and AWS technologies. Demonstrated ability to improve data analysis efficiency and automate the resolution of data discrepancies, with a collaborative approach to working across teams.

Work History

Data Engineer -2

HashedIn By Deloitte

Bengaluru

12.2024 - Current

Developed an automated PySpark script that reduces the manual effort of analyzing data mismatches for a CDC use case, which replicates data from legacy systems to OpenSearch in real time.
Contributed to data modeling in OpenSearch by analyzing the queries used on the legacy side and profiling data to understand the relationships between the OLTP tables involved.
Helped team members fix an issue in a PySpark script by optimizing the data fetch operation with a hashing column, which leveraged Spark's distributed compute and allowed the team to meet the deadline without blockers.
Performed data analysis and profiling to minimize data discrepancies between legacy and cloud systems for both scheduled jobs and CDC changes.

Data Engineer

HashedIn By Deloitte

Bengaluru

08.2022 - 11.2024

Developed and optimized AWS Glue ETL scripts that fetch data from the S3 data lake and an API, transform it according to business rules, and store it in the Aurora PostgreSQL database.
Collaborated with business stakeholders to understand their analytics needs, built SQL queries to meet those needs, and optimized them by breaking the SQL into CTEs, which improved script performance.
Implemented SCD Type 3 for the Data Analytics team using DMS, which replicates data in real time from the RDBMS to the data warehouse (Redshift), and contributed to data modeling by selecting appropriate sort keys and distribution keys.
Participated in POC activities, developing a script that finds all predecessor and successor jobs using depth-first search, with a Control-M XML file as input, and optimized it using memoization, which reduced the mining team's manual effort in finding predecessor and successor jobs.
Automated data correction to minimize data discrepancies between legacy and cloud systems, using an input file of mismatch information generated by an automated validation script, which makes the process dynamic and reusable across all products.
Documented and implemented CDC data ingestion rules for a set of tables.

Education

B.TECH - CSE

Chandigarh University

Chandigarh

06-2022

Class 12 - Science

Delhi Public School

Prayagraj

05-2018

Skills

Programming language: Python
ETL Tools: Spark, AWS Glue
RDBMS: Aurora PostgreSQL, MSSQL
NoSQL databases: OpenSearch, DynamoDB

Data Lake: AWS S3
Data Warehouse: AWS Redshift
AWS Services: Kinesis, CloudFormation, Lambda, SQS, SNS, IAM

Accomplishments

Rising Star Spot Award
Excellence Award
Squad of the Quarter
ProductExpo Runner-Up
