Shivangi Srivastava

Bengaluru

Summary

Data Engineer with a proven track record at HashedIn By Deloitte, specializing in ETL processes, data modeling, and optimization using Python, Spark, and AWS technologies. Demonstrated ability to improve data analysis efficiency and automate the resolution of data discrepancies, with a collaborative approach to teamwork.

Overview

3 years of professional experience

Work History

Data Engineer -2

HashedIn By Deloitte
Bengaluru
12.2024 - Current
  • Developed an automated PySpark script that reduces the manual effort of analyzing data mismatches for a CDC use case, which replicates data from legacy systems to OpenSearch in real time.
  • Contributed to data modeling in OpenSearch by analyzing the queries used on the legacy side and profiling data to understand the relationships between the OLTP tables involved.
  • Helped team members fix an issue in a PySpark script by optimizing the data fetch operation with a hashing column, leveraging Spark's distributed compute so the team met its deadline without blockers.
  • Performed data analysis and profiling to minimize data discrepancies between legacy and cloud systems for both scheduled jobs and CDC changes.

Data Engineer

HashedIn By Deloitte
Bengaluru
08.2022 - 11.2024
  • Developed and optimized AWS Glue ETL scripts that fetch data from the S3 data lake and APIs, transform it according to business rules, and store it in an Aurora PostgreSQL database.
  • Collaborated with business stakeholders to understand their analytics needs, built SQL queries to meet those needs, and optimized them by breaking the SQL into CTEs, which improved script performance.
  • Implemented SCD Type 3 for the Data Analytics team using DMS, which replicates data in real time from the RDBMS to the data warehouse (Redshift), and took part in data modeling to select appropriate sort keys and distribution keys.
  • Participated in a POC, developing a script that finds all predecessor and successor jobs using depth-first search with a Control-M XML file as input, and optimized it with memoization, which supported the mining team in their otherwise manual search for predecessor and successor jobs.
  • Automated data correction to minimize data discrepancies between legacy and cloud systems, using an input file of data mismatch information generated by an automated validation script, which makes the process dynamic and reusable across all products.
  • Documented and implemented CDC data ingestion rules for a set of tables.

Education

B.TECH - CSE

Chandigarh University
Chandigarh
06-2022

Class 12 - Science

Delhi Public School
Prayagraj
05-2018

Skills

  • Programming language: Python
  • ETL Tools: Spark, AWS Glue
  • RDBMS: Aurora PostgreSQL, MSSQL
  • NoSQL databases: OpenSearch, DynamoDB
  • Data Lake: AWS S3
  • Data Warehouse: AWS Redshift
  • AWS Services: Kinesis, CloudFormation, Lambda, SQS, SNS, IAM

Accomplishments

  • Rising Star Spot Award
  • Excellence Award
  • Squad of the Quarter
  • ProductExpo Runner-Up
