Shiwangi Bhatia

Gurgaon

Summary

Results-driven Senior Data Engineer with over 9 years of expertise in developing data-driven solutions that enhance efficiency and accuracy while delivering actionable insights. Proficient in leveraging Big Data technologies such as Apache Spark, Spark Streaming, Kafka, StreamSets, Apache NiFi, S3, AWS Glue, Athena, and Iceberg, and NoSQL databases including MongoDB and Cassandra, to address complex business challenges and customer needs. Strong blend of functional and technical knowledge, complemented by hands-on experience with cloud technologies like AWS. Committed to implementing innovative data solutions that drive strategic decision-making and optimize organizational performance.

Overview

9 years of professional experience

Work History

Natwest Group

AWS Data Engineer
08.2023 - Current

Job overview

  • Architected and led the development of a real-time Customer Data Platform (CDP) to ingest unstructured behavioral events (IN, UP, DL) using schema-on-read processing and dynamic routing, enabling scalable personalization and faster onboarding of new data sources.
  • Designed and implemented stream-stream joins in PySpark and Apache Kafka to enrich customer profiles with transactional context for fraud detection, real-time segmentation, and next-best-action modeling (join sketch after this list).
  • Integrated MongoDB Change Data Capture (CDC) using Kafka Connect to stream live updates across both analytical and operational systems, ensuring synchronized, up-to-date data pipelines (connector sketch after this list).
  • Applied TTL indexing in MongoDB to automatically expire transient data, reducing storage footprint and ensuring GDPR compliance for time-bound customer data retention (TTL index sketch after this list).
  • Automated schema registration and governance for AVRO and JSON formats with built-in versioning, compatibility validation, and enforcement of data standards, streamlining onboarding and reducing schema drift.
  • Delivered high-volume curated datasets (>45M daily transactions) to analytics consumers through:
    Amazon S3 (with Iceberg for data lake optimization)
    OpenSearch for real-time insights
    Glue Data Catalog for metadata management and query federation
  • Implemented enterprise-grade data governance frameworks (OBDI, OBDQ, OBDC) to:
    Standardize data ingestion
    Validate and monitor data quality
    Enforce secure, auditable access across domains
  • Enabled self-service data discovery through:
    Data lineage tracking
    Integration with Glue Data Catalog, reducing dependency on engineering teams
  • Built an Athena query layer on top of S3, using partitioned and cataloged data to support QuickSight dashboards and business-critical reporting (query sketch after this list).
  • Partnered with Data Science and Marketing teams to operationalize data products, delivering actionable insights aligned with Salesforce Bedrock principles for scalable, customer-first data architecture.
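
A minimal PySpark Structured Streaming sketch of the stream-stream join pattern described above; the topic names, schemas, broker address, watermark windows, and console sink are illustrative assumptions, not the production pipeline.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType, DoubleType

spark = SparkSession.builder.appName("cdp-enrichment-sketch").getOrCreate()

event_schema = StructType([
    StructField("customer_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])
txn_schema = StructType([
    StructField("customer_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("txn_ts", TimestampType()),
])

def read_topic(topic, schema):
    # Read a Kafka topic and parse the JSON value payload into typed columns.
    raw = (spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
           .option("subscribe", topic)
           .load())
    return raw.select(F.from_json(F.col("value").cast("string"), schema).alias("v")).select("v.*")

# Watermarks bound the state Spark keeps for each side of the join.
events = read_topic("behavioral-events", event_schema).withWatermark("event_ts", "10 minutes")
txns = read_topic("card-transactions", txn_schema).withWatermark("txn_ts", "20 minutes")

# Stream-stream join constrained by an event-time range so old state can be discarded.
enriched = events.alias("e").join(
    txns.alias("t"),
    F.expr("""
        e.customer_id = t.customer_id AND
        t.txn_ts BETWEEN e.event_ts - interval 15 minutes AND e.event_ts
    """),
)

query = (enriched.writeStream
         .format("console")      # stand-in sink for the sketch
         .outputMode("append")
         .start())
```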
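
A sketch of registering a MongoDB CDC source connector through the Kafka Connect REST API, assuming the standard MongoDB Kafka source connector; the Connect host, connector name, and connection details are placeholders.

```python
import requests

connector = {
    "name": "mongo-customer-cdc",                # hypothetical connector name
    "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "connection.uri": "mongodb://mongo-host:27017",   # placeholder URI
        "database": "cdp",
        "collection": "customer_profiles",
        "topic.prefix": "cdc",                   # change events land on cdc.cdp.customer_profiles
    },
}

# Kafka Connect exposes connector management over REST (port 8083 by default).
resp = requests.post("http://connect-host:8083/connectors", json=connector)
resp.raise_for_status()
print("Registered connector:", resp.json()["name"])
```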
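
A sketch of the MongoDB TTL indexing mentioned above, using PyMongo; the collection, timestamp field, and 30-day retention period are illustrative assumptions.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")    # placeholder URI
events = client["cdp"]["behavioral_events"]          # hypothetical collection

# MongoDB's TTL monitor deletes documents once "created_at" is older than
# expireAfterSeconds (here 30 days), so transient data ages out automatically.
events.create_index("created_at", expireAfterSeconds=30 * 24 * 3600)
```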
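
A sketch of querying the partitioned, Glue-cataloged S3 data through Athena with boto3; the region, database, table, and result bucket are illustrative.

```python
import boto3

athena = boto3.client("athena", region_name="eu-west-1")   # placeholder region

response = athena.start_query_execution(
    QueryString="""
        SELECT txn_date, COUNT(*) AS txn_count
        FROM transactions                       -- hypothetical cataloged table
        WHERE txn_date = DATE '2024-01-31'      -- partition predicate prunes the S3 scan
        GROUP BY txn_date
    """,
    QueryExecutionContext={"Database": "curated_db"},                 # hypothetical Glue database
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print("Query started:", response["QueryExecutionId"])
```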

American Express

Senior Data Engineer
05.2021 - Current

Job overview

Regulatory Risk Reporting Framework (CECL / IFRS 9 / CCAR)


  • Developed a Regulatory Risk Reporting Framework to calculate Expected Credit Loss (ECL) for credit risk provisioning under CECL/IFRS 9, using Front Book, Back Book, and National/State macroeconomic data (ECL sketch after this list).
  • Maintained and optimized ETL/ELT pipelines in Python/Hive/PySpark to feed data into Cornerstone.
  • Quantified and reported ECL drift caused by model and macro changes under normal and adverse economic conditions, including recessionary periods and COVID-19 economic scenarios.
  • Analyzed large datasets to identify trends and patterns in model behaviors.
  • Implemented automated BNC checks and data quality controls to safeguard data quality and support smooth PwC audits; ensured effective incident handling and resolution within SLA (data-quality sketch after this list).
  • Documented model execution controls through SOX and PwC documentation.
  • Generated aggregated reports, Tableau dashboards, and curated datasets for the Data Science team.
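
A minimal PySpark sketch of an ECL-style calculation; the PD x LGD x EAD formulation, table names, and join keys are illustrative assumptions rather than the actual framework.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ecl-sketch").enableHiveSupport().getOrCreate()

# Front-book and back-book exposures combined, then joined with scenario-level macro inputs.
exposures = spark.table("risk.front_book").unionByName(spark.table("risk.back_book"))
macros = spark.table("risk.state_macros")          # hypothetical macroeconomic scenario table

ecl = (
    exposures.join(macros, on=["state", "scenario"], how="inner")
    # Expected Credit Loss per account: probability of default x loss given default x exposure.
    .withColumn("ecl", F.col("pd") * F.col("lgd") * F.col("ead"))
    .groupBy("scenario", "segment")
    .agg(F.sum("ecl").alias("total_ecl"))
)

ecl.write.mode("overwrite").saveAsTable("risk.ecl_by_scenario")
```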
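
A minimal sketch of an automated data-quality control in PySpark: row-count reconciliation plus a null check on mandatory columns. Table names, columns, and thresholds are illustrative, not the audited production checks.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-checks-sketch").enableHiveSupport().getOrCreate()

# Reconcile row counts between the staging feed and the curated table.
source_count = spark.table("staging.exposures").count()
target_count = spark.table("curated.exposures").count()
assert source_count == target_count, f"Row-count mismatch: {source_count} vs {target_count}"

# Fail fast if any mandatory column contains nulls.
null_rates = (spark.table("curated.exposures")
              .select([F.avg(F.col(c).isNull().cast("int")).alias(c)
                       for c in ["account_id", "pd", "ead"]])
              .first())
for column, rate in null_rates.asDict().items():
    assert rate == 0.0, f"Nulls found in mandatory column {column}"
```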


Nagarro Software

Big Data Lead / Developer (Telecom + Retail Domain)
11.2020 - 05.2021

Job overview


IDP AWS Platform


  • Managed 3 cross-functional teams working in close collaboration with analytics, engineering, stakeholders, and Product Owners; led a team of 4 to deliver an easy-to-customize Big Data solution targeting multiple end clients
  • Developed a generic framework that provides a single repository/data lake solution for a 360-degree view of customer data.
  • Developed end-to-end infrastructure deployment/provisioning, including Lambdas, Step Functions, VPCs, security groups, IAM roles, etc.
  • Used Apache NiFi and AWS DMS for source-based and target-based CDC, covering insert, update, and merge data-manipulation operations
  • Wrote business rules for Spark jobs using the Drools rule engine.
  • Worked on a deployment module for deploying different Lambda definitions on AWS, such as creating EMR clusters, inserting transformed data into Redshift, and populating DynamoDB table entries from S3 event notifications (Lambda handler sketch after this list).
  • Developed configuration-based Spark modules for read, write, and transform operations, with ETL pipeline flow driven by JSON configuration (config-driven sketch after this list).
  • Orchestrated Spark jobs submitted through Apache Livy (submission sketch after this list).
  • Wrote Step Functions to orchestrate Lambdas, with CloudWatch event rules to schedule the Step Functions on time.
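
A minimal sketch of the Lambda pattern mentioned above: an S3 event notification triggers a handler that records object metadata in DynamoDB. The table name, bucket, and attributes are illustrative.

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("ingestion-audit")        # hypothetical DynamoDB table


def handler(event, context):
    # An S3 event can carry multiple records, one per object created.
    for record in event["Records"]:
        s3_info = record["s3"]
        table.put_item(Item={
            "object_key": s3_info["object"]["key"],
            "bucket": s3_info["bucket"]["name"],
            "size_bytes": s3_info["object"].get("size", 0),
        })
    return {"processed": len(event["Records"])}
```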
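
A minimal sketch of a configuration-based Spark module: a JSON config describes the source, transforms, and sink, and a small runner wires them together. The config schema and paths are illustrative assumptions.

```python
import json
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

config = json.loads("""
{
  "source": {"format": "parquet", "path": "s3://example-bucket/raw/orders/"},
  "transforms": [
    {"op": "filter", "expr": "order_status = 'COMPLETE'"},
    {"op": "with_column", "name": "order_date", "expr": "to_date(order_ts)"}
  ],
  "sink": {"format": "parquet", "path": "s3://example-bucket/curated/orders/", "mode": "overwrite"}
}
""")

spark = SparkSession.builder.appName("config-driven-etl-sketch").getOrCreate()

# Read, apply each configured transform in order, then write to the configured sink.
df = spark.read.format(config["source"]["format"]).load(config["source"]["path"])
for step in config["transforms"]:
    if step["op"] == "filter":
        df = df.filter(step["expr"])
    elif step["op"] == "with_column":
        df = df.withColumn(step["name"], F.expr(step["expr"]))

(df.write.format(config["sink"]["format"])
   .mode(config["sink"]["mode"])
   .save(config["sink"]["path"]))
```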
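
A sketch of submitting a Spark job as an Apache Livy batch over its REST API; the Livy host, application path, and arguments are placeholders.

```python
import requests

payload = {
    "file": "s3://example-bucket/jobs/etl_job.py",   # application to run (placeholder path)
    "args": ["--run-date", "2021-01-31"],
    "conf": {"spark.executor.memory": "4g"},
}

# Livy listens on port 8998 by default; POST /batches submits a batch job.
resp = requests.post("http://livy-host:8998/batches", json=payload)
resp.raise_for_status()
print("Submitted Livy batch:", resp.json()["id"])
```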


Accenture

Hadoop Developer (Banking Domain)
03.2019 - 08.2019

Job overview

Athena Replatforming Project


  • Established data ingestion, transformation, and profiling pipelines in AWS using S3, EMR, Glue, and Athena, which increased execution efficiency and reduced project cost
  • Involved in technology migration from SAS to Python and Spark.
  • Created DVF scripts in Python to validate SAS and Spark datasets (validation sketch after this list)
  • Implemented clustered tables in PySpark using indexing, and optimized the underlying datasets in Hive
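
A minimal sketch of a DVF-style validation in PySpark: compare row counts and per-column sums between the legacy SAS extract and the migrated Spark output. Paths and column names are illustrative assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dvf-sketch").getOrCreate()

sas_df = spark.read.csv("s3://example-bucket/sas_extract/", header=True, inferSchema=True)
spark_df = spark.read.parquet("s3://example-bucket/spark_output/")

# Compare headline metrics between the two datasets.
checks = {"row_count": (sas_df.count(), spark_df.count())}
for col in ["balance", "exposure"]:                      # hypothetical numeric columns
    checks[f"sum_{col}"] = (sas_df.agg(F.sum(col)).first()[0],
                            spark_df.agg(F.sum(col)).first()[0])

for name, (expected, actual) in checks.items():
    status = "OK" if expected == actual else "MISMATCH"
    print(f"{name}: expected={expected} actual={actual} -> {status}")
```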


TCS

Hadoop Developer (Retail + Banking)
06.2016 - 02.2019

Job overview


Teradata to Hadoop Migration


  • Built Hive scripts, reformatted the actual data using Pig, and inserted it back into Hive tables with dynamic partitioning (see the sketch after this list).
  • Built Sqoop jobs to transfer data between Hive tables and Teradata in both directions.
  • Used ORC-format tables with UTF-8 encoding so that all special characters are handled correctly, even in Hive tables.
  • Converted JDBC code written for Teradata to HBase Java API code, implemented filter classes in Java, and compared results using SoapUI
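
A minimal sketch of the Hive pattern above driven from PySpark: an ORC table loaded with dynamic partitioning. Database, table, and column names are illustrative assumptions.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder.appName("dynamic-partition-sketch")
         .enableHiveSupport().getOrCreate())

# Allow fully dynamic partition values on insert.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

spark.sql("""
    CREATE TABLE IF NOT EXISTS retail.sales_orc (
        order_id STRING,
        amount   DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    STORED AS ORC
""")

# The partition column must come last in the SELECT for dynamic partitioning.
spark.sql("""
    INSERT OVERWRITE TABLE retail.sales_orc PARTITION (order_date)
    SELECT order_id, amount, order_date
    FROM retail.sales_staging
""")
```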

Education

JMIT

B.Tech in Electrical, Electronics and Communications Engineering


Skills

PySpark

Timeline

AWS Data Engineer
Natwest Group
08.2023 - Current
Senior Data Engineer
American Express
05.2021 - Current
Big Data Lead / Developer (Telecom + Retail Domain)
Nagarro Software
11.2020 - 05.2021
Hadoop Developer (Banking Domain)
Accenture
03.2019 - 08.2019
Hadoop Developer (Retail + Banking)
TCS
06.2016 - 02.2019
JMIT
B.Tech in Electrical, Electronics and Communications Engineering