
Shiwangi Bhatia

Gurgaon

Summary

Results-driven Senior Data Engineer with over 9 years of expertise in developing data-driven solutions that enhance efficiency and accuracy while delivering actionable insights. Proficient in leveraging Big Data technologies such as Apache Spark, Spark Streaming, Kafka, StreamSets, Apache NiFi, Amazon S3, AWS Glue, Athena, Apache Iceberg, and NoSQL databases including MongoDB and Cassandra to address complex business challenges and customer needs. Strong blend of functional and technical knowledge, complemented by hands-on experience with cloud technologies such as AWS. Committed to implementing innovative data solutions that drive strategic decision-making and optimize organizational performance.

Overview

10
years of professional experience

Work History

Associate Vice President, Lead Data Engineer (IC)

NatWest Group
08.2023 - Current

Business Footprints:

  • Designed and developed end-to-end data pipelines within the cDNA Events Architecture, ingesting raw events from 14+ publisher systems (K423 SaaS, AEP, CEP, CES, Hermes Platform, Amazon Connect, Content Rails, CDC-CDB, eOBAO, mPlatform, SAO, CORA, RTC, Wealth FDS) into Kafka raw event topics and persisting them in the OIDS MongoDB store, enabling downstream consumption by multiple enterprise consumers including Dynamics CRM, ESG Data Lake, ARIC (Fraud), Wealth FDS, Complaints Auto Investigation, CITA, and C&I.
  • Built ETL for 423 Climate Footprint domain workflows, including SMB climate events, transition plans, and benchmark footprint updates, with transformations for customer identity resolution that enable efficient BIN/CISID enrichment and legal entity mapping for downstream consumers.

Storage Footprints:

  • Managed schema evolution for Avro and JSON formats with built-in versioning, compatibility validation, and enforcement of data standards, streamlining onboarding and reducing schema drift.
  • Applied TTL indexing in MongoDB to automatically expire transient data for time-bound customer data retention.
  • Delivered high-volume curated datasets (>45M daily transactions) to analytics consumers through Amazon S3 (with Apache Iceberg for data lake optimization), OpenSearch for real-time insights, and the AWS Glue Data Catalog for metadata management and query federation.

Compute / ETL / Data Pipelines:

  • Built an Athena query layer on top of S3, using partitioned and cataloged data to support QuickSight dashboards and business-critical reporting.
  • Migrated legacy pipelines to Amazon SageMaker so that their data can be leveraged by the Data Marketplace for downstream model execution in Amazon Bedrock.
  • Architected and led the development of a real-time Customer Data Platform (CDP) to ingest unstructured events (IN, UP, DL) using schema-on-read processing, enabling faster onboarding of new data sources to the Data Lake and Data Marketplace consumption patterns.
  • Designed and implemented stream-stream joins in PySpark and Apache Kafka to enrich customer profiles with transactional context for fraud detection, real-time segmentation, and next-best-action modeling.
  • Integrated MongoDB Change Data Capture (CDC) using Kafka Connect to stream live updates across both analytical and operational systems, ensuring synchronized, up-to-date data pipelines.

Governance:

  • Implemented enterprise-grade data governance frameworks (OBDI, OBDQ, OBDC) to enforce secure, auditable access across domains

Senior Data Engineer

American Express
05.2021 - Current

Regulatory Risk Reporting Framework (CECL / IFRS 9 / CCAR)

  • Developed a Regulatory Risk Reporting Framework to calculate expected credit loss (ECL) for CECL/IFRS 9 credit risk provisioning, using front-book, back-book, and national/state macroeconomic data.
  • Maintained and optimized ETL/ELT pipelines in PySpark to feed data into Cornerstone.
  • Depicted ECL drift caused by model changes and macroeconomic changes under normal and adverse economic conditions, such as recessionary periods and COVID-19 economic scenarios.
  • Analyzed large datasets to identify trends and patterns in model behaviors.
  • Implemented automated BNC checks and data quality controls to ensure data quality and smooth PwC audits.
  • Ensured effective incident handling and resolution within SLA.
  • Documented model execution controls through SOX and PwC documentation.
  • Generated aggregated reports, Tableau dashboards, and datasets for the Data Science team.

Big Data Lead / Developer (Telecom + Retail Domain)

Nagarro Software
11.2020 - 05.2021

IDP AWS Platform

  • Managed 3 cross-functional teams working in close collaboration with analytics, engineering, stakeholders, and Product Owners; led a team of 4 to deliver an easily customizable big data solution targeting multiple end clients.
  • Developed a generic framework that provides a single repository/data lake solution for a 360-degree view of customer data.
  • Developed end-to-end infrastructure deployment and provisioning, including Lambda functions, Step Functions, VPC, security groups, and IAM roles, along with configuration-based Spark modules that perform reads, writes, and transforms and configure the ETL pipeline flow through JSON configuration.
  • Used Apache NiFi and AWS DMS for source-based and target-based CDC to handle insert, update, and merge operations.
  • Wrote business rules for Spark jobs using the Drools rule engine.
  • Worked on a deployment module for deploying various Lambda definitions on AWS, such as creating EMR clusters, inserting transformed data into Redshift, and populating DynamoDB table entries via S3 event notifications that trigger Lambda.
  • Orchestrated Spark job submission through Apache Livy, wrote Step Functions to orchestrate Lambda functions, and configured CloudWatch Events rules to schedule the Step Functions.

Hadoop Developer (Banking Domain)

Accenture
03.2019 - 08.2019

Athena Replatforming Project

  • Established data ingestion, transformation, and profiling pipelines on AWS using Amazon S3, EMR, Glue, and Athena, which increased execution efficiency and reduced project cost.
  • Contributed to the technology migration from SAS to Python and Spark.
  • Created DVF scripts in Python to validate SAS and Spark datasets.

Hadoop Developer (Retail + Banking)

TCS
06.2016 - 02.2019

Teradata to Hadoop Migration

  • Built Hive scripts, reshaped the actual data using Pig, and inserted it back into Parquet-format Hive tables with dynamic partitioning, handling serialization and special characters.
  • Built Sqoop jobs to transfer data from Hive tables to Teradata and vice versa.
  • Converted Teradata JDBC code to HBase Java API code, implemented filter classes in Java, and compared results using SoapUI.

Education

B.Tech - Electrical, Electronics and Communications Engineering

JMIT
Yamunanagar

Skills

  • Spark Streaming/Kafka
  • RAG storage in MongoDB, summarisation
  • Data Marketplace/Data Lake
  • AWS SageMaker Studio
  • Snowflake
  • MongoDB
  • SQL
  • Data analysis

Timeline

Associate Vice President, Lead Data Engineer (IC)

NatWest Group
08.2023 - Current

Senior Data Engineer

American Express
05.2021 - Current

Big Data Lead / Developer (Telecom + Retail Domain)

Nagarro Software
11.2020 - 05.2021

Hadoop Developer (Banking Domain)

Accenture
03.2019 - 08.2019

Hadoop Developer (Retail + Banking)

TCS
06.2016 - 02.2019

B.Tech - Electrical, Electronics and Communications Engineering

JMIT