Shiwangi Bhatia

Gurgaon

Summary

Results-driven Senior Data Engineer with over 9 years of expertise in developing data-driven solutions that enhance efficiency and accuracy while delivering actionable insights. Proficient in leveraging Big Data technologies such as Apache Spark, Spark Streaming, Kafka, StreamSets, Apache NiFi, S3, AWS Glue, Athena, and Iceberg, and NoSQL databases including MongoDB and Cassandra, to address complex business challenges and customer needs. Strong blend of functional and technical knowledge, complemented by hands-on experience with cloud technologies like AWS. Committed to implementing innovative data solutions that drive strategic decision-making and optimize organizational performance.

Overview

9 years of professional experience

Work History

Natwest Group

AWS Data Engineer
08.2023 - Current

Job overview

  • Architected and led the development of a real-time Customer Data Platform (CDP) to ingest unstructured behavioral events (IN, UP, DL) using schema-on-read processing and dynamic routing, enabling scalable personalization and faster onboarding of new data sources.
  • Designed and implemented stream-stream joins in PySpark and Apache Kafka to enrich customer profiles with transactional context for fraud detection, real-time segmentation, and next-best-action modeling (join sketch after this list).
  • Integrated MongoDB Change Data Capture (CDC) using Kafka Connect to stream live updates across both analytical and operational systems, ensuring synchronized, up-to-date data pipelines (connector sketch after this list).
  • Applied TTL indexing in MongoDB to automatically expire transient data, reducing storage footprint and ensuring GDPR compliance for time-bound customer data retention (TTL index sketch after this list).
  • Automated schema registration and governance for AVRO and JSON formats with built-in versioning, compatibility validation, and enforcement of data standards, streamlining onboarding and reducing schema drift.
  • Delivered high-volume curated datasets (>45M daily transactions) to analytics consumers through:
    Amazon S3 (with Iceberg for data lake optimization)
    OpenSearch for real-time insights
    Glue Data Catalog for metadata management and query federation
  • Implemented enterprise-grade data governance frameworks (OBDI, OBDQ, OBDC) to:
    Standardize data ingestion
    Validate and monitor data quality
    Enforce secure, auditable access across domains
  • Enabled self-service data discovery through:
    Data lineage tracking
    Integration with Glue Data Catalog, reducing dependency on engineering teams
  • Built an Athena query layer on top of S3, using partitioned and cataloged data to support QuickSight dashboards and business-critical reporting (query sketch after this list).
  • Partnered with Data Science and Marketing teams to operationalize data products, delivering actionable insights aligned with Salesforce Bedrock principles for scalable, customer-first data architecture.
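
A minimal PySpark Structured Streaming sketch of the stream-stream join pattern described above; the topic names, schemas, broker address, watermark windows, and console sink are illustrative assumptions, not the production pipeline.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType, DoubleType

spark = SparkSession.builder.appName("cdp-enrichment-sketch").getOrCreate()

event_schema = StructType([
    StructField("customer_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])
txn_schema = StructType([
    StructField("customer_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("txn_ts", TimestampType()),
])

def read_topic(topic, schema):
    # Read a Kafka topic and parse the JSON value payload into typed columns.
    raw = (spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
           .option("subscribe", topic)
           .load())
    return raw.select(F.from_json(F.col("value").cast("string"), schema).alias("v")).select("v.*")

# Watermarks bound the state Spark keeps for each side of the join.
events = read_topic("behavioral-events", event_schema).withWatermark("event_ts", "10 minutes")
txns = read_topic("card-transactions", txn_schema).withWatermark("txn_ts", "20 minutes")

# Stream-stream join constrained by an event-time range so old state can be discarded.
enriched = events.alias("e").join(
    txns.alias("t"),
    F.expr("""
        e.customer_id = t.customer_id AND
        t.txn_ts BETWEEN e.event_ts - interval 15 minutes AND e.event_ts
    """),
)

query = (enriched.writeStream
         .format("console")      # stand-in sink for the sketch
         .outputMode("append")
         .start())
```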
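
A sketch of registering a MongoDB CDC source connector through the Kafka Connect REST API, assuming the standard MongoDB Kafka source connector; the Connect host, connector name, and connection details are placeholders.

```python
import requests

connector = {
    "name": "mongo-customer-cdc",                # hypothetical connector name
    "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "connection.uri": "mongodb://mongo-host:27017",   # placeholder URI
        "database": "cdp",
        "collection": "customer_profiles",
        "topic.prefix": "cdc",                   # change events land on cdc.cdp.customer_profiles
    },
}

# Kafka Connect exposes connector management over REST (port 8083 by default).
resp = requests.post("http://connect-host:8083/connectors", json=connector)
resp.raise_for_status()
print("Registered connector:", resp.json()["name"])
```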
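
A sketch of the MongoDB TTL indexing mentioned above, using PyMongo; the collection, timestamp field, and 30-day retention period are illustrative assumptions.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")    # placeholder URI
events = client["cdp"]["behavioral_events"]          # hypothetical collection

# MongoDB's TTL monitor deletes documents once "created_at" is older than
# expireAfterSeconds (here 30 days), so transient data ages out automatically.
events.create_index("created_at", expireAfterSeconds=30 * 24 * 3600)
```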
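
A sketch of querying the partitioned, Glue-cataloged S3 data through Athena with boto3; the region, database, table, and result bucket are illustrative.

```python
import boto3

athena = boto3.client("athena", region_name="eu-west-1")   # placeholder region

response = athena.start_query_execution(
    QueryString="""
        SELECT txn_date, COUNT(*) AS txn_count
        FROM transactions                       -- hypothetical cataloged table
        WHERE txn_date = DATE '2024-01-31'      -- partition predicate prunes the S3 scan
        GROUP BY txn_date
    """,
    QueryExecutionContext={"Database": "curated_db"},                 # hypothetical Glue database
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print("Query started:", response["QueryExecutionId"])
```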

American Express

Senior Data Engineer
05.2021 - Current

Job overview

Regulatory Risk Reporting Framework (CECL / IFRS 9 / CCAR)


  • Developed a Regulatory Risk Reporting Framework to calculate Expected Credit Loss (ECL) for credit risk provisioning under CECL/IFRS 9, using Front Book, Back Book, and National/State macroeconomic data (ECL sketch after this list).
  • Maintained and optimized ETL/ELT pipelines in Python/Hive/PySpark to feed data into Cornerstone.
  • Quantified and reported ECL drift caused by model and macro changes under normal and adverse economic conditions, including recessionary periods and COVID-19 economic scenarios.
  • Analyzed large datasets to identify trends and patterns in model behaviors.
  • Implemented automated BNC checks and data quality controls to safeguard data quality and support smooth PwC audits; ensured effective incident handling and resolution within SLA (data-quality sketch after this list).
  • Documented model execution controls through SOX and PwC documentation.
  • Generated aggregated reports, Tableau dashboards, and curated datasets for the Data Science team.
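
A minimal PySpark sketch of an ECL-style calculation; the PD x LGD x EAD formulation, table names, and join keys are illustrative assumptions rather than the actual framework.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ecl-sketch").enableHiveSupport().getOrCreate()

# Front-book and back-book exposures combined, then joined with scenario-level macro inputs.
exposures = spark.table("risk.front_book").unionByName(spark.table("risk.back_book"))
macros = spark.table("risk.state_macros")          # hypothetical macroeconomic scenario table

ecl = (
    exposures.join(macros, on=["state", "scenario"], how="inner")
    # Expected Credit Loss per account: probability of default x loss given default x exposure.
    .withColumn("ecl", F.col("pd") * F.col("lgd") * F.col("ead"))
    .groupBy("scenario", "segment")
    .agg(F.sum("ecl").alias("total_ecl"))
)

ecl.write.mode("overwrite").saveAsTable("risk.ecl_by_scenario")
```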
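
A minimal sketch of an automated data-quality control in PySpark: row-count reconciliation plus a null check on mandatory columns. Table names, columns, and thresholds are illustrative, not the audited production checks.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-checks-sketch").enableHiveSupport().getOrCreate()

# Reconcile row counts between the staging feed and the curated table.
source_count = spark.table("staging.exposures").count()
target_count = spark.table("curated.exposures").count()
assert source_count == target_count, f"Row-count mismatch: {source_count} vs {target_count}"

# Fail fast if any mandatory column contains nulls.
null_rates = (spark.table("curated.exposures")
              .select([F.avg(F.col(c).isNull().cast("int")).alias(c)
                       for c in ["account_id", "pd", "ead"]])
              .first())
for column, rate in null_rates.asDict().items():
    assert rate == 0.0, f"Nulls found in mandatory column {column}"
```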


Nagarro Software

Big Data Lead / Developer (Telecom + Retail Domain)
11.2020 - 05.2021

Job overview


IDP AWS Platform


  • Managed 3 cross-functional teams working in close collaboration with analytics, engineering, stakeholders, and Product Owners; led a team of 4 to deliver an easy-to-customize Big Data solution targeting multiple end clients
  • Developed a generic framework that provides a single repository/data lake solution for a 360-degree view of customer data.
  • Developed end-to-end infrastructure deployment/provisioning, including Lambdas, Step Functions, VPCs, security groups, IAM roles, etc.
  • Used Apache NiFi and AWS DMS for source-based and target-based CDC, covering insert, update, and merge data-manipulation operations
  • Wrote business rules for Spark jobs using the Drools rule engine.
  • Worked on a deployment module for deploying different Lambda definitions on AWS, such as creating EMR clusters, inserting transformed data into Redshift, and populating DynamoDB table entries from S3 event notifications (Lambda handler sketch after this list).
  • Developed configuration-based Spark modules for read, write, and transform operations, with ETL pipeline flow driven by JSON configuration (config-driven sketch after this list).
  • Orchestrated Spark jobs submitted through Apache Livy (submission sketch after this list).
  • Wrote Step Functions to orchestrate Lambdas, with CloudWatch event rules to schedule the Step Functions on time.
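
A minimal sketch of the Lambda pattern mentioned above: an S3 event notification triggers a handler that records object metadata in DynamoDB. The table name, bucket, and attributes are illustrative.

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("ingestion-audit")        # hypothetical DynamoDB table


def handler(event, context):
    # An S3 event can carry multiple records, one per object created.
    for record in event["Records"]:
        s3_info = record["s3"]
        table.put_item(Item={
            "object_key": s3_info["object"]["key"],
            "bucket": s3_info["bucket"]["name"],
            "size_bytes": s3_info["object"].get("size", 0),
        })
    return {"processed": len(event["Records"])}
```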
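
A minimal sketch of a configuration-based Spark module: a JSON config describes the source, transforms, and sink, and a small runner wires them together. The config schema and paths are illustrative assumptions.

```python
import json
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

config = json.loads("""
{
  "source": {"format": "parquet", "path": "s3://example-bucket/raw/orders/"},
  "transforms": [
    {"op": "filter", "expr": "order_status = 'COMPLETE'"},
    {"op": "with_column", "name": "order_date", "expr": "to_date(order_ts)"}
  ],
  "sink": {"format": "parquet", "path": "s3://example-bucket/curated/orders/", "mode": "overwrite"}
}
""")

spark = SparkSession.builder.appName("config-driven-etl-sketch").getOrCreate()

# Read, apply each configured transform in order, then write to the configured sink.
df = spark.read.format(config["source"]["format"]).load(config["source"]["path"])
for step in config["transforms"]:
    if step["op"] == "filter":
        df = df.filter(step["expr"])
    elif step["op"] == "with_column":
        df = df.withColumn(step["name"], F.expr(step["expr"]))

(df.write.format(config["sink"]["format"])
   .mode(config["sink"]["mode"])
   .save(config["sink"]["path"]))
```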
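
A sketch of submitting a Spark job as an Apache Livy batch over its REST API; the Livy host, application path, and arguments are placeholders.

```python
import requests

payload = {
    "file": "s3://example-bucket/jobs/etl_job.py",   # application to run (placeholder path)
    "args": ["--run-date", "2021-01-31"],
    "conf": {"spark.executor.memory": "4g"},
}

# Livy listens on port 8998 by default; POST /batches submits a batch job.
resp = requests.post("http://livy-host:8998/batches", json=payload)
resp.raise_for_status()
print("Submitted Livy batch:", resp.json()["id"])
```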


Accenture

Hadoop Developer (Banking Domain)
03.2019 - 08.2019

Job overview

Athena Replatforming Project


  • Established data ingestion, transformation, and profiling pipelines in AWS using S3, EMR, Glue, and Athena, which increased execution efficiency and reduced project cost
  • Involved in technology migration from SAS to Python and Spark.
  • Created DVF scripts in Python to validate SAS and Spark datasets (validation sketch after this list)
  • Implemented clustered tables in PySpark using indexing, and optimized the underlying datasets in Hive
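
A minimal sketch of a DVF-style validation in PySpark: compare row counts and per-column sums between the legacy SAS extract and the migrated Spark output. Paths and column names are illustrative assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dvf-sketch").getOrCreate()

sas_df = spark.read.csv("s3://example-bucket/sas_extract/", header=True, inferSchema=True)
spark_df = spark.read.parquet("s3://example-bucket/spark_output/")

# Compare headline metrics between the two datasets.
checks = {"row_count": (sas_df.count(), spark_df.count())}
for col in ["balance", "exposure"]:                      # hypothetical numeric columns
    checks[f"sum_{col}"] = (sas_df.agg(F.sum(col)).first()[0],
                            spark_df.agg(F.sum(col)).first()[0])

for name, (expected, actual) in checks.items():
    status = "OK" if expected == actual else "MISMATCH"
    print(f"{name}: expected={expected} actual={actual} -> {status}")
```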


TCS

Hadoop Developer (Retail + Banking)
06.2016 - 02.2019

Job overview


Teradata to Hadoop Migration


  • Built Hive scripts, reformatted the actual data using Pig, and inserted it back into Hive tables with dynamic partitioning (see the sketch after this list).
  • Built Sqoop jobs to transfer data between Hive tables and Teradata in both directions.
  • Used ORC-format tables with UTF-8 encoding so that all special characters are handled correctly, even in Hive tables.
  • Converted JDBC code written for Teradata to HBase Java API code, implemented filter classes in Java, and compared results using SoapUI
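
A minimal sketch of the Hive pattern above driven from PySpark: an ORC table loaded with dynamic partitioning. Database, table, and column names are illustrative assumptions.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder.appName("dynamic-partition-sketch")
         .enableHiveSupport().getOrCreate())

# Allow fully dynamic partition values on insert.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

spark.sql("""
    CREATE TABLE IF NOT EXISTS retail.sales_orc (
        order_id STRING,
        amount   DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    STORED AS ORC
""")

# The partition column must come last in the SELECT for dynamic partitioning.
spark.sql("""
    INSERT OVERWRITE TABLE retail.sales_orc PARTITION (order_date)
    SELECT order_id, amount, order_date
    FROM retail.sales_staging
""")
```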

Education

JMIT

B.Tech in Electrical, Electronics and Communications Engineering


Skills

PySpark

Timeline

AWS Data Engineer
Natwest Group
08.2023 - Current
Senior Data Engineer
American Express
05.2021 - Current
Big Data Lead / Developer (Telecom + Retail Domain)
Nagarro Software
11.2020 - 05.2021
Hadoop Developer (Banking Domain)
Accenture
03.2019 - 08.2019
Hadoop Developer (Retail + Banking)
TCS
06.2016 - 02.2019
JMIT
B.Tech in Electrical, Electronics and Communications Engineering