SARIGA DURAI

Chennai

Summary

Results-driven Data Engineer with 3+ years of experience designing, building, and optimizing large-scale ETL pipelines using PySpark, AWS Glue, and Snowflake. Adept at ingesting semi-structured data from cloud storage (S3), automating workflows using Airflow, and ensuring high data quality and performance. Proven success in delivering scalable data lake architectures, implementing schema evolution, and supporting advanced analytics. Experience spans healthcare, workforce analytics, and supply chain domains.

Overview

years of professional experience

Work History

Senior Software Engineer

Fulcrum Digital (Client: Mastercard)

Coimbatore

11.2025 - Current

Worked as a Data Engineer for Mastercard’s Connected Commerce platform, building and supporting large-scale data pipelines.
Developed and optimized ETL workflows using Python, SQL, and Hadoop services (Hive, XLR).
Performed end-to-end data validation across transaction, activation, redemption, target, and merchant feature datasets.
Optimized complex SQL queries and resolved data fan-out issues by implementing partition-based JOIN strategies.
Investigated production data issues, performed root cause analysis, and collaborated with upstream data teams for resolution.
Configured and managed multi-issuer model artifacts and supported scoring pipeline deployments.
Provided production support, monitored pipeline executions, and resolved ETL failures within SLA timelines.
Fixed a critical 3-billion-row fan-out issue, preventing major data corruption in the redemption pipeline.
Validated 700M+ transaction records with zero NULL values in critical business fields.
Identified upstream category mismatch issues, preventing unnecessary pipeline modifications.
Resolved scoring pipeline failures by redesigning JOIN logic and improving activation data coverage.
Built a reusable framework for multi-issuer model deployment and configuration.
Proactively detected and resolved silent data quality and pipeline issues before they impacted production.
Worked with Hadoop ecosystem components for distributed data processing and handling high-volume datasets.
Conducted end-to-end testing and production validation for KSC Server, verifying row counts and critical business columns.
Configured and managed multiple issuer model artifacts and updated model execution paths for production deployments.
Enhanced pipeline reliability through proactive issue detection, root cause analysis, and end-to-end testing.
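
A minimal sketch of the kind of fan-out check and fix described above, using SQLite in place of Hive so that it runs anywhere (it needs SQLite 3.25 or later for window functions). The table names, columns, and values are hypothetical and not taken from the production pipeline, and it assumes "partition-based" means reducing the right-hand table to one row per key with a window partition before joining; the actual fix may have differed.

```python
import sqlite3

# Hypothetical tables; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE redemptions (txn_id INTEGER, merchant_id INTEGER, amount REAL);
    CREATE TABLE merchant_features (merchant_id INTEGER, load_date TEXT, segment TEXT);
    INSERT INTO redemptions VALUES (1, 10, 25.0), (2, 10, 40.0), (3, 20, 15.0);
    -- merchant 10 has one feature row per load_date, so joining on merchant_id alone fans out
    INSERT INTO merchant_features VALUES
        (10, '2024-01-01', 'A'), (10, '2024-01-02', 'B'), (20, '2024-01-02', 'C');
""")


def count(sql):
    """Run a COUNT(*) query and return the single integer result."""
    return conn.execute(sql).fetchone()[0]


base_rows = count("SELECT COUNT(*) FROM redemptions")

# Naive join: one output row per matching feature row, so rows multiply.
naive_rows = count("""
    SELECT COUNT(*) FROM redemptions r
    JOIN merchant_features f ON f.merchant_id = r.merchant_id
""")

# Partition-based join: keep only the latest feature row per merchant before joining.
fixed_rows = count("""
    SELECT COUNT(*) FROM redemptions r
    JOIN (
        SELECT merchant_id, segment,
               ROW_NUMBER() OVER (PARTITION BY merchant_id ORDER BY load_date DESC) AS rn
        FROM merchant_features
    ) f ON f.merchant_id = r.merchant_id AND f.rn = 1
""")

print(base_rows, naive_rows, fixed_rows)
```

Comparing the joined row count against the base table's row count is a cheap way to surface a silent fan-out before it reaches downstream tables.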

Senior Software Engineer

ELICO Healthcare Solutions

Chennai

06.2024 - 09.2025

Designed and implemented ETL pipelines in AWS Glue using PySpark to process 20+ daily Excel files (~2.5 GB).
Developed dynamic schema handling using metadata driver files for multiple payer sources.
Applied schema standardisation, null handling, and deduplication, ensuring 99.9% data accuracy.
Loaded partitioned Parquet files into Snowflake for downstream analytics.
Automated full pipeline orchestration with Apache Airflow, integrating alerts via Slack and email.
Reduced data latency to under 5 minutes, with full row-level audit and schema evolution support.
Developed a comprehensive ETL pipeline leveraging AWS Glue (Apache Spark) to process Excel rebate files from various carriers in Amazon S3.
Employed the spark-excel library to read Excel files and filter records according to the reference CSV.
Executed data transformations, including column renaming, date formatting, and field enrichment (brand, source, payer name).
Standardised data by merging datasets into a single unified DataFrame.
Saved output in partitioned Parquet format on S3, and imported it into Snowflake for reporting.
Verified data consistency and accuracy using Amazon Athena queries on processed files.
Streamlined pipeline orchestration and monitoring using Apache Airflow, facilitating daily scheduling and alerting.

Senior Associate

First Source

Chennai

03.2024 - 06.2024

Designed an AWS Glue-based ETL for processing truck durability event data using event-driven architecture.
Improved PySpark job efficiency, reducing runtime by 40% through tuning and partitioning.
Built secure Snowflake data ingestion using REST API integrations and CloudWatch monitoring.
Designed and implemented a Proof of Concept (PoC) to access and process hospital stock information through RESTful APIs, enabling real-time inventory data retrieval and analysis.
Gained hands-on experience in client communication by participating in business discussions and translating insights into actionable data solutions.
Mentored junior associates by providing guidance on best practices and fostering professional development.

Senior Associate

Sutherland Global Solutions

Chennai

06.2019 - 11.2023

Developed and maintained batch and real-time ETL jobs using Python and Spark.
Designed data ingestion frameworks from multiple sources (APIs, RDBMS, CSV, and JSON).
Optimised SQL queries for better performance and cost efficiency.
Collaborated with cross-functional teams to deliver data-driven insights for business decision making.
Developed scalable PySpark pipelines to ingest and process large volumes of user activity data from CSV sources.
Designed and implemented statistical aggregations (mean, min, max, stddev) across time dimensions (day, week, hour, and daytime).
Used AWS Lambda to validate file formats and trigger downstream Glue jobs upon the arrival of new files in S3.
Utilised the AWS Boto3 SDK to automate data retrieval, file transfers, and resource management across S3 and other AWS services, enhancing workflow efficiency.
Structured the S3 data lake into raw and processed zones, with partitioning by date to optimise query performance and reduce costs.
Developed hands-on SQL projects involving data modelling, joins, window functions, and aggregate operations to deliver actionable insights from large datasets.
Built and tuned SQL workflows for data validation and transformation, streamlining analytics pipelines and reducing processing time by 30%.
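
The time-dimension aggregations listed above can be sketched roughly as follows. This uses only the Python standard library rather than PySpark so that it runs without a cluster; the sample records and the `summarize` helper are hypothetical, and the "daytime" dimension is left out because its definition is not given here.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev

# Hypothetical activity records: (ISO timestamp, metric value).
events = [
    ("2023-05-01T09:15:00", 4.0),
    ("2023-05-01T09:45:00", 6.0),
    ("2023-05-01T14:30:00", 10.0),
    ("2023-05-02T09:05:00", 8.0),
]


def summarize(records, key_fn):
    """Group values by key_fn(timestamp) and compute mean/min/max/stddev per group."""
    groups = defaultdict(list)
    for ts, value in records:
        groups[key_fn(datetime.fromisoformat(ts))].append(value)
    return {
        key: {
            "mean": mean(vals),
            "min": min(vals),
            "max": max(vals),
            # Sample standard deviation needs at least two points.
            "stddev": stdev(vals) if len(vals) > 1 else 0.0,
        }
        for key, vals in groups.items()
    }


by_day = summarize(events, lambda t: t.date().isoformat())
by_hour = summarize(events, lambda t: t.hour)
by_week = summarize(events, lambda t: t.isocalendar()[1])
```

Passing the grouping key as a function keeps one aggregation routine for every time dimension; in PySpark the same idea would map to a `groupBy` on a derived column.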

Education

B.Sc. - Biochemistry

SRM Arts and Science College

Chengalpattu, TN

2019

H.S.C - CBSE

Kendriya Vidyalaya

Chennai, TN

2016

S.S.L.C - CBSE

Kendriya Vidyalaya

Chennai, TN

2014

Skills

Languages: Python, SQL
Big Data Tools: PySpark, Pandas, NumPy, and Delta Lake
Cloud Platforms: AWS (S3, Glue, Lambda, CloudWatch, Athena, IAM), Cloudera Hadoop (HDFS, Hive)

NoSQL DB: MongoDB
Data Warehousing: Snowflake
Workflow Orchestration: Apache Airflow, Glue Workflow

Disclaimer

I hereby declare that the information provided above is true and accurate to the best of my knowledge.

Chennai [06/10/2026]

Projects

PROJECT EXPERIENCE – Pfizer Medical Rebates ETL Platform

Client: Pfizer | Tools: AWS Glue, PySpark, Snowflake, Airflow, Excel | Role: Data Engineer

Designed scalable ETL pipelines in AWS Glue using PySpark to process 20+ daily Excel files (~2.5 GB) from payer portals (ESI, Zinc, Optum).
Implemented dynamic schema handling via a metadata driver file (payer_portal_reference.csv).
Applied standardized schemas, handled nulls, cast data types, and performed deduplication, ensuring 99.9% data accuracy.
Loaded partitioned Parquet files into Snowflake for analytics teams.
Orchestrated full pipeline using Airflow for S3 file checks, Glue triggers, Snowflake loads, and alerting (Slack/email).
Achieved
Delivered the cleaned and processed rebate data to client analytics teams, enabling improved pricing strategies and contract negotiations with PBMs.
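
A rough sketch of how a metadata driver file can drive per-payer schema handling, as described above. The column layout of the driver file, the source column names, and the `load_mappings` and `standardize` helpers are all assumptions made for illustration; the real payer_portal_reference.csv layout is not shown in this document, and the production version ran in PySpark rather than plain Python.

```python
import csv
import io

# Hypothetical driver-file layout; the real payer_portal_reference.csv columns are not shown here.
DRIVER_CSV = """payer,source_column,target_column
ESI,Rebate Amt,rebate_amount
ESI,NDC Code,ndc
Optum,RebateAmount,rebate_amount
Optum,NDC,ndc
"""


def load_mappings(driver_text):
    """Return {payer: {source_column: target_column}} from the driver file text."""
    mappings = {}
    for row in csv.DictReader(io.StringIO(driver_text)):
        mappings.setdefault(row["payer"], {})[row["source_column"]] = row["target_column"]
    return mappings


def standardize(record, payer, mappings):
    """Rename mapped columns, drop unmapped ones, and tag the row with its payer."""
    column_map = mappings[payer]
    out = {target: record.get(source) for source, target in column_map.items()}
    out["payer_name"] = payer
    return out


mappings = load_mappings(DRIVER_CSV)
rows = [
    standardize({"Rebate Amt": "120.50", "NDC Code": "0001"}, "ESI", mappings),
    standardize({"RebateAmount": "75.00", "NDC": "0002", "Extra": "x"}, "Optum", mappings),
]
print(rows)
```

Keeping the mappings in a file means a new payer layout only requires new driver rows, not a code change.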

Employee Behavior Analytics Platform

Internal Project | Tools: PySpark, AWS Glue, Snowflake, Lambda, Athena

Built a multi-stage PySpark pipeline to analyze employee interaction logs for behavioral trends.
Utilized window functions for aggregations and anomaly detection.
Automated entire ingestion-to-reporting flow using Lambda and Athena.
Reduced manual reporting workload by 90% via fully automated dashboards.

PROJECT EXPERIENCE – Connected Commerce

Client: Mastercard
Role: Senior Software Engineer
Tools: Python, SQL, PySpark, Cloudera (HDFS, Hive), Hadoop Ecosystem, Git, Bitbucket, GitHub Copilot

Worked on Mastercard’s Connected Commerce platform, supporting Redemption, Incrementality, and Scoring data pipelines.
Developed and optimized SQL and PySpark transformations for large-scale data processing.
Resolved a critical JOIN fan-out issue in 208FinalModule.sql, eliminating a 3-billion-row data duplication problem.
Performed end-to-end production validation across target, transaction, activation, redemption, and merchant feature datasets.
Investigated NULL records and identified upstream category mismatches through detailed root cause analysis.
Fixed Scoring pipeline JOIN logic by converting INNER JOINs to LEFT JOINs, improving data coverage and pipeline stability.
Worked with the Cloudera Hadoop environment and HDFS for distributed data processing and model artifact management.
Configured and validated multiple issuer model artifacts and supported production deployments.
Collaborated with upstream data teams to resolve data quality issues and document dependency chains.
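
To illustrate why switching from INNER JOIN to LEFT JOIN improves coverage, here is a small sketch using SQLite as a stand-in for Hive. The `targets` and `activations` tables and their contents are hypothetical and do not reflect the actual scoring pipeline schema.

```python
import sqlite3

# Hypothetical tables: three targeted accounts, only one of which has activated.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE targets (account_id INTEGER);
    CREATE TABLE activations (account_id INTEGER, activated_on TEXT);
    INSERT INTO targets VALUES (1), (2), (3);
    INSERT INTO activations VALUES (1, '2024-02-01');
""")

# INNER JOIN silently drops targets that have no activation row.
inner = conn.execute("""
    SELECT t.account_id, a.activated_on
    FROM targets t
    INNER JOIN activations a ON a.account_id = t.account_id
    ORDER BY t.account_id
""").fetchall()

# LEFT JOIN keeps every target; missing activations surface as NULL (None in Python).
left = conn.execute("""
    SELECT t.account_id, a.activated_on
    FROM targets t
    LEFT JOIN activations a ON a.account_id = t.account_id
    ORDER BY t.account_id
""").fetchall()

print(len(inner), len(left))
```

With the LEFT JOIN, downstream steps have to handle the NULL activation values explicitly, which makes missing upstream data visible instead of hiding it as missing rows.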

Key Achievements

Prevented major data corruption by fixing a silent 3B-row JOIN fan-out issue.
Validated 700M+ transaction records with business-acceptable data quality.
Standardized JOIN patterns and improved maintainability across scoring pipelines.
Created reusable support and RCA documentation for future maintenance activities.
