Innovative and results-driven Data Engineer with 4.5 years of experience architecting, optimizing, and deploying high-performance data pipelines in Big Data ecosystems. Proficient in Python, SQL, PySpark, and Presto for processing and analyzing massive datasets (450 GB+), delivering actionable insights and operational efficiency. Adept at leading teams, solving complex data challenges, and implementing Agile-driven ETL solutions that drive business success. Passionate about transforming raw data into intelligent, production-grade data platforms that empower analytics and decision-making.
Overview
5 years of professional experience
1 Certification
Work History
Digital S/W Engineer Analyst
Citicorp Services India Private Ltd.
Pune
01.2023 - Current
Designed Sqoop-based processes to migrate 14 years of financial data (~450 GB) to Big Data O&T clusters, ensuring 99.9% data integrity, zero data loss during transfer, and efficient data storage.
Integrated large datasets from the Olympus data source using the Python gRPC client, improving transaction data processing efficiency by replacing Talend.
Worked with product managers and business analysts to convert business needs into scalable data solutions.
Developed high-performance ETL pipelines using PySpark and Presto on Trino, eliminating data redundancy, reducing latency by 50%, and enabling cross-cluster access for analytics.
Deployed multiple pipelines using Jenkins and RLM, ensuring seamless production rollouts with minimal downtime.
Resolved pipeline failures involving large data volumes of 30-150 million records by performing root cause analysis (RCA) and implementing timely fixes, improving pipeline efficiency by 98%.
Assisted several teams with RCA and business requirement analysis, enabling clearer understanding of complex technical issues.
Mentored junior engineers, improving their skills in ETL optimization, distributed computing, and Big Data frameworks.
Senior Data Engineer
Larsen and Toubro Infotech
Pune
07.2020 - 12.2022
Migrated compliance data warehouse (CDW) data to the Enterprise Analytics Platform (EAP) for Citigroup Corporate Banking across 18 countries, ensuring seamless data pipeline integration.
Designed data models and mapping documents, improving data integrity, usability, and enrichment of party details across all transaction records.
Led root cause analysis (RCA) for multiple production failures, enhancing cross-border transaction data quality.
Automated data reconciliation tasks post-ingestion using PySpark, reducing validation time by 50%.
Conducted exploratory data analysis (EDA) on financial transactions, uncovering key insights to improve AML monitoring and data quality for 15 million records.
Education
Bachelor of Engineering - Electrical Engineering
Jadavpur University
Kolkata, India
06.2020
Skills
Python programming
SQL (Hive, MySQL, Presto)
Big Data frameworks (PySpark, Hadoop, Sqoop)
ETL development
Design optimization
Problem solving
Jenkins
Bitbucket
Autosys
Team collaboration
Agile methodologies (Scrum)
Stakeholder communication
Certification
AWS Cloud Practitioner Certification (July 2023 to July 2025)