Accomplished Data Engineer with extensive experience at StateStreet Group, specializing in AWS and Spark. Led the design of a high-performance data lakehouse that achieved 99.9% data validity. Proficient in Python and SQL, with a focus on optimizing data pipelines and fostering cross-team collaboration to deliver impactful data solutions.
Known for excelling in collaborative environments, adapting swiftly to evolving needs, and driving team success.
Overview
10 years of professional experience
Work History
Data Engineer - Senior Associate
StateStreet Group
04.2024 - Current
Led end-to-end design of an AWS lakehouse and Redshift warehouse for a global custodian bank, ingesting [X]M+ daily records across SWIFT/ISO/trade feeds. Built canonical models for trades, positions, fees, and profitability.
Built Airflow-orchestrated Spark pipelines (EMR/Glue) with Parquet/ZSTD, partitioned by as_of_date and region and bucketed on client ID, enabling T+0.5 delivery of client/region profit margins.
Optimized Spark (AQE, skew salting, broadcast joins) and Redshift (RA3 nodes, sort/dist keys, Spectrum, materialized views), reducing job runtimes and compute cost (see the join-optimization sketch after this list).
Hardened data quality with row-count/hash reconciliation and Great Expectations checks, achieving 99.9% valid records with automated quarantine and reprocessing (reconciliation sketch after this list).
Secured the platform with KMS, Lake Formation, and column masking; implemented lineage and audit trails (Glue Data Catalog, CloudTrail).
Drove observability (CloudWatch) and SRE runbooks; held P95 pipeline latency <= 3.5h with automated remediation for late feeds.
Designed idempotent runs keyed by run_id.
Enabled backfills via parameterized DAG runs with safe partition rewrites (see the Airflow sketch below).
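A minimal sketch of the Spark-side optimizations from the bullet above, assuming hypothetical table names (trades, clients), a hypothetical join key (client_id), and placeholder S3 paths; the production jobs are more involved.

```python
# Illustrative only: AQE, a broadcast join, and manual skew salting in PySpark.
# Table names, column names, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("profitability-joins")
    .config("spark.sql.adaptive.enabled", "true")           # AQE: re-optimize plans at runtime
    .config("spark.sql.adaptive.skewJoin.enabled", "true")  # let AQE split skewed partitions
    .getOrCreate()
)

trades = spark.read.parquet("s3://example-bucket/curated/trades/")   # large, skewed fact
clients = spark.read.parquet("s3://example-bucket/ref/clients/")     # small dimension

# Broadcast the small dimension so the large side avoids a shuffle.
enriched = trades.join(F.broadcast(clients), on="client_id", how="left")

# Manual skew salting: spread hot client_ids across N salt buckets before joining.
N = 16
salted_trades = trades.withColumn("salt", (F.rand() * N).cast("int"))
salted_clients = clients.crossJoin(spark.range(N).withColumnRenamed("id", "salt"))
balanced = salted_trades.join(salted_clients, on=["client_id", "salt"]).drop("salt")
```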
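A simplified sketch of the row-count/hash reconciliation check, assuming a single hypothetical as_of_date partition and a trade_id business key; the Great Expectations suites mentioned above sit alongside checks like this.

```python
# Hypothetical reconciliation between raw and curated layers: compare a row count
# and an order-independent hash total; a mismatch would route the batch to quarantine.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

raw = spark.read.parquet("s3://example-bucket/raw/trades/as_of_date=2024-06-28/")
curated = spark.read.parquet("s3://example-bucket/curated/trades/as_of_date=2024-06-28/")

def fingerprint(df, key_cols):
    # Row count plus a sum of 64-bit hashes over the business keys
    # (cast to decimal so the total cannot overflow).
    return df.agg(
        F.count("*").alias("rows"),
        F.sum(F.xxhash64(*key_cols).cast("decimal(38,0)")).alias("hash_total"),
    ).first()

raw_fp = fingerprint(raw, ["trade_id"])
cur_fp = fingerprint(curated, ["trade_id"])

if (raw_fp["rows"], raw_fp["hash_total"]) != (cur_fp["rows"], cur_fp["hash_total"]):
    # The real pipeline quarantines and reprocesses the batch; here we just fail fast.
    raise ValueError(f"Reconciliation mismatch: raw={raw_fp}, curated={cur_fp}")
```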
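A minimal Airflow 2.x-style sketch of the idempotent-run and backfill design above; the DAG id, task, and paths are placeholders. Each logical date overwrites exactly one as_of_date partition, so retries and parameterized backfill runs cannot duplicate data.

```python
# Hypothetical sketch of an idempotent daily DAG (Airflow 2.x style).
# Each run is keyed by its logical date and overwrites a single partition,
# so retries and backfills never append duplicates.
import pendulum
from airflow import DAG
from airflow.operators.python import PythonOperator

def load_partition(ds, run_id, **_):
    # `ds` (logical date) and `run_id` come from the Airflow context.
    target = f"s3://example-bucket/curated/profitability/as_of_date={ds}/"
    # The real task runs a Spark job in overwrite mode against `target`,
    # tagging outputs with run_id for traceability.
    print(f"run_id={run_id}: overwriting {target}")

with DAG(
    dag_id="profitability_daily",
    start_date=pendulum.datetime(2024, 4, 1, tz="UTC"),
    schedule="@daily",
    catchup=True,  # backfills become parameterized runs, one per logical date
) as dag:
    PythonOperator(task_id="load_partition", python_callable=load_partition)
```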
Big Data Developer
Wipro Technologies
03.2022 - 04.2024
Designed, developed, and operated highly scalable, high-performance, low-cost, and accurate data pipelines on distributed data processing platforms.
Recognized and adopted best practices in data processing, reporting, and analysis: data integrity, test design, analysis, validation, and documentation.
Designed, built, and owned all components of a high-volume data warehouse end to end.
Continually improved reporting and analysis processes, automating and simplifying self-service support for customers.
Optimized data processing by implementing efficient ETL pipelines and streamlining database design.
Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
Enhanced data quality by performing thorough cleaning, validation, and transformation tasks.
Automated routine tasks using Python scripts, increasing team productivity and reducing manual errors.
ETL Developer
Kyndryl India Pvt Ltd
01.2018 - 03.2022
Designed, developed, and maintained ETL pipelines to extract, transform, and load data into Amazon Redshift.
Worked closely with business analysts, data engineers, and stakeholders to understand data requirements and translate them into ETL solutions.
Optimized complex SQL queries and ensured efficient performance in Redshift.
Performed data profiling and data quality checks, and troubleshot data issues.
Implemented incremental loads, change data capture (CDC), and performance tuning techniques (see the upsert sketch after this list).
Monitored ETL jobs and data pipeline health, and managed data recovery procedures when required.
Developed stored procedures, functions, and scripts to support data processing and transformation.
Documented technical designs, processes, and data flows.
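A simplified sketch of the incremental-load (upsert) pattern referenced above, using the classic Redshift staging-table delete-then-insert approach; the table, columns, connection details, and IAM role are hypothetical placeholders.

```python
# Hypothetical incremental load (upsert) into Redshift: stage the CDC delta,
# delete matching keys from the target, then insert the staged rows.
import psycopg2

conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="etl_user", password="***",
)

UPSERT_SQL = """
-- Stage only the changed rows captured since the last CDC watermark.
CREATE TEMP TABLE stg_orders (LIKE public.orders);

COPY stg_orders
FROM 's3://example-bucket/cdc/orders/2022-03-01/'
IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-copy-role'
FORMAT AS PARQUET;

-- Delete target rows that have a newer version in staging ...
DELETE FROM public.orders
USING stg_orders
WHERE public.orders.order_id = stg_orders.order_id;

-- ... then insert the staged rows, completing the upsert.
INSERT INTO public.orders SELECT * FROM stg_orders;
"""

# The connection context manager commits the whole batch as one transaction.
with conn, conn.cursor() as cur:
    cur.execute(UPSERT_SQL)
conn.close()
```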
Data Analyst
IBM, Global Business Services
03.2015 - 01.2018
Worked with stakeholders across departments to create detailed DDIT Data & Analytics solution and service designs based on functional specifications, meeting quality and performance requirements within technical constraints.
Contributed to improving local business processes, products, services, and software through data analysis.
Engaged with business representatives and supported the appropriate DDIT teams and functions to develop business requirements and deliver data-driven recommendations that improved efficiency and added value.
Examined complex data sets, identified patterns and trends, and generated reports to aid decision-making.
Helped create consistency and traceability between D&A product requirements, functional specifications, and testing and validation; supported validation and testing and ensured adherence to security and compliance policies and procedures within the service delivery scope.
Supported internal IT systems and documentation requirements, standards (including quality management and IT security), applicable regulatory requirements, and the DDIT Service Portfolio, following industry best practices in leveraging technology for the business and reusing products, solutions, and services wherever applicable.