Detail-oriented Data Engineer with 4.8+ years of experience designing, developing, and optimizing scalable data solutions. Proficient in Python, PySpark, SQL, and Snowflake, with substantial expertise in the AWS and Azure cloud ecosystems. Adept at building robust ETL/ELT pipelines and streamlining data ingestion and transformation for analytics and reporting. Currently a Software Development Engineer II at Sigmoid Analytics contributing to strategic data engineering initiatives; previously led several end-to-end data integration and automation projects at LTIMindtree, improving operational performance and data availability. Strong analytical mindset with a passion for data engineering and analytics, committed to continuous learning and excellence in data architecture and insight-driven decision-making.
Real-Time Streaming Data Pipeline: MongoDB to Snowflake Using Kafka and ksqlDB
(Jan 2024 – Apr 2025)
• Designed and deployed a real-time pipeline to capture MongoDB change streams via Kafka Connect (illustrative sketch below).
• Developed stream processing logic with ksqlDB to cleanse and transform semi-structured JSON data.
• Enabled real-time analytics by continuously loading transformed data into Snowflake.
• Maintained data integrity with schema validation and custom transformation rules.
• Tuned Kafka performance for high-throughput CDC ingestion.
• Collaborated with BI teams to ensure Snowflake schema alignment with analytical requirements.
• Authored comprehensive documentation for architecture and troubleshooting.
Tech Stack: MongoDB, Kafka, ksqlDB, Snowflake, Confluent Platform, Docker, Git.
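Illustrative sketch (Python) of the CDC setup this project describes: registering a MongoDB source connector with Kafka Connect and defining a cleansing stream in ksqlDB through their REST APIs. All hosts, database/collection names, topics, and field names are hypothetical placeholders, not the production configuration.

```python
import requests

CONNECT_URL = "http://localhost:8083"   # Kafka Connect REST endpoint (placeholder)
KSQLDB_URL = "http://localhost:8088"    # ksqlDB server REST endpoint (placeholder)

# Register a MongoDB source connector that publishes change streams to a Kafka topic.
# Connection URI, database, and collection names are hypothetical.
connector = {
    "name": "mongo-orders-source",
    "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "connection.uri": "mongodb://mongo:27017",
        "database": "shop",
        "collection": "orders",
        "publish.full.document.only": "true",
        "output.format.value": "json",
    },
}
requests.post(f"{CONNECT_URL}/connectors", json=connector, timeout=30).raise_for_status()

# Define a ksqlDB stream over the CDC topic, plus a derived stream that cleanses
# the semi-structured JSON (type casts, null filtering) before loading downstream.
ksql = """
CREATE STREAM orders_raw (order_id VARCHAR, amount VARCHAR, created_at VARCHAR)
  WITH (KAFKA_TOPIC='shop.orders', VALUE_FORMAT='JSON');

CREATE STREAM orders_clean AS
  SELECT order_id,
         CAST(amount AS DOUBLE) AS amount,
         created_at
  FROM orders_raw
  WHERE order_id IS NOT NULL
  EMIT CHANGES;
"""
requests.post(
    f"{KSQLDB_URL}/ksql",
    json={"ksql": ksql, "streamsProperties": {}},
    timeout=30,
).raise_for_status()
```

In the actual pipeline, the cleansed topic would then feed the continuous Snowflake load (for example, via a Kafka Connect Snowflake sink on the Confluent Platform).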
Automated Financial Data Integration & Analytics
(Jan 2022 – Dec 2023)
• Engineered a scalable ETL pipeline using AWS Glue and Python to ingest financial data from SharePoint APIs (illustrative sketch below).
• Handled pagination and rate limiting to manage large datasets efficiently.
• Organized raw data in S3 with partitioning for streamlined processing.
• Introduced a Data Quality Index (DQI) to monitor column-level quality metrics.
• Transformed and standardized data for downstream consumption.
• Converted data to Parquet format for efficient querying in Athena and Redshift Spectrum.
• Maintained a dedicated test environment to validate schema and ensure data integrity.
Tech Stack: AWS Glue, S3, Python, SharePoint API, Parquet, Git, DQI, Validation Frameworks.
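Illustrative sketch (Python) of the ingestion pattern this project describes, as a Glue Python job might express it: paginate a REST endpoint, back off when rate-limited, and land the results as date-partitioned Parquet in S3 for Athena and Redshift Spectrum. The endpoint, bucket, and key names are assumptions for illustration, not the client's actual SharePoint resources.

```python
import io
import time

import boto3
import pandas as pd
import requests

API_URL = "https://example.sharepoint.com/_api/web/lists/getbytitle('Finance')/items"  # placeholder
BUCKET = "finance-raw-zone"  # placeholder bucket
s3 = boto3.client("s3")


def fetch_all(url, headers):
    """Follow OData-style pagination, backing off briefly when throttled (HTTP 429)."""
    rows = []
    while url:
        resp = requests.get(url, headers=headers, timeout=60)
        if resp.status_code == 429:                      # rate limited
            time.sleep(int(resp.headers.get("Retry-After", "5")))
            continue
        resp.raise_for_status()
        payload = resp.json()
        rows.extend(payload.get("value", []))
        url = payload.get("odata.nextLink")              # None once the last page is reached
    return rows


def land_as_parquet(rows, load_date):
    """Write one partition per load date: s3://<bucket>/finance/load_date=YYYY-MM-DD/part.parquet."""
    df = pd.DataFrame(rows)
    buffer = io.BytesIO()
    df.to_parquet(buffer, index=False)                   # requires pyarrow
    s3.put_object(
        Bucket=BUCKET,
        Key=f"finance/load_date={load_date}/part-000.parquet",
        Body=buffer.getvalue(),
    )


if __name__ == "__main__":
    records = fetch_all(API_URL, headers={"Accept": "application/json"})
    land_as_parquet(records, load_date="2023-01-31")
```

In practice, the column-level Data Quality Index mentioned above would be computed on the DataFrame (null ratios, type and range checks) before the Parquet write.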
Smart Data Pipeline for Transportation Planning
(Oct 2020 – Dec 2021)
• Automated extraction of forecast files from customer emails using Azure Logic Apps.
• Centralized file storage in Azure Blob with duplicate management logic.
• Orchestrated ETL flows via Azure Data Factory based on metadata mapping.
• Triggered Azure Functions for data transformation and preparation (illustrative sketch below).
• Integrated Azure ML for capacity planning optimization.
• Built Power BI dashboards and implemented monitoring with Azure Monitor and DevOps pipelines.
Tech Stack: Azure Logic Apps, Blob Storage, Data Factory, Functions, ML, SQL Database, Power BI, Azure Monitor, Python SDK.
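Illustrative sketch (Python) of the transformation step the Azure Function performed, using the azure-storage-blob SDK: read a forecast file landed by the Logic App, standardize it with pandas, and stage the prepared output for the Data Factory flow. Container names, file names, and the standardization rules are assumptions for illustration.

```python
import io
import os

import pandas as pd
from azure.storage.blob import BlobServiceClient

# Connection string is read from configuration; container and blob names are placeholders.
service = BlobServiceClient.from_connection_string(os.environ["AZURE_STORAGE_CONNECTION_STRING"])


def prepare_forecast(source_blob, target_blob):
    """Read a raw forecast file, standardize it, and stage it for the ADF pipeline."""
    raw = service.get_blob_client(container="forecasts-raw", blob=source_blob)
    df = pd.read_excel(io.BytesIO(raw.download_blob().readall()))  # requires openpyxl for .xlsx

    # Hypothetical standardization: normalize headers and drop duplicate rows
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    df = df.drop_duplicates()

    out = io.BytesIO()
    df.to_csv(out, index=False)
    prepared = service.get_blob_client(container="forecasts-prepared", blob=target_blob)
    prepared.upload_blob(out.getvalue(), overwrite=True)


if __name__ == "__main__":
    prepare_forecast("customer_a/2021-11-forecast.xlsx", "customer_a/2021-11-forecast.csv")
```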