Data Engineer with 8+ years of experience designing and delivering scalable cloud-native data pipelines and platforms for the medical technology, claims, navigation technology, and health & wellness sectors.
Overview
9 years of professional experience
Work History
Sr. Software Engineer
Yash Technologies
Hyderabad
08.2023 - Current
Automated and rearchitected data pipelines using AWS Step Functions, Glue, Lambda, and EventBridge, achieving a 30% cost reduction, a 3x improvement in pipeline throughput, and a 50% reduction in overall processing time.
Built and deployed AWS Glue jobs to ingest large data files from SAP and non-SAP systems, including SharePoint, transform the data, and load it into S3 for downstream processing.
Developed AWS Lambda functions to trigger DBT jobs, perform data transformations, and manage secure file transfers between SAP and non-SAP systems and S3 buckets using SFTP connections.
Configured Snowpipe for real-time data ingestion from S3 into Snowflake and developed DBT models for incremental and snapshot-based transformations.
Designed and implemented end-to-end data orchestration workflows using AWS Step Functions, integrating multiple AWS services and automating complex data pipelines, with infrastructure provisioned through Terraform.
Used Python and PySpark extensively to develop robust and scalable data transformation logic, optimized for performance and reliability in a distributed environment.
Developed and maintained Infrastructure as Code (IaC) using Terraform to provision and manage AWS resources including Lambda, Glue, S3, IAM, and Step Functions.
Integrated Terraform workflows into GitLab CI/CD pipelines for automated provisioning, validation, and deployment of AWS infrastructure.
Built ETL pipelines using AWS (Glue, S3, Redshift, RDS) and PySpark to automate over 10 data workflows.
Designed and developed a custom data quality framework using PySpark, Python, and EMR, including multiple reusable testing methods used by the QA team to validate migrated data across systems, supporting over 100 GB of data.
Analyzed and processed complex datasets using advanced SQL querying, data visualization tools, and analytical platforms to drive actionable insights and support business decision-making.
Developed batch-processing workflows using Python for a machine learning project, allowing efficient preprocessing and feature engineering at scale.
Developed Python data pipelines for a machine learning model that improved the detection of fraudulent claims by 45% and avoided manual processing of claims.
Used AWS Athena, QuickSight, and Tableau to create insightful reports and dashboards for business users, improving data visibility and decision-making.
Developed a high-performance data pipeline using Python to transfer data from MySQL to Amazon Redshift, significantly improving reporting and visualization performance.
Orchestrated complex ETL pipelines using AWS Glue Workflows, coordinating Glue Jobs, Crawlers, and triggers to automate data ingestion and transformation across S3, Redshift, and RDS.
Developed and scheduled DAGs in Apache Airflow to manage data pipelines across AWS services.
Software Engineer
RMSI
Hyderabad
11.2016 - 09.2019
Worked with QGIS and PostgreSQL, writing queries to retrieve and analyze spatial data
Created Tableau dashboards to visualize geographic and tabular data
Performed data quality checks on GIS datasets to ensure accuracy and consistency
Collected data from a variety of sources and performed data preparation tasks
Worked with Tableau table calculations, calculated fields, LOD expressions, and parameters
Validated spatial data against map production data