

Results-driven Data Engineer and ETL Developer with 5+ years of experience designing, developing, and optimizing large-scale data pipelines and ETL workflows. Expertise in migrating legacy systems to modern cloud platforms (AWS, Azure) and converting legacy code to PySpark on Hadoop and Databricks. Strong background in data integration, data warehousing, and business intelligence, with a proven ability to improve data processing performance and ensure data quality across enterprise applications.
- Migrated Informatica workflows to Azure Databricks Workflows by converting the logic to PySpark, processing 10+ million financial records and improving data load performance by 40%.
- Performed data and file validation against the source Informatica tables and files, achieving 99% accuracy.
- Migrated legacy Sybase database code to PySpark on the Azure Databricks platform, processing 10+ million records and validating the output with 98% accuracy (see the validation sketch below).
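Source-to-target checks of the kind described above typically reduce to count and key reconciliation in PySpark. A minimal sketch, assuming hypothetical paths, table names, and a `txn_id` business key:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("migration_validation").getOrCreate()

# Source extract (exported from the legacy system) and migrated target on Databricks.
source_df = spark.read.parquet("/mnt/raw/source_extract/")   # hypothetical path
target_df = spark.table("finance.migrated_transactions")     # hypothetical table

# 1. Row-count reconciliation.
src_n, tgt_n = source_df.count(), target_df.count()
print(f"rows: source={src_n} target={tgt_n} match={src_n == tgt_n}")

# 2. Key reconciliation: rows present on one side but not the other.
keys = ["txn_id"]                                            # hypothetical business key
missing = source_df.select(keys).subtract(target_df.select(keys)).count()
extra = target_df.select(keys).subtract(source_df.select(keys)).count()
print(f"keys missing in target: {missing}; unexpected in target: {extra}")
```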
- Converted Vertica and DataStage code to PySpark in a Hadoop environment on a migration project, reducing query execution time by 40%.
- Engineered and executed a Spark 2 to Spark 3 upgrade with comprehensive testing, ensuring zero data loss and 100% compatibility across all modules.
- Designed and implemented a data validation framework ensuring data integrity across DataStage and PySpark workflows with a 99.9% accuracy rate.
- Managed Autosys job scheduling for the converted PySpark pipelines, scheduling and monitoring 50+ daily ETL jobs (see the scheduler-friendly entry-point sketch below).
- Collaborated in an Agile environment using JIRA, delivering weekly sprints with zero critical bugs in production.
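Autosys marks a job SUCCESS or FAILURE from the process exit status, so each converted pipeline benefits from an entry point that exits nonzero on any error. A minimal sketch, with hypothetical application and path names:

```python
import sys
from pyspark.sql import SparkSession

def run() -> None:
    spark = SparkSession.builder.appName("daily_load").getOrCreate()
    df = spark.read.parquet("/data/incoming/daily/")            # hypothetical input
    df.write.mode("overwrite").parquet("/data/curated/daily/")  # hypothetical output

if __name__ == "__main__":
    try:
        run()
        sys.exit(0)  # Autosys reads exit code 0 as SUCCESS
    except Exception as exc:  # surface any failure to the scheduler
        print(f"job failed: {exc}", file=sys.stderr)
        sys.exit(1)  # nonzero exit code is reported as FAILURE
```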
- Orchestrated migration of DataStage workflows to PySpark, creating 50+ AWS Glue jobs to replicate the existing data pipelines.
- Performed comprehensive data validation comparing DataStage and PySpark outputs, identifying and resolving 25+ data inconsistencies.
- Reduced data processing time by 50% through PySpark optimization and parallel-processing techniques (illustrative patterns below).
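Typical levers behind this kind of speedup are broadcast joins, key-based repartitioning, and selective caching. An illustrative sketch, with hypothetical dataset names and partition counts:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("optimized_etl").getOrCreate()

facts = spark.read.parquet("/data/facts/")  # large fact table (hypothetical)
dims = spark.read.parquet("/data/dims/")    # small dimension table (hypothetical)

# Broadcast the small side to avoid a shuffle-heavy sort-merge join.
joined = facts.join(F.broadcast(dims), on="dim_id", how="left")

# Repartition on the write key so downstream stages parallelize evenly.
joined = joined.repartition(200, "load_date")

# Cache only when the result feeds multiple downstream actions.
joined.cache()
joined.write.partitionBy("load_date").mode("overwrite").parquet("/data/out/")
```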
- Migrated Teradata (BTEQ) and Ab Initio workflows to PySpark on the Azure Databricks platform.
- Designed ETL pipelines handling 10+ million records daily with a 99.95% success rate.
- Implemented data quality checks and error-handling mechanisms, reducing post-processing issues by 35%.
- Converted Teradata (BTEQ) queries to optimized PySpark code, improving query performance by 45%.
- Developed and deployed 15+ AWS Glue jobs for automated data ingestion and transformation (see the Glue job skeleton below).
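A Glue PySpark job of this kind follows a standard read-transform-write shape. A minimal skeleton using the stock `awsglue` boilerplate; the database, table, key, and S3 path are hypothetical:

```python
import sys
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read from the Glue Data Catalog, transform with plain Spark, write Parquet to S3.
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="staging_db", table_name="raw_orders"  # hypothetical catalog entries
)
df = dyf.toDF().dropDuplicates(["order_id"])        # example transformation step

out = DynamicFrame.fromDF(df, glue_context, "out")
glue_context.write_dynamic_frame.from_options(
    frame=out,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/orders/"},  # hypothetical
    format="parquet",
)
job.commit()
```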
Data Analytics Projects | Python, SQL, Tableau, Excel
- Expense Tracker Application (Excel with Macros)
- Stock Market Performance Analysis and Forecasting (Python)
- Data Science Job Salaries Analysis and Visualization (Python)
- Sales Data Analysis and Reporting (SQL)
- British Airways Dashboard with Business Intelligence Insights (Tableau)
- Video Games Market Dashboard and Trend Analysis (Tableau)
Azure Databricks | ETL Development | Cloud Migration | PySpark Optimization | Data Pipeline Architecture | AWS Glue | Data Warehousing | SQL Query Optimization | Data Validation | Apache Spark | Agile Methodology | JIRA | REST API Development