Accomplished PySpark Developer at TCS, skilled in Python and AWS technologies. Successfully designed ETL pipelines that reduced data processing time by 40%. Proficient in CI/CD practices and Agile methodologies, demonstrating strong problem-solving abilities and a commitment to delivering high-quality solutions.
- Designed and developed PySpark-based ETL pipelines for large-scale data migration from legacy systems to the AWS cloud.
- Built Spark transformations and actions that process millions of records while optimizing resource utilization.
- Developed AWS Glue jobs for the automated extraction, transformation, and loading of structured and semi-structured data.
- Integrated AWS Lambda and Step Functions to orchestrate complex ETL workflows.
- Implemented data validation and cleansing using SQL to ensure data accuracy during migration.
- Automated AWS infrastructure provisioning using Terraform to ensure reproducible and secure deployments.
- Managed source code with Git and Bitbucket, following branch strategies and code review practices.
- Configured CI/CD pipelines using Jenkins to automate build, test, and deployment processes.
- Participated in Agile Scrum ceremonies to ensure the timely delivery of high-quality software.
- Contributed to troubleshooting, performance tuning, and production deployment of ETL jobs.
- Reduced data processing time by 40% through partitioning, caching, and performance tuning.
Google Cloud Associate Cloud Engineer: covered cloud environment setup, deployment and implementation of cloud solutions, configuration, pricing, and security.