Software Engineer
- Utilized AWS S3 for storing and managing large volumes of data
- Cleaned, transformed and sorted data in S3 using AWS Glue
- Utilized AWS Glue components such as the Data Catalog, Crawlers, Sensitive Data Detection and Data Quality, along with CloudWatch for monitoring jobs
- Utilized Athena for further querying and analysis of transformed data
- Created Lambda functions for triggering Glue jobs upon S3 object updates
- Created Python utilities to connect to data sources, and used PySpark for the data processing
- Updated existing PySpark code to make it more generic across the platform
- Responsible for pulling, committing and pushing code using GitHub
- Debugged and fixed issues by sampling the data
- Responsible for monitoring and troubleshooting ETL jobs