Skilled AWS Data Engineer with one year of internship experience building ETL pipelines with AWS Glue and Lambda and optimizing Spark data pipelines. Experienced in developing and maintaining complex data pipelines, data warehouses, and big data solutions. Proficient in Python and SQL, with hands-on experience in Hadoop, Spark, and AWS services.
Internship Project:
Building a Data Warehouse in Redshift (DMS - S3 - Glue - Redshift)
Roles and Responsibilities:
• Migrated existing SQL Server databases into AWS S3 using AWS DMS (Database Migration Service).
• Created PySpark scripts to validate the migrated data (a minimal validation sketch appears after this list).
• Optimized PySpark code for better performance.
• Developed and monitored Glue PySpark scripts to load data into Redshift tables (see the Redshift load sketch after this list).
• Worked with Avro and Parquet file formats and applied compression techniques to make efficient use of storage in HDFS (see the Parquet example after this list).
• Created DDL scripts based on the existing SQL Server table schemas and implemented them in the Redshift database.
• Deployed the ETL code and performed unit testing to provide status updates.
• Created and updated technical documentation with the details needed to deploy the ETL code and schedule jobs in production.
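
The validation step can be illustrated with a minimal PySpark sketch. The bucket, table, and column names below are hypothetical, and it assumes DMS was configured to write Parquet and that source-side row counts were exported to a control file; the actual validation scripts covered more than row counts.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dms-migration-validation").getOrCreate()

# Data landed in S3 by DMS (hypothetical path, Parquet output assumed).
migrated = spark.read.parquet("s3://example-dms-bucket/raw/customers/")

# Control file with row counts captured from SQL Server (hypothetical path and layout).
expected = spark.read.option("header", True).csv("s3://example-dms-bucket/control/expected_counts.csv")
expected_count = int(expected.filter(expected.table_name == "customers").first()["row_count"])

# Compare counts and fail loudly on a mismatch so the job surfaces the discrepancy.
migrated_count = migrated.count()
if migrated_count != expected_count:
    raise ValueError(f"customers: expected {expected_count} rows, found {migrated_count}")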
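
A sketch of the Glue PySpark load into Redshift, assuming a Glue connection named "redshift-connection" and hypothetical S3 paths and table names; the real jobs were parameterized per table.

import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glueContext = GlueContext(SparkContext())
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Read the validated Parquet data from S3 as a DynamicFrame.
source = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-dms-bucket/validated/customers/"]},
    format="parquet",
)

# Write into the target Redshift table through the pre-defined Glue connection,
# staging the data in a temporary S3 location as Glue requires for Redshift writes.
glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=source,
    catalog_connection="redshift-connection",
    connection_options={"dbtable": "public.customers", "database": "dev"},
    redshift_tmp_dir="s3://example-dms-bucket/temp/",
)

job.commit()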
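
For the file-format work, a short PySpark example of writing snappy-compressed, partitioned Parquet; the paths and partition column are hypothetical, and the same write options apply whether the target is S3 or HDFS.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-compression-example").getOrCreate()

# Rewrite a validated dataset as snappy-compressed Parquet partitioned by load date.
df = spark.read.parquet("s3://example-dms-bucket/validated/orders/")
(df.write
   .mode("overwrite")
   .option("compression", "snappy")
   .partitionBy("load_date")
   .parquet("s3://example-dms-bucket/curated/orders/"))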
Cloud Technologies: AWS (Glue, CloudWatch, Redshift, DMS, EMR, RDS, Lambda, IAM, SNS, Athena, EC2), Databricks