Experienced IT professional with 6.1 years of expertise in AWS technologies such as AWS Glue, AWS Athena, AWS Lambda, AWS Redshift, and AWS S3, and a strong background in Informatica PowerCenter, IICS, and Oracle.
• Designing, developing, and implementing AWS Glue jobs to extract, transform, and load (ETL) data from various sources into a data warehouse.
• Writing and optimizing ETL scripts using Apache Spark and PySpark within AWS Glue to process large-scale datasets.
• Configuring and managing the AWS Glue Data Catalog, including defining schemas, tables, and partitions.
• Working with other AWS services such as Amazon S3, Amazon Redshift, AWS Lambda, and AWS CloudFormation to build end-to-end data solutions.
• Designed, developed, and supported Extraction, Transformation, and Load (ETL) processes for data migration using Informatica PowerCenter, IICS, AWS Redshift, and Oracle.
• Designing and implementing Glue crawlers to automatically discover and catalog data in various formats, enabling efficient querying and analysis with Athena.
• Utilizing the Redshift UNLOAD command to efficiently export data from Redshift to S3 or other external storage for further processing or archiving, and the COPY command to load data back into Redshift.
• Monitoring data pipelines and systems using AWS CloudWatch.
• Implementing serverless, event-driven architectures using AWS Glue triggers and AWS Lambda to automate data processing tasks based on events or schedules.
• Orchestrating and scheduling data processing tasks and dependencies using AWS Glue Job Scheduler or external workflow orchestration tools like AWS Step Functions.
• Writing PySpark code to perform data transformations, aggregations, joins, and filtering operations on large-scale datasets.
• Created various mappings using Mapping Designer and utilized transformations such as Aggregator, Lookup, Filter, Router, Joiner, Source Qualifier, Expression, Stored Procedure, Sorter, and Sequence Generator.
• Designed and developed Informatica Mappings and Sessions based on business user requirements and business rules to load data from different sources like flat files and RDBMS tables to target tables.
• Good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
• Developed complex mappings involving Slowly Changing Dimensions (SCD1 and SCD2) and implemented business logic.
• Conducted performance tuning in Redshift and Informatica, identifying and rectifying bottlenecks and optimizing indexes in Oracle and Informatica.
• Validated and ensured data quality by performing data profiling, validation, and cleansing activities within Athena.
• Optimized queries by applying sort and distribution keys on Redshift tables.
• Performed performance tuning and space optimization for queries.
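The Redshift UNLOAD/COPY round trip described in the bullets above can be sketched as follows; the table name, S3 prefix, and IAM role ARN here are hypothetical placeholders, not details from an actual engagement.

```python
def build_unload(table: str, s3_prefix: str, iam_role: str) -> str:
    """Compose an UNLOAD statement that exports a Redshift table to S3 as Parquet."""
    return (
        f"UNLOAD ('SELECT * FROM {table}') "
        f"TO '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS PARQUET;"
    )

def build_copy(table: str, s3_prefix: str, iam_role: str) -> str:
    """Compose the reverse COPY statement that loads the exported files back."""
    return (
        f"COPY {table} "
        f"FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS PARQUET;"
    )

# Placeholder identifiers for illustration only.
unload_sql = build_unload("sales.orders", "s3://example-archive/orders/",
                          "arn:aws:iam::123456789012:role/RedshiftS3Role")
copy_sql = build_copy("sales.orders", "s3://example-archive/orders/",
                      "arn:aws:iam::123456789012:role/RedshiftS3Role")
```

In practice these statements would be executed against the cluster (e.g. via the Redshift Data API or a SQL client); composing them as strings keeps the export/import pair symmetric and auditable.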
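The Slowly Changing Dimension Type 2 pattern mentioned above reduces to: expire the current dimension row when a tracked attribute changes, and append a new current version. A minimal, framework-free sketch (in practice this ran inside Informatica mappings or PySpark jobs; the field names are illustrative):

```python
from datetime import date

def scd2_merge(current_row: dict, incoming: dict, effective: date) -> list[dict]:
    """Compare tracked attributes; if anything changed, expire the current
    dimension row and append a new current version (SCD Type 2)."""
    tracked = [k for k in incoming if k != "customer_id"]
    if all(current_row.get(k) == incoming[k] for k in tracked):
        return [current_row]  # no change: keep the existing version as-is
    expired = {**current_row, "end_date": effective, "is_current": False}
    fresh = {**incoming, "start_date": effective, "end_date": None,
             "is_current": True}
    return [expired, fresh]

# Illustrative sample data.
old = {"customer_id": 7, "city": "Austin", "start_date": date(2020, 1, 1),
       "end_date": None, "is_current": True}
new = {"customer_id": 7, "city": "Dallas"}
rows = scd2_merge(old, new, date(2024, 6, 1))
```

SCD Type 1 is the degenerate case of the same merge: overwrite the attribute in place instead of emitting a second row, losing the history.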
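Hive-style partitioning, as used for the managed and external tables above, comes down to encoding column values into the storage path, a layout both Hive and Glue crawlers recognize as partitions. A small illustrative helper (bucket and column names assumed):

```python
def partition_path(table_root: str, **partitions) -> str:
    """Compose a Hive-style partition location (col=value/...) under a
    table root, preserving the order the partition columns are given in."""
    segments = "/".join(f"{col}={val}" for col, val in partitions.items())
    return f"{table_root.rstrip('/')}/{segments}/"

# Hypothetical lake location for illustration.
path = partition_path("s3://example-lake/events", year=2024, month=6, day=1)
```

Writing data under such paths lets Hive and Athena prune partitions at query time instead of scanning the whole table.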
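The event-driven pattern above (S3 event triggers Lambda, which starts a Glue job) typically parses the S3 notification and passes the object location to boto3's `glue.start_job_run`. The parsing step is sketched here against the standard S3 notification shape; the bucket and key values are hypothetical:

```python
def glue_args_from_s3_event(event: dict) -> dict:
    """Turn the first record of an S3 put-notification into Glue job
    arguments. A Lambda handler would pass the result to
    boto3 client("glue").start_job_run(JobName=..., Arguments=...)."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]
    return {"--input_path": f"s3://{bucket}/{key}"}

# Hypothetical S3 notification payload, trimmed to the fields used above.
sample_event = {"Records": [{"s3": {"bucket": {"name": "example-raw"},
                                    "object": {"key": "landing/orders.csv"}}}]}
args = glue_args_from_s3_event(sample_event)
```

Keeping the event parsing as a pure function makes the handler trivially unit-testable without invoking AWS at all.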
Technical Skills
AWS Services: AWS Glue, AWS S3, AWS Athena, AWS Lambda, AWS Redshift, Redshift Spectrum
ETL Tools: Informatica PowerCenter 10.4.1, IICS
Databases: Redshift, Oracle 11g, PostgreSQL
Big Data: Spark, Sqoop, Hive, Data Warehousing