
Data Engineer with 4+ years of experience building and optimizing ETL pipelines for large-scale financial datasets. Proficient in PySpark, Apache Spark, and AWS EMR for distributed data processing. Reduced ETL processing time from 9 hours to 1 hour through performance optimization. Skilled in data modeling and in delivering high-quality data solutions that support business insights.
Tech Stack: AWS EMR, S3, Lambda, CloudWatch, Airflow, Hadoop, Spark, PySpark, SQL, Hive (Tez), HDFS, Oozie, QuickSight
Programming & Data Processing:
Python, SQL, PySpark, Shell Scripting, XML
Big Data Technologies:
Apache Spark, Apache Hadoop, HDFS, AWS EMR
Cloud & AWS Services:
Amazon S3, AWS IAM (Roles & Policies), AWS Lambda, Amazon CloudWatch
Data Warehousing & Databases:
Snowflake, Apache Hive, MySQL, SQL Server
Workflow Orchestration:
Apache Airflow, Apache Oozie
Analytics & Visualization:
Tableau, Power BI, Amazon QuickSight