Data Engineer with over four years of experience designing and implementing robust data solutions. Proficient in Python, PySpark (Apache Spark), SQL, Pandas, and AWS, with a proven record of building scalable data pipelines, optimizing processing workflows, and ensuring high data quality.
Programming Languages: Python
Big Data Technologies: PySpark
Databases: SQL
Cloud Platforms: Amazon Web Services (AWS)
Data Processing and Analysis: Pandas, Redshift, Athena
ETL Tools: Apache Airflow, AWS Glue
Version Control: Git
IDE: Jupyter Notebook
To thrive in a dynamic and challenging environment where I can apply my skills and knowledge effectively, contributing meaningfully to both organizational success and my own continuous growth.
1. Designed data zones (raw, curated, analytics) with Delta Lake formats.
2. Built PySpark ETL jobs with partitioning and caching strategies (see the sketch after this list).
3. Deployed jobs using AWS Glue, scheduling via triggers and workflows.
4. Queried transformed data with Athena and Redshift Spectrum using SQL.
5. Optimized schema and storage for performance and cost efficiency.
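A minimal PySpark sketch of the raw-to-curated ETL step described above. The S3 paths, column names, and job name are hypothetical, and writing in Delta format assumes the delta-spark package is available on the cluster.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("raw-to-curated-etl")  # illustrative job name
    .getOrCreate()
)

# Read raw-zone JSON events (path is illustrative).
raw_df = spark.read.json("s3://example-lake/raw/events/")

# Cleanse and transform: drop duplicates, normalize types,
# and derive a date column to partition on.
curated_df = (
    raw_df
    .dropDuplicates(["event_id"])
    .withColumn("event_ts", F.to_timestamp("event_time"))
    .withColumn("event_date", F.to_date("event_ts"))
    .filter(F.col("event_id").isNotNull())
)

# Cache because the frame is reused for both the write and a row count.
curated_df.cache()

# Write to the curated zone as Delta, partitioned by date so that
# downstream Athena / Redshift Spectrum queries can prune partitions.
(
    curated_df.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("event_date")
    .save("s3://example-lake/curated/events/")
)

print(f"Curated rows written: {curated_df.count()}")
curated_df.unpersist()
```

Partitioning by event_date keeps per-query scan sizes small, which is what drives the storage and cost optimization noted in item 5.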
1. Design and implement the data pipeline architecture
2. Develop ETL scripts and workflows
3. Handle data transformation and cleansing
4. Automate data ingestion processes (see the orchestration sketch after this list)
5. Monitor and troubleshoot pipeline issues
6. Collaborate with stakeholders to understand data requirements
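A minimal Apache Airflow sketch of the automated ingestion workflow implied by these duties. The DAG id, schedule, and Glue job name are illustrative, and the Glue job is started directly through boto3 rather than a provider operator.

```python
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator


def start_glue_job():
    """Kick off the (hypothetical) Glue ETL job and return its run id."""
    glue = boto3.client("glue")
    run = glue.start_job_run(JobName="raw-to-curated-etl")
    return run["JobRunId"]


with DAG(
    dag_id="daily_data_ingestion",          # illustrative DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+ keyword; older versions use schedule_interval
    catchup=False,
) as dag:
    trigger_etl = PythonOperator(
        task_id="start_glue_etl",
        python_callable=start_glue_job,
    )
```

Returning the JobRunId from the callable pushes it to XCom, so a downstream monitoring task could poll the run status and surface pipeline failures (duty 5).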
I hereby declare that all information furnished by me is true to the best of my knowledge.