Data Engineer with 4+ years of experience in designing and deploying robust data solutions. Skilled in Python, PySpark, SQL, AWS, Pandas, and Apache Spark, with a demonstrated ability to build scalable data pipelines, enhance data processing efficiency, and maintain high data integrity. Combines strong analytical problem-solving with expertise in big data technologies and cloud platforms to deliver optimized data infrastructure.
Soft Skills
Time Management
Effectively prioritizes critical tasks to avoid last-minute rushes
Establishes realistic deadlines to ensure timely achievement of milestones
Adaptability
Recognizes that change is inevitable and approaches it objectively
Structures projects into manageable components, allowing for quick adjustments when needed
Project Name: Data Pipeline Automation
Technologies: Python, SQL, AWS, Pandas
Description: Develop an automated data pipeline using Python and AWS services to extract, transform, and load data from various sources into a centralized data repository.
Responsibilities: 1. Design and implement the data pipeline architecture
2. Develop ETL scripts and workflows
3. Handle data transformation and cleansing
4. Automate data ingestion processes
5. Monitor and troubleshoot pipeline issues
6. Collaborate with stakeholders to understand data requirements
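The extract-transform-load flow described above can be sketched in minimal form with Pandas. This is an illustrative outline only: the function names, sample records, and the in-memory dict standing in for the centralized repository are all hypothetical (a production version would read from sources such as S3 or a database via AWS services).

```python
import pandas as pd

def extract(records):
    # Pull raw records from a source into a DataFrame
    # (in production this might read from S3 or a database).
    return pd.DataFrame(records)

def transform(df):
    # Cleanse: drop rows missing a user id, normalize email casing.
    cleaned = df.dropna(subset=["user_id"]).copy()
    cleaned["email"] = cleaned["email"].str.strip().str.lower()
    return cleaned

def load(df, store):
    # Load into the centralized repository (a dict stands in here).
    store["users"] = df
    return len(df)

raw = [
    {"user_id": 1, "email": " Alice@Example.com "},
    {"user_id": None, "email": "bad@row.com"},
    {"user_id": 2, "email": "BOB@example.com"},
]
warehouse = {}
loaded = load(transform(extract(raw)), warehouse)
print(loaded)  # number of rows loaded after cleansing
```

Each stage is a small pure function, so individual steps can be unit-tested and swapped out as data sources change.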
Project Name: Data Pipeline Orchestration with Apache Airflow
Technologies: Python, Apache Airflow, SQL, ETL tools, data modeling
Description: Design and implement a data pipeline using Apache Airflow to schedule and automate the execution of ETL tasks, data transformations, and data loading processes.
Responsibilities: 1. Configure and manage the Apache Airflow infrastructure
2. Develop and maintain ETL workflows
3. Monitor and troubleshoot pipeline issues
4. Collaborate with stakeholders on task scheduling and dependencies
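The orchestration idea behind this project — ETL tasks executed in dependency order — can be illustrated with a small stand-in using only the standard library. Airflow itself is not imported here; the task ids and dependency map are hypothetical, and a real DAG would declare these tasks with Airflow operators and `>>` dependencies.

```python
from graphlib import TopologicalSorter

# Toy stand-in for an Airflow DAG: tasks keyed by id, each listing
# its upstream dependencies, executed in topological order.
tasks = {
    "extract": [],
    "transform": ["extract"],   # transform runs after extract
    "load": ["transform"],      # load runs after transform
}

def run(task_id, log):
    # In Airflow, this is where an operator's execute() would fire.
    log.append(task_id)

execution_log = []
for task_id in TopologicalSorter(tasks).static_order():
    run(task_id, execution_log)
print(execution_log)  # ['extract', 'transform', 'load']
```

The scheduler guarantees each task runs only after its upstream dependencies complete, which is the core contract Airflow provides on top of scheduling, retries, and monitoring.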