Aspiring Data Engineer with an M.Sc. in Data Science (graduating in 2025) and 1.5 years of ETL experience. Proficient in Python, R, SQL, and cloud platforms; skilled in designing and optimizing data pipelines, ETL processes, and data workflows for large-scale datasets. Eager to build efficient, scalable data solutions that drive decision-making.
Real-Time Data Pipeline for IoT Devices
Built a real-time data pipeline that collects, processes, and visualizes data from IoT devices, using Apache Kafka for data ingestion, Apache Spark for stream processing, and Grafana for real-time dashboards.
Technologies: Apache Kafka, Apache Spark, Grafana.
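A minimal sketch of the ingestion step, showing how a raw device payload can be validated and normalized before being published to a Kafka topic. The message schema (device_id, metric, value, ts) and field names are illustrative assumptions, not the exact schema used in the project.

```python
import json
from datetime import datetime, timezone

def parse_iot_message(raw: bytes) -> dict:
    """Validate and normalize one raw IoT sensor payload before it is
    published to Kafka. Schema fields here are assumed for illustration."""
    msg = json.loads(raw)
    for field in ("device_id", "metric", "value"):
        if field not in msg:
            raise ValueError(f"missing field: {field}")
    return {
        "device_id": str(msg["device_id"]),
        "metric": str(msg["metric"]),
        "value": float(msg["value"]),
        # default to ingestion time when the device omits a timestamp
        "ts": msg.get("ts") or datetime.now(timezone.utc).isoformat(),
    }
```

In the full pipeline, validated records would be sent to a Kafka topic, aggregated with Spark Structured Streaming, and surfaced in Grafana.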
End-to-End ETL Pipeline for E-Commerce Data Analytics
Developed an end-to-end ETL (Extract, Transform, Load) pipeline for an e-commerce dataset. The project involved extracting data from multiple sources (e.g., sales data, user behavior logs), transforming it for analysis (data cleaning, aggregation), and loading it into a data warehouse for analysis and reporting.
Technologies: Apache Airflow, SQL, Python, Google BigQuery.
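An illustrative sketch of the transform step, cleaning raw sales records and aggregating revenue per product; in practice a function like this would run as one task in an Airflow DAG before the load into BigQuery. The field names (product, qty, unit_price, status) are placeholder assumptions.

```python
from collections import defaultdict

def transform_sales(rows: list[dict]) -> dict[str, float]:
    """Clean raw sales records and aggregate revenue per product.
    Field names are illustrative, not the project's actual schema."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        # cleaning: drop malformed rows and cancelled orders
        if not row.get("product") or row.get("status") == "cancelled":
            continue
        totals[row["product"]] += float(row["qty"]) * float(row["unit_price"])
    return dict(totals)
```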
Data Lake Implementation for Big Data Analytics
Created a data lake architecture to manage large volumes of structured and unstructured data. This project involved setting up a data lake on cloud services, implementing data ingestion and storage strategies, and running complex queries and analytics on the data.
Technologies: Google Cloud Storage, Google BigQuery.
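A small sketch of one storage-layout choice such a data lake can use: Hive-style date partitioning of object paths in Cloud Storage, which lets query engines like BigQuery prune partitions. The bucket and dataset names are placeholders.

```python
from datetime import date

def partition_path(bucket: str, dataset: str, day: date) -> str:
    """Build a Hive-style partitioned object prefix (year=/month=/day=)
    for a daily ingestion batch. Names are illustrative placeholders."""
    return (
        f"gs://{bucket}/{dataset}/"
        f"year={day.year}/month={day.month:02d}/day={day.day:02d}/"
    )
```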