Data Engineer with hands-on experience building scalable ETL pipelines, data warehouses, and visualizations using Azure, Databricks, and Tableau. Proficient in Python, SQL, and cloud technologies, with a strong foundation in data modeling, performance optimization, and business analytics.
Medallion Architecture Project
Built and orchestrated ETL pipelines using Azure Data Factory (ADF) to automate raw data ingestion from REST APIs into Azure Data Lake Storage Gen2 (ADLS), following the Medallion Architecture.
Performed scalable data transformation, cleansing, and enrichment with Azure Databricks (PySpark & SQL), enabling reliable Silver layer datasets.
Designed aggregated and business-ready data models using Azure Synapse Analytics (Serverless SQL Pools) and stored curated outputs in Parquet format within the Gold layer.
Integrated the Gold layer with Power BI to deliver interactive dashboards and actionable insights for business stakeholders.
Big Data Analysis on NYC Civil List
Developed a scalable system architecture on an Azure VM, using Docker for containerization, to analyze the NYC Civil List dataset.
Managed data storage and integration with PostgreSQL and MongoDB, using PySpark for data processing and conversion to Parquet. Conducted exploratory data analysis, visualized the results, and performed cluster analysis.
Bookstore Management System
Developed a Bookstore Management System using Java and Spring Boot to manage orders, customer information, and inventory. Implemented role-based CRUD operations and rate-limited REST APIs, reducing server load by 30% during peak hours and ensuring equitable resource distribution.
Enhanced data integrity and error handling with Hibernate Validator. Implemented Swagger for auto-generated API documentation, boosting development speed by 20% and ensuring consistent, scalable operations.