Data professional with 2+ years of experience in data analysis, transformation, and pipeline development. Proficient in PySpark, SQL, and Python for data processing and analysis. Skilled in building and optimizing ETL pipelines, integrating data from diverse sources, and ensuring data accuracy and consistency. Strong background in identifying patterns, uncovering insights, and streamlining data workflows to support business decisions. Adept at handling large datasets and improving data quality to drive analytical outcomes.
Developed an end-to-end ETL pipeline to process and analyze healthcare data for cardiovascular disease prediction using the Cleveland Heart Disease dataset. Built and trained a machine learning model in Python with Pandas and scikit-learn, achieving 93% accuracy. Designed and optimized data workflows to improve processing efficiency and model performance.