Aspiring Data Scientist with hands-on experience in data preprocessing, model development, and visualization. Skilled in Python, SQL, and cloud platforms, with a strong foundation in machine learning and deep learning. Passionate about leveraging data to derive actionable insights and deliver impactful solutions.
• Sentiment Analysis of Product Reviews : Developed an NLP pipeline to classify customer sentiment from Amazon product reviews. Performed web scraping, text preprocessing (tokenization, lemmatization), and exploratory data analysis. Built and compared multiple sentiment classifiers (Logistic Regression, Naive Bayes). Deployed the final model via Streamlit for real-time review sentiment prediction. The project focused on capturing customer feedback trends to inform user experience improvements.
• Fraud Detection in Financial Transactions : Built a machine learning model to proactively identify fraudulent transactions from a dataset with over 6 million records. Performed extensive data cleaning, handling missing values, outliers, and multicollinearity. Used SMOTE to address class imbalance and implemented ensemble models including Random Forest and XGBoost. Evaluated the models using ROC-AUC and accuracy. Extracted key predictors of fraud and proposed actionable prevention strategies. Delivered final insights through visual dashboards.
• Song Recommendation System : Developed a personalized song recommendation system using collaborative (SVD, ALS), content-based (TF-IDF, cosine similarity), and hybrid filtering. Evaluated performance with precision, recall, and F1-score, and deployed the system using Streamlit.
• Clustering of Global Development Measurement : Implemented a clustering model to analyze global development metrics, identifying patterns among countries based on socioeconomic indicators. Performed data preprocessing, feature selection, and dimensionality reduction using PCA. Applied K-Means and Hierarchical clustering to segment countries and evaluated cluster quality using silhouette scores. Visualized insights through data-driven dashboards and deployed the model using Streamlit for interactive exploration.