
Results-driven Data Scientist with 2 years of experience analyzing complex datasets and building predictive models. Proficient in applying machine learning, neural networks, and natural language processing (NLP) to deliver actionable insights. Skilled in statistical analysis and data visualization that support data-driven decision-making. Committed to using data science to solve business challenges and improve operational efficiency.
Air Pressure System (APS) fault detection
• Objective: Developed a machine learning solution to detect component failures in the Air Pressure System (APS) of trucks, minimizing unnecessary repairs and reducing maintenance costs.
• Data Preprocessing: Handled missing values with several imputation techniques (KNN, mean, median, and constant) and addressed class imbalance using SMOTE.
• Model Training and Evaluation: Trained multiple classification models, including Random Forest, Decision Tree, SVM, and XGBoost.
• Performance Metrics: Compared the trained models and selected XGBoost with constant imputation as the final model, achieving 95% accuracy in manual validation.
• Tools and Libraries: Python, Pandas, NumPy, Scikit-learn, Seaborn, Matplotlib, imbalanced-learn, and XGBoost.
• Impact: Reduced the cost of unnecessary repairs by accurately identifying APS component failures, improving the reliability and efficiency of the truck maintenance system.
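The preprocessing steps above (imputation, then SMOTE) can be illustrated with a minimal, standard-library-only sketch. It is not the project's code and not the imbalanced-learn implementation: the function names `impute_median` and `smote_like`, the toy feature values, and the choice of `k` are all assumptions made for illustration.

```python
import math
import random
import statistics


def impute_median(rows):
    """Replace None entries in each column with that column's median."""
    columns = list(zip(*rows))
    medians = [
        statistics.median(v for v in col if v is not None) for col in columns
    ]
    return [[m if v is None else v for v, m in zip(row, medians)] for row in rows]


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points.

    Each point is interpolated between a randomly chosen minority sample
    and one of its k nearest minority neighbours (the core SMOTE idea).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data (hypothetical): two sensor features, None marks a missing reading.
failures = impute_median([[1.0, None], [3.0, 4.0], [None, 8.0]])
extra_failures = smote_like(failures, n_new=5)
```

In practice, oversampling is applied to the training split only, so that synthetic points never leak into the validation data.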
Chest X-ray disease detection using InceptionV3 and Streamlit
• Designed and implemented a deep learning pipeline using InceptionV3 transfer learning to classify chest X-ray images into four categories: COVID-19, Bacterial Pneumonia, Viral Pneumonia, and Normal.
• Trained the model on 8,000+ labelled images for 10 epochs with categorical cross-entropy loss and the Adam optimizer, achieving 92%+ validation accuracy.
• Employed data augmentation, dropout (0.5), and batch normalization to improve generalization and reduce overfitting by ~25%.
• Developed a responsive frontend application using Streamlit, enabling users to upload chest X-rays and receive real-time predictions with inference times under 2 seconds.
• Implemented a full image preprocessing pipeline (resizing to 299x299, normalization, tensor expansion) so that every uploaded image matches the input format the trained model expects.
• Displayed user-friendly outputs, including the predicted label and confidence score, to aid interpretability for non-technical users.
• Delivered a deployment-ready application with minimal hardware requirements, suitable for medical diagnostic assistance tools or clinical proofs of concept (PoCs).
• Tools & Tech Stack: Python, TensorFlow/Keras, Streamlit, NumPy, PIL, Matplotlib, ImageDataGenerator.
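The label-and-confidence output described above could look roughly like the sketch below. It assumes the model returns one probability per class, and that the classes appear in the order shown. Both points are assumptions, as are the names `CLASS_NAMES`, `top_prediction`, and `format_prediction`, which are not taken from the project.

```python
# Class order is assumed; a real app must use the order the model was trained with.
CLASS_NAMES = ["COVID-19", "Bacterial Pneumonia", "Viral Pneumonia", "Normal"]


def top_prediction(probabilities, class_names=CLASS_NAMES):
    """Return (label, confidence) for the highest-probability class."""
    if len(probabilities) != len(class_names):
        raise ValueError("expected one probability per class")
    index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return class_names[index], probabilities[index]


def format_prediction(probabilities):
    """Build the short message shown to a non-technical user."""
    label, confidence = top_prediction(probabilities)
    return f"{label} ({confidence:.1%} confidence)"


# Hypothetical model output for one uploaded image.
message = format_prediction([0.05, 0.10, 0.15, 0.70])
# message == "Normal (70.0% confidence)"
```

Showing the confidence next to the label lets a reader see when the model is unsure, which matters for a tool meant to assist rather than replace a clinician.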