Holds a Ph.D. in Statistics from Mangalore University, with a focus on Financial Time Series Analysis and Forecasting. Currently serves as an Assistant Professor at St. Joseph's University, coordinating the Data Science and Analytics certificate course. Recently completed a Post-Graduate program in Data Science, specializing in Machine Learning and Artificial Intelligence, at Imarticus Bangalore.
Has supervised numerous M.Sc. projects in areas such as Hypothesis Testing, Regression Analysis, CNN, RNN, NLP, ML, and Time Series Analysis. As a data analyst for the NIRF cell at St. Joseph's College of Commerce, Bangalore, contributed to improving the college's national ranking from 92nd to 65th.
Achieved first rank in M.Sc. Statistics at Mangalore University and has published research papers in journals indexed in Scopus and Web of Science, with publishers including Springer and Taylor & Francis. Received the "Dr. B.K. Kale Award for Best Research Paper" at a national-level competition. Also cleared the Karnataka State Eligibility Test (KSET) in Mathematical Science.
In addition to these academic achievements, has completed several projects in Machine Learning and Deep Learning, demonstrating practical expertise in these areas.
Project Outcomes: The project aimed to identify factors associated with a higher likelihood of diabetes in Pima Indian women by using various machine learning models. The dataset, collected by the US National Institute of Diabetes and Digestive and Kidney Diseases, included characteristics such as glucose level, BMI, age, and number of pregnancies. Missing values for insulin, skin thickness, blood pressure, and BMI were handled through removal or imputation to ensure data integrity. Logistic regression with backward elimination, which retained 'npreg', 'ped', 'glu', and 'bmi' as significant features, achieved the best performance with an accuracy of 78.89% and an AUC of 0.8648. Other models, including an SGD classifier, Decision Tree, Support Vector Machine, and Naive Bayes, were evaluated but found to be less effective. For predicting the diabetes pedigree function, a multiple linear regression model with K-best feature selection outperformed the other regression approaches; k-fold cross-validation confirmed its robustness, with a mean CV score of -0.2559. The study concludes that glucose level, BMI, number of pregnancies, and age are key predictors of diabetes, suggesting targeted interventions for at-risk sub-groups.
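The AUC reported for the logistic regression model can be read as the probability that a randomly chosen diabetic case receives a higher predicted score than a randomly chosen non-diabetic case. A minimal sketch of that pairwise definition follows; the labels and scores are made-up illustrative values, not the project's data, and the function name roc_auc is hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Assumes binary labels coded 1 (positive) and 0 (negative), with at
    least one example of each class.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities (illustration only).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]

auc = roc_auc(y_true, scores)
print(f"AUC = {auc:.4f}")
```

In this toy example, eight of the nine positive/negative pairs are ordered correctly, so the AUC is 8/9. The pairwise loop is quadratic in the sample size; it is meant to show the definition, not to replace a library routine.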
Project Outcomes: The project successfully leveraged Tableau to comprehensively analyze global sales performance and market trends. Insights were gained into sales performance across products, segments, countries, and discount levels, aiding strategic decision-making. Key findings included identifying top-performing products, profitable markets, growth opportunities, and areas for improvement. The analysis also compared market share with competitors and highlighted regions with high sales potential. Visualizations such as histograms, pie charts, and bubble plots were utilized to present data effectively. Finally, interactive dashboards were created to consolidate visualizations and enable dynamic exploration of sales, profitability, trends, and geographic analysis.
Project Outcomes: The study applied several models to the data and found that age is a significant factor in osteoporosis, with females at higher risk. A gradient-boosting model achieved 92.01% accuracy in predicting osteoporosis risk. Several CNN architectures were also evaluated, and ConvNeXt-Tiny was introduced, achieving the highest accuracy, 93.2%, in diagnosing osteoporosis from knee X-ray images and surpassing the other CNN models. This suggests ConvNeXt-Tiny's potential as a cost-effective diagnostic tool. The findings highlight the importance of advanced CNN models in medical imaging for accurate osteoporosis diagnosis, potentially streamlining healthcare processes. Future work involves collecting more data, exploring relationships with other osteoporosis sites, and developing a combined clinical-imaging diagnostic system. Ultimately, these efforts aim to benefit patients and healthcare providers by enhancing osteoporosis detection and management.
Project Outcomes: The project aimed to predict benign or malignant breast cancer diagnoses using Random Forest, Bagging Meta-estimator, AdaBoost, and XGBoost models. Random Forest achieved the highest accuracy, 92%, demonstrating its effectiveness in distinguishing benign from malignant cases. This underscores the potential of machine learning models to improve diagnostic accuracy in breast cancer detection, and the value of data science in understanding and managing complex medical data.
Project Outcomes: Using K-Means clustering, this project segmented cricket players based on various performance metrics, finding two optimal clusters: bowlers and batsmen. Bowlers had higher wickets but lower scores and averages, while batsmen excelled in scoring runs and hitting sixes. The silhouette-score method confirmed the validity of these clusters. This approach aids in understanding players' distinct skills, enhancing strategic decision-making in team selection and development. The segmentation provides valuable insights into the diverse skill sets of cricket players.
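The cluster-count selection described above can be sketched as follows: run K-Means for several candidate values of k and keep the one with the highest mean silhouette score. This is a small pure-Python illustration under stated assumptions. The player figures (batting average, wickets) are invented, the evenly spaced centroid initialisation is a simplification chosen to keep the example deterministic, and the function names are hypothetical; the project itself may have used a library implementation.

```python
import math


def kmeans(points, k, iters=50):
    """Plain k-means with a deterministic, evenly spaced initialisation."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = None
    for _ in range(iters):
        # Assign every point to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def silhouette_score(points, labels):
    """Mean silhouette over all points; requires at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for m, other in clusters.items()
            if m != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical players: (batting average, wickets taken).
players = [
    (45.0, 2.0), (50.0, 1.0), (48.0, 3.0),     # batsman-like profiles
    (12.0, 40.0), (10.0, 45.0), (15.0, 38.0),  # bowler-like profiles
]

scores = {k: silhouette_score(players, kmeans(players, k)) for k in (2, 3)}
best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```

On this toy data the two-cluster solution scores clearly higher than the three-cluster one, mirroring the batsmen/bowlers split reported in the project. With real performance metrics, features would normally be standardised before clustering so that no single metric dominates the distance.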
Project Outcomes: The project addressed customer churn in the telecommunications sector by analyzing customer data with models like KNN, Random Forest, XGBoost, and AdaBoost. KNN with K=31 and Manhattan distance performed best, identifying high-risk churn customers. Key indicators of churn included Total Charges, Tenure, and Monthly Charges, accounting for 54% of churn likelihood. The findings emphasize the importance of managing these factors to improve customer retention. Machine learning models offer strategic insights for retaining customers in a competitive market.
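The best-performing configuration above, KNN with Manhattan distance, can be illustrated with a few lines of pure Python. This is a sketch, not the project's code: the customer records (tenure in months, monthly charges) and the labels are invented, the helper names are hypothetical, and k is set to 3 only because the toy dataset has six rows, whereas the project used K=31.

```python
from collections import Counter


def manhattan(a, b):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k):
    """Predict the majority label among the k nearest training points."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: manhattan(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical customers: (tenure in months, monthly charges).
train_X = [(1, 90), (2, 85), (3, 95), (60, 30), (48, 40), (72, 25)]
train_y = ["churn", "churn", "churn", "stay", "stay", "stay"]

print(knn_predict(train_X, train_y, (4, 88), k=3))   # short tenure, high charges
print(knn_predict(train_X, train_y, (55, 35), k=3))  # long tenure, low charges
```

Because KNN is distance-based, features on different scales (for example Total Charges versus Tenure) are usually standardised first; that step is omitted here for brevity.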
Project Outcomes: This project predicted Titanic passenger survival using Logistic Regression, Decision Tree, Naïve Bayes, and Artificial Neural Network (ANN) models. The ANN performed best among the models. Exploratory Data Analysis revealed higher survival rates for first-class passengers, females, elderly passengers, and children. The study successfully applied both machine learning and deep learning models to predict survival outcomes. It also identified significant survival patterns, enhancing understanding of the Titanic dataset.
Project Outcomes: The project aimed to predict university admission likelihood using Logistic Regression and Decision Tree models on a dataset with variables such as GRE score, TOEFL score, and university rating. Logistic Regression achieved 92% accuracy and a kappa score of 0.81, outperforming the Decision Tree model. The study highlights the benefit of using simpler models when they provide high accuracy: doing so avoids unnecessary complexity and mitigates the risk of overfitting. The findings underscore the importance of selecting models based on performance rather than complexity.
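Accuracy and the kappa score measure different things: accuracy is the raw share of correct predictions, while kappa corrects that share for the agreement expected by chance. The sketch below assumes the reported kappa is Cohen's kappa, which the source does not state explicitly; the labels are invented for illustration and the function names are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement (accuracy) and p_e is the agreement
    expected by chance from the marginal label frequencies. Undefined
    when p_e equals 1 (both label lists contain a single identical class).
    """
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical admission outcomes (1 = admitted, 0 = not admitted).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

acc = accuracy(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
print(f"accuracy = {acc:.2f}, kappa = {kappa:.2f}")
```

Here nine of ten predictions are correct (accuracy 0.90), chance agreement is 0.50, and kappa is therefore 0.80. A kappa well below the accuracy, as in the project's 92% versus 0.81, is normal: it reflects the part of the accuracy that class frequencies alone would produce.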