Aditya Joshi

Senior Data Scientist

Summary

Data Scientist | 5.5+ years of experience

Experienced with complex data analysis and advanced statistical techniques to uncover actionable insights. Utilizes machine learning models to solve real-world problems and enhance business outcomes. Knowledge of Python, R, and data visualization tools to communicate findings effectively.

  • Core expertise in Machine Learning, Deep Learning (NLP), Data Analytics, MySQL, and Python.
  • Built and deployed deep learning models using TensorFlow and Keras, with strong application in predictive analysis.
  • Developed production-ready ML solutions across Petrochemical and Manufacturing domains
  • Proficient in Python programming.
  • Secondary skillset in Generative AI, including hands-on experience with LangChain, LangGraph, RAG, and Agentic-RAG applications using LLaMA and OpenAI models.
  • Technical skills: Generative AI (LLMs), LangChain, Deep Learning, Clustering, Machine Learning, Python, MySQL.

Overview

8 years of professional experience
16 years of post-secondary education

Work History

Data Scientist

Wipro Technologies
04.2022 - Current

Project 1: Knowledge Management using Generative AI (Phase 1 RAG)

  • Designed and implemented a RAG-based system using LangChain, Pinecone, Hugging Face and OpenAI embeddings, and OpenAI and Llama3 models to provide advanced contextual question-answering over large document corpora.
  • Developed an embedding pipeline using OpenAI embeddings to convert documents into vector embeddings, stored in Pinecone for fast retrieval of relevant information.
  • Evaluated system performance using standard metrics (e.g., Precision, Recall, F1-score) to measure the relevance and quality of generated responses.


Project 2: Hazard Prediction System Monitoring Tool for Petrochemical Affiliates

  • The Hazard Prediction System Monitoring Tool is an advanced, proactive solution designed to classify and predict potential hazards within petrochemical operations.
  • By leveraging Named Entity Recognition (NER) with the spaCy large model and Machine Learning (ML) techniques such as XGBoost and Random Forest, the system integrates multiple data sources to assess and predict risks related to key performance indicators (KPIs).
  • spaCy's large model was trained on custom data to extract custom entities (HAZOP and HRA).
  • The same base model was then used to extract custom entities from data sources like Eshems and Incident data.
  • A similarity model was applied to the extracted entities to fetch the raw incident descriptions and display them on the dashboard.


Project 3: NOx Forecasting for the Petrochemical Plant

  • Developed a machine learning model using the Random Forest Regressor to forecast NOx emissions for the next 7 days, based on real-time sensor data.
  • Evaluated model performance with MAE and RMSE, and optimized hyperparameters to achieve better forecasting accuracy.
  • Created a forecasting model that helped the end users with real-time NOx emissions prediction, enabling proactive decision-making for environmental compliance and operational optimization.


Project 4: Asset Healthcare for Petrochemical Plant

  • Developed an autoencoder model for detecting anomalies in time-series data, such as sensor readings from the petrochemical plant (assets).
  • Collected and preprocessed data from five affiliates comprising fourteen plants.
  • Optimized the model through cross-validation, hyperparameter tuning using Hyperopt, model tracking with MLOps, and regularization techniques to ensure robust performance across diverse data sets.

Associate Data Analyst

Mindtree Ltd
01.2019 - 02.2021

Project 1: Churn Analysis for MDOS (Microsoft Digital Operation System) Clients

  • Client types: centralized OEM inventory, decentralized inventory, factory floor key inventory, and subsidiary inventory, spread across 97 demographic regions.
  • The business problem was to study client behavior based on the available data and predict which customers are highly likely to churn from the MDOS Cloud web application and WPF application in different demographic regions.
  • Worked on data processing, data cleaning, quantitative and qualitative data analysis using statistical tests, correlations, and data visualizations.
  • As part of the Data Science team, solved the client's problem using available data by building predictive models, discovering insights, and identifying trends and patterns in the data.
  • Built end-to-end logistic regression and random forest predictive models, covering business problem understanding, data pre-processing, feature engineering and selection, model building, evaluation, and data insights.
  • Performed data cleaning, feature scaling, and feature engineering using the Pandas and NumPy packages in Python.
  • Recommended ways to reduce the churn rate, including building clusters and segments and applying different strategies to retain the most valued customers. Turned data insights into concrete action by proposing offers and deals to retain loyal customers and by addressing the major reasons for customer churn.


Project 2: NLP: Sentiment analysis for OEM and factory floor inventories for digital attach and store product reviews/feedback.

  • The business problem was to study the remarks/reviews of the survey form for the digital attaches sold across different Microsoft partners/clients.
  • Built a sentiment model that helps businesses identify customer sentiment toward products and their services.
  • Preprocessed and cleaned the text data using various NLP techniques, including stop word removal, lemmatization, and stemming, and built a word cloud to identify the subjectivity and polarity of the data.
  • Provided a positive or negative sentiment for each of the given sentences.
  • Used the Early Stopping technique to avoid model overfitting.
  • Ensured that the model had a low false positive rate by iterating the model through various parameters.

QA Automation Engineer

Mindtree Ltd
12.2017 - 11.2018
  • Wrote and executed test scripts with the Selenium WebDriver framework.
  • Involved in manual, functional, and automation testing of the Cloud web application and WPF application.
  • Understood user stories and identified test case scenarios.
  • Wrote and executed test scripts with C# Selenium WebDriver and produced execution reports with the NUnit framework.
  • Strong knowledge of the defect life cycle and Agile methodology.
  • Hands-on knowledge of OOP concepts and MySQL database.
  • Documentation and tracking using Microsoft Azure Test Plans.

Education

Bachelor of Engineering - Computer Science Engineering

The National Institute of Engineering
Mysuru, Karnataka
04.2001 - 07.2017

Skills

  • Generative AI: LLMs (Llama3, OpenAI), LangChain
  • Analytical Tools: Python
  • Machine Learning: Linear Regression, Logistic Regression, Random Forest, Boosting Techniques, and Clustering
  • Deep Learning: ANN, RNN, LSTM, Transformer, BERT, and NLP
  • Packages: pandas, numpy, matplotlib, sklearn, TensorFlow Keras
  • Visualization: Matplotlib (Python)
  • IDE: Jupyter Notebook, VS Code

Accomplishments

  • Received the 'Unstoppable' award for the January-March quarter from team Zyphyr for working consistently and delivering a high-quality model.
  • Awarded 'Team Player' by the TechHack team for the year 2019.

Internship

  • Profound AI
  • Data Science Intern
  • Bangalore
  • Sep-Oct 2017
