
VIVEK SINGH

Data Scientist
IIT Kharagpur

Summary

I grew up at IIT Kharagpur, and solving real-time problems is what fills my dopamine void. I'm skilled in Data Science, Analytics, Software Development, and end-to-end deployment. I currently work in the Quick Commerce industry, solving demand forecasting and replenishment problems using Data Science.

I've interned at top MNCs such as Mahindra Groups and Yatra.com, and at some really cool startups like LoveLocal. These internships grew my technical skill set, made me a strong real-time problem solver, and gave me a deep sense of responsibility towards my work. I've applied Data Science to problems from a diverse set of domains, including e-commerce, retail, travel and hospitality, marketing, healthcare, and finance, which has also taught me how businesses in various industries work.

Overview

2 years of professional experience
7 years of post-secondary education

Work History

Data Scientist

Blinkit
02.2022 - Current

Project 1: Demand Forecasting Engine using ML:

  • Built an end-to-end ML demand forecasting engine scaled to ~34 stores (pan-India), responsible for replenishing ~11k unique items (~0.1 Mn unique time series) from the warehouse to dark stores, leading to significant automation of the process
  • Prepared the train, validation, and inference datasets for the 34 dark stores, covering ~1 lakh unique time series with 92 features
  • Implemented splitting logic in which training data covers the last 5 months, and validation and inference data cover 7 days
  • Created logic to segregate high-selling and low-selling items based on the mean quantity sold per slot for each item
  • Formulated an accuracy metric based on business logic, i.e., either the percent difference should be between 80 and 120, or the value difference should be less than 1
  • Implemented Facebook Prophet for ~5k time series, improving overall accuracy by 25% and saving 5 cr per week on dump
  • Tuned Prophet hyperparameters (prior scales and Fourier orders) using Optuna, leading to a further ~10% improvement in accuracy
  • Experimented with Prophet's growth (logistic) and trend parameters, along with a log transformation, and improved the model accuracy
  • Implemented LightGBM and CatBoost models for the ~1 lakh low-selling time series and improved the overall model accuracy by 15%
  • Improved item availability logic by adding weights for each item at the city level, which improved the availability and decreased dump
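
The business-logic accuracy metric described above can be sketched as follows. This is a minimal illustration, not the production code: it assumes "percent diff" means the forecast expressed as a percentage of the actual value, and "value diff" means the absolute difference. The names `is_accurate` and `accuracy` are hypothetical.

```python
def is_accurate(actual, forecast, low_pct=80.0, high_pct=120.0, abs_tol=1.0):
    """Return True if a single forecast passes the business rule.

    Assumed interpretation: the forecast passes when it lies within
    80-120% of the actual value, or when it is off by less than 1 unit.
    """
    # Absolute-difference rule: useful for very low-selling items.
    if abs(forecast - actual) < abs_tol:
        return True
    # The percentage is undefined when nothing was sold.
    if actual == 0:
        return False
    pct = 100.0 * forecast / actual
    return low_pct <= pct <= high_pct


def accuracy(actuals, forecasts):
    """Share of forecasts that pass the rule, as a value in [0, 1]."""
    pairs = list(zip(actuals, forecasts))
    if not pairs:
        raise ValueError("need at least one (actual, forecast) pair")
    hits = sum(is_accurate(a, f) for a, f in pairs)
    return hits / len(pairs)
```

The absolute-difference branch keeps low-volume items from being penalized for small misses that look large in percentage terms.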

Project 2: INSIDER-Residual Analysis Tool

  • Built a tool to identify time series on which the model performs poorly, along with the reason: noise in the data or a problem in the model
  • Applied statistical tests and domain logic to tag high residuals, and a rolling-mean-based approach to find the reasons for poor forecasts
  • Implemented logic based on residual analysis to select the model that gives the best forecasts for each of the ~5k time series
  • Automated the process, including generation of reports with interactive plots and metric values, to track models through residual analysis

Data Scientist

CRED
10.2021 - 01.2022

Eagle-Anomaly Detection Tool:

  • Built an end-to-end time-series-based anomaly detection tool, which tags anomalous points based on several time series factors
  • Implemented statistical logic to find the contribution of each feature category to the anomaly on a specific day
  • Formulated data-distribution-based logic for the deviation in each feature category and its contribution to the deviation in the target metric
  • Assigned weights to each feature category based on how strongly changes in that category correlate with the target metric
  • Dashboarded the results for monitoring the metrics and displayed each anomalous point along with its reasons in priority order

Data Science Intern

LoveLocal
06.2021 - 08.2021

Project 1: Product recommendation system and market basket analysis

  • Built a product recommendation system with hybrid collaborative filtering using the LightFM library, based on sparse COO matrices (built with SciPy) of user-item interactions (between customers and products) and item features, achieving an AUROC of 85%
  • Obtained frequently-bought-together product combinations (~1000 optimum association rules) using association rule mining (ARM) over a market basket dataset
  • Designed a custom dictionary to correct product names using the Levenshtein distance, and used inflexion point analysis to choose the threshold

Project 2: Binary Classification Model For Uninstallation Of The Application:

  • Implemented a binary classification model to identify the retailers who are likely to uninstall the LoveLocal retail application
  • Clustered the stores using k-means clustering with a haversine distance matrix to fetch the centroid data from the Google Places API
  • Implemented RF classifier and XGBoost on an imbalanced dataset with an AUROC value of 87%, and the model went into production
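
The haversine distance matrix mentioned in the store-clustering bullet can be sketched as below. This is only an illustration under stated assumptions: stores are given as hypothetical (latitude, longitude) pairs in degrees, the Earth is treated as a sphere of mean radius 6371.0088 km, and the function names are made up for the example. How the matrix was combined with k-means is not stated here, so that step is left out.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def distance_matrix(points):
    """Symmetric matrix of pairwise haversine distances (list of lists)."""
    return [[haversine_km(p, q) for q in points] for p in points]
```

Haversine distance is preferred over plain Euclidean distance on raw coordinates because a degree of longitude covers less ground as latitude increases.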

Data Science Intern

Mahindra Groups
05.2021 - 06.2021
  • Created propensity model via supervised learning techniques, used to create target groups for cross-selling marketing campaigns
  • Selected optimal features and trained the model; the best performance (recall: 84, AUROC: 85) was achieved by a weighted XGBoost algorithm
  • Calculated the true churn rate per group; the top decile contains the 10% of the population most likely to buy SCV Cargo

Software Developer Intern

Yatra.com
01.2021 - 04.2021

Yatra support email classifier bot:

  • Implemented multi-class text classification that categorizes customer queries into 5 classes (ticket cancellation, refund, etc.)
  • Applied LDA (gensim) to cross-verify misclassified labels for the email body, and corrected the labels if not in the top 5 optimum words
  • Cleaned the email body, followed by basic pre-processing, tokenization, stop-word removal, and lemmatization using spaCy and gensim
  • Vectorized the text data using TF-IDF, selected optimal features using Chi2, and trained the model using a random forest classifier
  • Achieved an AUROC of 82% (macro average), then dockerized and deployed the text classification application using Flask
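
The TF-IDF vectorization step in the bullets above can be illustrated with a small pure-Python sketch. It is not the project's implementation (the library used for this step is not named above): it assumes already-tokenized emails and a smoothed IDF of the form log((1 + N) / (1 + df)) + 1, which is one common variant. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    Returns one {token: weight} dict per document. Term frequency is the
    token count divided by document length; IDF uses a smoothed variant.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0
           for t, df in doc_freq.items()}

    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors
```

Tokens that appear in every email, such as "ticket" in a travel support inbox, get the lowest weight, so rarer words like "refund" carry more of the signal for classification.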

Education

Bachelor of Science - Geophysics

IIT Kharagpur
Kharagpur
07.2017 - 07.2021

Master of Science - Geophysics

IIT Kharagpur
Kharagpur
07.2021 - 07.2022

High School Diploma

Green Wood School
India
06.2015 - 06.2017

Skills

Python

Timeline

Data Scientist

Blinkit
02.2022 - Current

Data Scientist

CRED
10.2021 - 01.2022

Master of Science - Geophysics

IIT Kharagpur
07.2021 - 07.2022

Data Science Intern

LoveLocal
06.2021 - 08.2021

Data Science Intern

Mahindra Groups
05.2021 - 06.2021

Software Developer Intern

Yatra.com
01.2021 - 04.2021

Bachelor of Science - Geophysics

IIT Kharagpur
07.2017 - 07.2021

High School Diploma

Green Wood School
06.2015 - 06.2017