Souradeep De

Data Scientist
Kolkata, West Bengal

Summary

Data Scientist with more than 3 years of experience in predictive modeling, data processing, natural language processing, generative AI, and deploying highly scalable machine learning models. Passionate about NLP and deep learning. Proficient in Python, statistics, data pre-processing, visualization, and SQL databases. Detail-oriented, deadline-driven, an excellent communicator, and adept at issue resolution.

Overview

3 years of professional experience
3 certifications

Work History

Associate Data Scientist

Indus Net Technologies
Kolkata
2021.02 - Current
  • Part of INT-Analytics team
  • Developed statistical and predictive machine learning models
  • Led development of a credit scoring system, deployed on an Azure VM
  • Led a team in building a healthcare policy text extraction model using YOLO, reaching 85% accuracy
  • Developed a natural language processing model to identify customer concerns and desires
  • Created REST APIs for front-end integration using FastAPI and Azure Functions

Education

B.Tech - Electronics And Communication Engineering

St. Thomas' College of Engineering and Technology
Kolkata, West Bengal
07.2020

ISC - Science

M.P Birla F.H.S School
Kolkata, West Bengal
07.2016

ICSE - Science

M.P Birla F.H.S School
Kolkata, West Bengal
07.2014

Skills

  • Machine learning: Linear Regression, Logistic Regression, Naïve Bayes Classifier, k-Nearest Neighbors Classifier, Decision Tree, Random Forest, Gradient Descent, AdaBoost, XGBoost, K-means Clustering, Ensembling Techniques, etc.
  • Text Processing: NLTK, Tokenizer, Lemmatization, Count Vectorizer, TF-IDF, Bag of Words, Word2vec, Transformers, GPT-3, GPT-3.5, GPT-4
  • Python/ML Packages: Pandas, NumPy, SciPy, Scikit-learn, Seaborn, Matplotlib
  • Time-Series Analysis: AR, MA, ARIMA, SARIMA, VAR, FbProphet
  • Cloud Platforms/Services: Azure
  • Web framework: FastAPI, Streamlit, Azure Functions, Flask

Projects

Project 1 - SmartQGen: AI-Powered Question Generation and Auto-Answering

Roles & Responsibilities:

  • Developed a text similarity model that suggests questions to users based on their input.
  • Created a questionnaire database with scoring against similar sentences.
  • Fine-tuned the 'distilroberta-base' model for text similarity.
  • Achieved a Spearman correlation of 0.87 and a Pearson correlation of 0.88 after complete training.
  • Stored the questionnaire database in Azure Cosmos DB.
  • Developed REST APIs using Azure Functions for question suggestions and model retraining.

Project 2 - CreditGuard: Predictive Credit Scoring System for Banks

Roles & Responsibilities:

  • Developed a predictive model for default prediction and calculated credit scores using logistic regression, Weight of Evidence (WOE), and Information Value (IV)
  • Conducted data analysis and preprocessing, including handling of missing values and outliers
  • Performed feature transformation, selection, and model optimization
  • Developed credit scoring model based on odds and coefficients

Project 3 - Emotionise: AI-Driven Content Generation and Emotion Detection

Roles & Responsibilities:

  • Developed NLP models for emotion detection, question generation, and tooltip generation using LLMs (GPT-3)
  • Cleaned and preprocessed data, configured BERT transformer
  • Trained and validated models, deployed on Azure VM

Project 4 - Blood Pressure Abnormality Prediction Model

Roles & Responsibilities:

  • Collected essential features from domain experts
  • Cleaned and transformed data, handled multicollinearity
  • Conducted feature selection and model optimization.
  • Achieved a test accuracy of 88% after hyperparameter tuning and cross-validation.
  • Deployed on Azure VM. Developed front-end using Streamlit.

Project 5 - Keyword Extraction and Website Industry Prediction

Roles & Responsibilities:

  • Scraped website landing pages using Python and BeautifulSoup to extract keywords.
  • Retrieved historical metrics such as search volume and CPC for the keywords via Google Ads API.
  • Filtered and selected top 20 keywords based on metrics for further analysis.
  • Trained the GPT-3.5-turbo-instruct model on the extracted keywords to identify the most likely industry of the website.
  • Deployed the model on a cloud platform for real-time industry classification predictions.
  • Applied data pre-processing and model optimization techniques to enhance accuracy and efficiency.

Certification

  • The Data Science Course 2020: Complete Data Science Bootcamp - [28/09/2020].

Certificate Number : UC-6c0b3b3b-c998-4f33-8184-d10ec9fe4725

Certificate URL : http://ude.my/UC-6c0b3b3b-c998-4f33-8184-d10ec9fe4725

  • Databases and SQL for Data Science with Python by IBM - [23/03/2022]
  • Introduction to Machine Learning in Production - [09/04/2023]

Accomplishments

  • Led a team that raised the accuracy of a text extraction model from 70% to 85%, earning recognition from upper management and a financial reward.
  • Improved delivery of Textual Intelligence models by using open-source, pre-trained transformers and Large Language Models, increasing overall customer satisfaction and cost efficiency.
  • Consistently maintained high customer satisfaction ratings.

Languages

  • Bengali: First Language
  • English: Proficient (C2)
  • Hindi: Proficient (C2)

Affiliations

  • Member of St. Thomas' College of Engineering and Technology Alumni Association
  • Captain of St. Thomas' College of Engineering and Technology Alumni Table-Tennis team (2021-present)
