
Sandip Santra

Bengaluru

Summary

Data scientist with 9+ years of experience interpreting and analyzing data to drive business solutions. Proficient in statistics, machine learning, deep learning, and analytics. Excellent understanding of business operation tools for effective data analysis.

Overview

10
years of professional experience

Work History

Senior Manager, Senior Data Scientist

EXL Services Private Ltd
10.2023 - Current
  • Developed a pipeline for intent recognition and intent discovery on customer call transcripts.
  • Designed a chunking methodology to break large pieces of unstructured transcript text into smaller segments.
  • Fine-tuned a pretrained Llama-2 model for text summarization on a small set of customer utterances.
  • Created a per-cluster representation model using an LLM.
  • Developed a scalable approach for intent classification using natural language inference (NLI).

Senior Data Scientist (Research & Development)

GAVS Technologies Pvt Ltd.
01.2021 - 10.2023
  • Developed the Unum stop-loss actuarial model to estimate an individual’s total cost of health care (medical and pharmacy) for the upcoming 12 months with 89% accuracy and an R² score of 0.83, and produced an explainable AI model that helps the business understand and interpret predictions for decision making.
  • Performed pathway analysis on medical codes (CPT, ICD, HCPCS, Taxonomy) in insurance data.
  • Computed healthcare entity similarity scores with fuzzy matching and assigned a unique GLN ID to each healthcare provider.
  • Identified patients who have mild cognitive impairment (MCI) due to Alzheimer's disease (AD) or mild AD but are currently undiagnosed.
  • Developed a risk factor model for the onset or early detection of Alzheimer's disease.
  • Extracted radiology test information and the corresponding results from large unstructured radiology data.
  • Built a custom NLP pipeline using machine learning and rule-based matching.
  • Performed joint entity estimation and relation classification on radiology data using pre-trained Hugging Face models.
  • Mapped unstructured pharmaceutical drug data (brand name, generic name, and dose) to the exact or nearest National Drug Code (NDC).

Senior Software Engineer (Data Scientist)

Marlabs Inc.
10.2019 - 12.2020
  • Developed an automated machine learning (Auto-ML) process for applying machine learning to real-world problems, covering the complete pipeline from raw dataset to deployable model.
  • Performed sentiment analysis on Twitter streaming data using Kafka and Spark on the Databricks platform, and designed a dashboard on Twitter data to improve visibility and support business decision making.
  • Discovered the different types of topics that occur in the Marlabs feedback dataset.
  • Scraped unstructured text from websites, measured the semantic similarity between one day's text and the next day's text for each website, and identified sentence-wise deltas between two text documents.
  • Supported Madvisor software development and testing processes, and maintained existing applications.
  • Collaborated on future projects and innovations, and continuously identified, measured, and improved processes.
  • Verified that software met requirements.

Data Scientist

CenturyLink Technologies Pvt Ltd
11.2018 - 10.2019
  • Worked as a data scientist with a US-based utility management firm, developing predictive models based on machine learning algorithms and performing statistical analyses.
  • Developed non-parametric models for steel corrosion and wood pole decay that read in data on steel towers and wood poles and flag structures that need maintenance.
  • Automated the ServiceNow ticketing system and predicted the assignment group and priority of tickets created from customer requests.
  • Pre-processed text data to extract better features from clean data, and constructed a language model for named entity recognition (NER).
  • Extracted a correlation matrix of the top 50% of words most correlated with assignment group and priority.
  • Built custom sentiment (positive and negative) and emotion models on customer reviews of CenturyLink products.
  • Performed feature generation, document clustering, and text classification using bag-of-words, Word2vec, and GloVe.
  • Designed and documented REST/HTTP APIs, including text and JSON data formats, for ServiceNow (SN) incident ticketing.

Data Scientist

Smartfuture Pte Ltd
01.2018 - 08.2018
  • Designed and developed a fitness and diet recommendation system application for employees and organizations.
  • Developed a wellness score algorithm for B2C and B2B that helped increase online sales (up to 15% per product).
  • Evaluated and processed raw information such as spatial, structured, and unstructured data.
  • Built novel models based on data mining and statistical modeling for predictive and prescriptive analytics.
  • Presented and explained findings and suggested ways to both increase healthcare quality and reduce cost.
  • Created a wellness database to store and handle all wellness activities and medical parameters.

Associate Data Scientist (JRF & SRF)

Indian Council Of Agricultural Research
08.2014 - 07.2017
  • Developed the National Animal Disease Referral Expert System (NADRES), which forecasts the probability of disease occurrence in individual districts two months in advance.
  • Assessed the efficiency of disease control programs within defined geographical areas.
  • Developed analytical methods for disease mapping, cluster investigation, ecological analyses, studies of risk near point sources of environmental pollution, and surveillance.
  • Worked as a data scientist on a CDC-funded project, developing machine learning models.
  • Predicted the risk of aquatic animal disease emergence, which reduced the mortality and morbidity rates of aquatic animals and motivated 40% more new farmers to join the aquatic animal profession.
  • Automated scripts to capture and visualize spatial information from all types of geographically referenced data and to digitally manipulate images of the Earth's surface.
  • Analyzed and visualized the spatial distribution of disease propagation using spatial analysis techniques and ESRI ArcMap Spatial Statistics tools.
  • Measured geo-coordinates and related remote sensing and meteorological (GRID) data, maintained and managed GIS data sources, and implemented mapping projects.
  • Developed automated R scripts for machine learning models (logistic regression, random forest, and support vector machine) and ensemble methods (stacking, bagging, and boosting) to improve prediction accuracy.
  • Developed R scripts to convert data to transactional data and tune the support, confidence, and lift parameters to generate the best association rules.

Education

Master of Computer Applications - Computer And Information Sciences

R V College of Engineering (VTU)
Bengaluru
09.2014

Bachelor of Computer Applications -

Apex College (Sikkim Manipal University)
Kolkata
07.2010

Skills

  • Machine Learning
  • Deep Learning
  • Natural Language Processing
  • LLM
  • Model Development
  • Data Visualization
  • Web Scraping

TECHNICAL SKILLS

  • Tools: Python, R, PySpark, TensorFlow, Keras, PyTorch, Snowflake, Weka, JIRA, GitHub
  • Packages: Pandas, NumPy, SciPy, R-Shiny, Scikit-Learn, H2O, NLTK, Gensim, spaCy, fastText, GloVe
  • Cloud Service: Azure Databricks, AWS SageMaker, Azure, AWS
  • Cognitive Sciences: Azure Text Analytics, IBM Watson
  • Machine learning algorithm: Linear & Logistic regression, SVM, Naive Bayes, Random forest, XGBoost, LightGBM, CatBoost, LDA, PCA
  • Clustering algorithm: K-means, Hierarchical clustering, DBSCAN
  • Deep Learning algorithm and platform: RNN, LSTM, GRU, DNN, CNN
  • Word and sentence embedding: ELMo, BERT, GPT, USE, Word2Vec
  • Database: MySQL, PostgreSQL, Azure Cosmos DB
  • Data Visualization: Matplotlib, Seaborn, Tableau
  • Web scraping: BeautifulSoup, Scrapy
  • GIS and Remote sensing tools: ArcGIS, ERDAS IMAGINE

Publications

  • Development of a Framework for Data Management in Mobile Location Based Service
  • Streptococcus uberis ST 439 and ST 475 Induce Differential Inflammatory Responses in a Mouse Intramammary Infection Model
  • Comparative genome analysis of short sequence repeats in pathogenic and non-pathogenic Leptospira - a statistical approach

ABSTRACT

  • Kavya B.A., Suresh K.P., Gajendragad M.R., Reshma K, Sandip Santra, Patil S.S, Parimal Roy “A Climate-Anthrax Relationship Model: A Case Study in Karnataka”, at the XIII Agricultural Science Congress, Bengaluru, Karnataka, India, February 21-24, 2017.
  • Suma A.P., Suresh K.P., Sandip Santra, Gajendragad M.R “Remote Sensing Technologies for Monitoring the Climate Change in Karnataka: A Pilot Study”, at the XIII Agricultural Science Congress, Bengaluru, Karnataka, India, February 21-24, 2017.
  • Suresh KP, Sandip S, Reshma K, Gajendragad MR, Manjunatha Reddy GB and Parimal Roy “National Aquatic Animal Disease Database: A Dynamic Web Application”, International Symposium on Aquatic Animal Health and Epidemiology for Sustainable Asian Aquaculture at ICAR-NBFGR, Lucknow, India, April 20-21, 2017.

TRAINING PROGRAMME

  • “Statistical Analysis System (SAS)”, organized at the National Institute of Veterinary Epidemiology and Disease Informatics (ICAR-NIVEDI)
  • “Training program on Research Methodology, Epidemiology and Biostatistics”, organized at the National Institute of Veterinary Epidemiology and Disease Informatics (ICAR-NIVEDI)

PRESENTATIONS

  • “Development of a Framework for Data Management in Mobile Location Based Service”, COMSAP 2014, 28th June, 2014 at RVCE in Bangalore
  • “Vector Space Model and Inverted Indexing Strategy for Content Based Text Document Retrieval”, Symposium on Academic Mini-Projects on 29th March, 2014 at RVCE in Bangalore
  • “Detection of Smoke Intensity from the Satellite Image using Markov Model”, Symposium on Extra Mural Mini-project-2013 on 24th August, 2013 at RVCE in Bangalore
  • “Remote Sensing Technologies for Monitoring the Climate Change in Karnataka: A Pilot Study”, poster presentation at the XIII Agricultural Science Congress, Bengaluru, Karnataka, India, February 21, 2017.

AWARDS

  • Best poster presentation award at the XIII Agricultural Science Congress, Bengaluru, Karnataka, India, February 21-24, 2017.
  • Star performer award in 2020.
