Sourabh Desai

Data Scientist
Pune

Summary

Data Scientist with three years of hands-on experience in Machine Learning, Deep Learning, Convolutional Neural Networks (CNN), Natural Language Processing (NLP), and Large Language Models (LLMs). Skilled in deploying models on cloud platforms, delivering data-driven insights, and developing solutions to complex problems.

Overview

3 years of professional experience
5 years of post-secondary education

Work History

Data Scientist

PRGX Global Inc.
11.2024 - Current

Microservice for Document Deduplication

  • Developed a microservice for document deduplication using Locality-Sensitive Hashing (LSH) to efficiently identify duplicate records.
  • Integrated MongoDB and Azure Blob Storage to fetch and process records for deduplication.
  • Designed and implemented asynchronous REST APIs to provide real-time document deduplication responses.
  • Coordinated with the Software Engineering and Platform Engineering teams to ensure seamless integration and deployment.
  • Evaluated the deduplication service, achieving 94% recall and minimizing false negatives (missed duplicates).
  • Deployed the microservice on Rancher, developing Helm charts and configurations, and integrated CI/CD pipelines for automated deployment.
  • Tested the solution across multiple environments, including Dev, QA, UAT, and Prod, ensuring stability and performance.

Document AI Extraction

  • Built and optimized Doc-AI pipelines to extract structured information from PDFs and Excel files based on auditors' requirements.
  • Automated data ingestion workflows to store extracted insights into an SQL database for further analysis.
  • Developed regex-based extraction techniques for parsing key fields from unstructured PDFs, improving data accuracy and consistency.
  • Built an automated Excel extraction pipeline, handling multiple formats, applying preprocessing techniques, and standardizing data for downstream processing.
  • Enhanced document extraction using LLMs, improving text parsing, entity recognition, and context understanding for complex documents.
  • Optimized and debugged Python-based workflows, ensuring efficient data extraction, error handling, and performance improvements in processing pipelines.

Data Scientist

IFS India Mercantile Pvt. Ltd.
03.2022 - 11.2024

LLM-Powered Research Analyst Q&A Chatbot

  • Designed and developed an AI-powered research assistant by collaborating with the client to define process flows and optimize information retrieval.
  • Used LangChain’s UnstructuredURLLoader to fetch online data and recursive text splitters to structure the extracted content efficiently.
  • Leveraged OpenAI and Claude (Anthropic) embeddings with Pinecone for vector-based semantic search, enabling accurate and efficient document retrieval.
  • Integrated OpenAI’s and Claude’s LLMs with a retrieval system to generate contextually rich and comprehensive responses.


E-Commerce Sentiment Analysis System

  • Developed an NLP-driven sentiment analysis model by collaborating with the client to understand and preprocess customer review data.
  • Implemented a text preprocessing pipeline including stemming, stop-word removal, tokenization, and padding to enhance data quality for modeling.
  • Built and trained a Bidirectional LSTM model using Word2Vec embeddings, achieving an 85% F1-score, a 12% improvement over the baseline.

Education

Bachelor of Engineering

Sharad Institute of Technology
06.2009 - 06.2014

Diploma

Maharashtra State Board of Technical Education

10th

Maharashtra State Board

Skills

Machine Learning

Convolutional Neural Networks (CNN)

Deep Learning

Natural Language Processing (NLP)

Cloud Deployment (Azure, AWS)

LLM, GenAI

Microservice Development

API Development

Problem Solving

Python

TensorFlow

Scikit-learn

SQL, MongoDB

Data Visualization

Neural networks
