
Saipratap

Bengaluru

Summary

Senior Gen-AI & Machine Learning Data Scientist with 7+ years of hands-on experience designing, training, and deploying high-impact AI systems at scale. Specialized in LLMs, Generative AI, NLP, NLU, Computer Vision, and end-to-end ML engineering across cloud and distributed environments. Expert in architecting RAG pipelines, fine-tuning LLMs, building scalable data platforms, and deploying production-grade models on AWS, Azure, and Kubernetes. Strong track record of improving system performance, optimizing data workflows, and delivering measurable business impact through advanced modeling and automation.

Overview

7 years of professional experience

Work History

Senior Data Scientist

Gray Radiant Data Services
09.2024 - 11.2025
  • Architected and developed an AI-powered document processing pipeline using Amazon Comprehend for NLP-based classification and Amazon Textract for data extraction.
  • Implemented S3-based storage solutions for organizing and managing document stages (Upload, Classification, Extraction).
  • Designed a real-time file status tracking UI displaying processing status, timestamps, and document categories.
  • Utilized DynamoDB for storing extracted JSON data, maintaining the results and change history with user-editable templates for low-confidence fields.
  • Implemented a dynamic confidence score validation system, ensuring high-quality data extraction and allowing users to manually review and update flagged fields.
  • Integrated NLP techniques to classify and extract relevant data fields automatically, improving processing speed and accuracy.
  • Developed comprehensive logging and auditing features, tracking all document changes and providing an easy-to-use interface for users to monitor document history.
  • Collaborated on future-proofing the system by planning for additional stages, such as data enrichment, to further improve document processing outcomes.
  • Enabled users to generate similar PDFs or other documents when the extraction confidence score is very low.
  • Used Streamlit to launch a chatbot application for asking questions on generated or uploaded documents.
  • Built a RAG-based approach using the Chroma vector database and prompt engineering techniques for better responses (sketched below).
  • End client: Royal Bank of Scotland
  • Environment: Anaconda 3.6, Python, Hadoop, Hive, SQL Server, Jupyter IDE, AI/ML, NLP, XGBoost.
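A minimal sketch of the retrieval step, assuming chromadb's default embedding function; the collection name, chunk texts, and prompt wording are illustrative placeholders, not the project's actual values.

```python
# Sketch only: index extracted text chunks in Chroma and build a grounded prompt.
import chromadb

client = chromadb.Client()
collection = client.create_collection("extracted_documents")  # placeholder name

# Text chunks produced by the extraction stage (illustrative).
chunks = [
    "Invoice 4211 was issued on 2024-03-02 for account 7789.",
    "The account holder agreed to the revised repayment schedule.",
    "Low-confidence fields were reviewed and corrected by the user.",
]
collection.add(documents=chunks, ids=[f"chunk-{i}" for i in range(len(chunks))])

def build_prompt(question: str, n_results: int = 2) -> str:
    # Retrieve the most relevant chunks and pack them into a grounded prompt.
    hits = collection.query(query_texts=[question], n_results=n_results)
    context = "\n".join(hits["documents"][0])
    return (
        "Answer strictly from the context below. If the answer is not present, say so.\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

# The assembled prompt is then sent to the chat model behind the Streamlit chatbot.
print(build_prompt("When was invoice 4211 issued?"))
```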

Data Scientist – Consultant

Consign Space Solutions
04.2024 - 08.2024
  • Developed an AI-powered system for automated categorization of medical documents and identification of reviewable records, using OCR for text extraction and LLM-based models to classify document types and extract critical fields such as date of service, provider name, and facility.
  • Utilized the docTR OCR model to accurately extract text from PDF documents, ensuring high-quality data retrieval for further processing.
  • Trained a BERT-based classification model to analyze extracted text and accurately identify document categories such as affidavits, doctor notes, radiology reports, and laboratory results.
  • Trained a BERT-based classification model to identify whether the page is Reviewable or Not Reviewable.
  • For identified categories, used the ChatGPT API to automatically extract specific fields such as the date of service, page headers, doctor's name, and hospital name, streamlining information retrieval (sketched below).
  • Client: HCL Technologies
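A minimal sketch of the classify-then-extract flow, assuming a fine-tuned BERT checkpoint loaded through Hugging Face transformers and the OpenAI Python client; the checkpoint path, page text, and model name are hypothetical.

```python
# Sketch only: classify an OCR'd page, then ask an LLM for the key fields.
from transformers import pipeline
from openai import OpenAI

# 1) Page-type classification with a fine-tuned BERT checkpoint (hypothetical path).
classifier = pipeline("text-classification", model="./bert-medical-doc-classifier")
page_text = "Radiology report ... date of service 03/12/2024 ... Dr. A. Rao, City Hospital"
category = classifier(page_text)[0]["label"]

# 2) For recognised categories, extract the fields of interest as JSON.
client = OpenAI()  # reads OPENAI_API_KEY from the environment
completion = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model; the project only states "ChatGPT API"
    messages=[{
        "role": "user",
        "content": (
            f"This page was classified as '{category}'. Extract date_of_service, "
            f"provider_name, and facility as JSON.\n\n{page_text}"
        ),
    }],
)
print(category, completion.choices[0].message.content)
```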

Data Scientist

Gray Radiant Data Services
11.2022 - 04.2024
  • Designed and implemented a dynamic voice Bot system for real-time customer interaction by integrating relational and non-relational databases, AI-based speech-to-text models, and asterisk call flows, enabling automated multilingual communication and response analysis.
  • Designed the voice bot flow as per client requirements.
  • Retrieved confidential data from a relational database and temporarily stored customer details and the language-specific voice bot flow in a non-relational database.
  • Built the logic for playing text as audio; all of these processes are dynamic and run at run time without delay.
  • Initiated calls to customers through Asterisk, combining fixed and customized audio clips and playing them dynamically.
  • After the audio plays, the system records and stores the customer's voice and converts the speech to text with AI models (sketched below).
  • Analyzed customer speech with trained AI models and played the next message according to the customer's response.
  • Stored all operations, such as audio number, customer response, and mapped response, in the relational database for generating reports.
  • Achieved ~80% response accuracy for real-time, multilingual customer interaction with the deployed voice bot system.
  • Client: Tiger Analytics
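A minimal sketch of the speech-to-text and response-mapping step, assuming a Whisper checkpoint via the transformers ASR pipeline; the recording path and keyword routing are illustrative, not the deployed flow logic.

```python
# Sketch only: transcribe a recorded customer response and pick the next bot prompt.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

def next_prompt(recording_path: str) -> str:
    # Transcribe the response recorded after the Asterisk audio playback.
    text = asr(recording_path)["text"].lower()
    # Map the transcript to the next message in the bot flow (simplified keyword routing).
    if "yes" in text or "confirm" in text:
        return "play_confirmation_audio"
    if "no" in text or "later" in text:
        return "play_callback_audio"
    return "play_repeat_audio"

print(next_prompt("recordings/customer_001.wav"))  # placeholder recording path
```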

Data Scientist

Hiyamee
01.2022 - 09.2022
  • AML KYC document classification using AI/ML.
  • The client receives a large volume of KYC documents (for AML compliance) that are tedious for their officers to classify and process manually; as a result, retail customer services were sometimes suspended temporarily, contributing to increased customer dissatisfaction.
  • To address this, built an ML/NLP-based document classification solution (sketched below).
  • End client: Deutsche Bank
  • Environment: Anaconda 3.6, Python, Hadoop, SQL Server, Jupyter IDE, AI/ML, NLP, Azure Synapse, Azure ML Studio, Cosmos DB, RAG architecture, GAN model, Azure Blob Storage, OpenCV.
  • Responsibilities: Data pre-processing, connecting to multiple data sources, developing models with TensorFlow, implementing Azure ML algorithms, preparing training/inference pipelines, understanding and implementing NLP, developing regression models, and deploying various applications on OpenShift with Jenkins and SonarQube.
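A minimal sketch of a TensorFlow/Keras text classifier of the kind described above; the example documents, class labels, and hyperparameters are placeholders, not the production model.

```python
# Sketch only: a small Keras text classifier for KYC document types.
import tensorflow as tf

# Toy labelled examples; the real model is trained on the labelled KYC corpus.
texts = tf.constant([
    "passport copy issued by the regional passport office",
    "electricity bill submitted as proof of address",
    "bank statement for the period january to march",
    "board resolution authorising the account signatories",
])
labels = tf.constant([0, 1, 2, 3])  # hypothetical classes: identity, address, financial, corporate

vectorizer = tf.keras.layers.TextVectorization(max_tokens=20000, output_sequence_length=256)
vectorizer.adapt(texts)

model = tf.keras.Sequential([
    vectorizer,                                        # raw strings in, token ids out
    tf.keras.layers.Embedding(input_dim=20000, output_dim=64),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(4, activation="softmax"),    # one unit per document class
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(texts, labels, epochs=5)                     # toy run; real training uses far more data
```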

Jr. Data Scientist

Nano Tech E-Services
11.2018 - 12.2021
  • FMCNA operates in the healthcare domain, where T-machines are used for blood dialysis.
  • All data transmission is done via the MQTT protocol, and special care is taken to keep the PHI data HIPAA compliant.
  • Analyzed the data in Athena and built machine learning models with SageMaker to understand correlations in the data and support diagnosis for existing and new patients.
  • Used AWS Textract and Comprehend to extract text and data from the document repository to draw conclusions and relationships from the existing data (see the boto3 sketch after this role's bullets).
  • Deployed the services on Red Hat OpenShift containers using Dockerfiles as well as the source-to-image (S2I) tool.
  • Implemented the AWS setup: registering things/devices, creating and registering certificates, Lambda functions, and SageMaker.
  • Implemented AWS ML algorithms, adjusting hyperparameters and fine-tuning the models.
  • Data pre-processing: loading data, cleansing and enriching it, and finally applying feature scaling to the whole dataset.
  • Understanding and implementation of Natural Language Processing (NLP), Natural Language Understanding (NLU), and Natural Language Generation (NLG).
  • Developed regression models based on various assumptions and multivariate variables.
  • AWS Textract: To extract text and data from the scanned documents.
  • AWS Comprehend: Used ML algorithms to find relationships in the extracted text.
  • Red Hat OpenShift: Deployed various applications on OpenShift with Jenkins and SonarQube.
  • Implemented various POCs with technologies like Kura, Kapua, and RS-232 serial-port communication.
  • Prepared high-level and detailed design documents for the technical framework components.
  • Ensured design, development, and deployment guidelines and standards are in place for compliance with architectural goals and technology standards.
  • Client: FMCNA
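A minimal sketch of the Textract and Comprehend calls via boto3, assuming configured AWS credentials; the bucket, object key, and region are placeholders.

```python
# Sketch only: pull text out of a scanned document, then extract entities from it.
import boto3

textract = boto3.client("textract", region_name="us-east-1")
comprehend = boto3.client("comprehend", region_name="us-east-1")

# Extract text lines from a scanned image stored in S3 (placeholder bucket/key).
response = textract.detect_document_text(
    Document={"S3Object": {"Bucket": "example-doc-bucket", "Name": "scans/report-001.png"}}
)
text = "\n".join(b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE")

# Use Comprehend to surface entities (names, dates, organizations) in the extracted text.
entities = comprehend.detect_entities(Text=text[:5000], LanguageCode="en")
for ent in entities["Entities"]:
    print(ent["Type"], ent["Text"], round(ent["Score"], 3))
```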

Education

Master's Program -

Data Science and Artificial Intelligence
05.2025

B.Tech -

Electrical & Electronics Engineering
03.2018

Skills

  • Artificial intelligence and machine learning
  • Data analytics and mining
  • Data management and databases
  • Predictive modeling techniques
  • Natural language processing
  • Computer vision applications
  • Neural networks and architectures
  • Statistical analysis methods
  • Time series analysis and forecasting
  • Classification and clustering algorithms
  • Recommender systems and anomaly detection
  • Deep learning frameworks (ANN, CNN, RNN)
  • Generative AI models (GPT, LLaMA2)
  • Cloud platforms (AWS, Azure)
  • Data visualization tools (Tableau, PowerBI)
  • Programming languages (Python, SQL)
  • Version control with Git
  • Containerization technologies (Docker, Kubernetes)

Timeline

Senior Data Scientist

Gray Radiant Data Services
09.2024 - 11.2025

Data Scientist – Consultant

Consign Space Solutions
04.2024 - 08.2024

Data Scientist

Gray Radiant Data Services
11.2022 - 04.2024

Data Scientist

Hiyamee
01.2022 - 09.2022

Jr. Data Scientist

Nano Tech E-Services
11.2018 - 12.2021

Master's Program -

Data Science and Artificial Intelligence

B.Tech -

Electrical & Electronics Engineering