KASHISH BHAGAT

New Delhi

Summary

Senior Data Scientist specializing in generative AI and machine learning. Engineered a voice-enabled AI simulation platform that reduced training setup time by 40% for 1,000+ pharma reps. Experienced in prompt engineering and scalable architectures, with a focus on solving complex problems and optimizing AI systems. Proficient in machine learning, predictive analytics, and big data processing, translating data into actionable insights that drive project success.

Overview

4 years of professional experience

Work History

Senior Data Scientist

ZS Associates
New Delhi
03.2022 - Current

• Architected Simulate.AI, a voice-enabled AI simulation platform (FastAPI + OpenAI Realtime API) serving 1,000+ pharma reps — real-time WebSocket audio streaming, 15+ AI doctor personas, and sub-second latency, reducing training setup time by 40%

• Engineered a custom voice pipeline (PCM16 → VAD silence detection → Whisper STT → GPT-4o-mini → TTS) with async WebSocket orchestration and streaming audio playback, achieving 1.5–3.5s end-to-end latency

• Designed a 6-agent Automated Model Migration Framework to evaluate and refine 500+ LLM prompts during model deprecations — LLM-as-Judge scoring, pharma compliance checks, and auto-rollback, reducing migration cycles from weeks to days

• Integrated Simulate.AI into Microsoft Teams via Azure Communication Services, bridging real-time media streams to the existing OpenAI Realtime API backend for in-call AI roleplay

• Built Coach.AI Hub, an agentic hybrid RAG system (LangGraph + FAISS + BM25 + cross-encoder reranking) with a Planner Agent, query decomposition, and LLM-as-Judge validation — replacing naive RAG and eliminating hallucination issues

• Built a data enrichment engine using multi-source scraping (LinkedIn/Wikipedia/Bing) with GPT-based entity normalization, improving lead accuracy by 35% across multilingual pharma datasets

• Implemented async data pipelines on AWS EKS with optimized connection pooling and structured logging, achieving 50% throughput improvement for large-scale conversational data processing

• Developed PharmRebalance (ZS Hackathon 2026) — a multi-agent inventory rebalancing system with LP optimization, FEFO compliance, safety stock calibration, and digital twin simulation for service-level and patient outcome impact

• Engineered a prompt framework achieving 92% consistency in LLM-based coaching quality assessments across 5 dimensions using systematic optimization and few-shot learning

MLE-Bench Code Debugging

Turing (Contract — for Meta)
Remote
10.2025 - 02.2026

• Systematically debugged and repaired ML codebases across 30+ OpenAI MLE-Bench Kaggle competitions spanning computer vision (image classification, object detection, medical imaging), NLP (text normalization, essay scoring, code understanding), and tabular ML — following strict minimal-change protocols to isolate and fix data pipeline errors, model architecture mismatches, and training loop failures

• Resolved bugs across diverse frameworks (PyTorch, TensorFlow, scikit-learn, XGBoost) including NumPy/CUDA compatibility issues, audio preprocessing shape mismatches, DICOM medical image loading failures, and 3D point cloud parsing errors — validating each fix end-to-end via the MLE-Bench grading pipeline on GCP VMs

Education

B.Tech - Computer Science

The Northcap University
Gurgaon
05.2022

Skills

  • Machine Learning
  • Deep Learning
  • LLMs
  • Generative AI
  • Multi-modal AI
  • Reinforcement learning
  • Model Fine-Tuning
  • Model optimization
  • Model evaluation
  • Transfer learning
  • Prompt Engineering
  • RAG
  • Agentic AI Systems
  • Transformer Models
  • Text-to-Speech (TTS)
  • Speech-to-Text (STT)
  • FastAPI
  • Flask
  • PyTorch
  • TensorFlow
  • LangChain
  • LlamaIndex
  • Vector Databases
  • SQL
  • PySpark
  • MLOps
  • LLMOps
  • Model Deployment
  • Model Monitoring
  • Scalable systems
  • Statistical models
  • Recommendation systems
  • Anomaly detection
  • Natural language processing
  • Scikit-learn

Accomplishments

  • Runner-Up, ZS Associates Case Challenge 2021. Solved the Pulse Beverages case study.
  • Winner, EY GDS Hackpions 3.0. Created a CV screening solution.
  • Second Runner-Up, EY GDS Hackpions 2.0 (#ReshapeTheFuture).

Projects

  • Virtual Sales Rep Trainer (01.2024 - Present, Remote): An AI training platform that helps pharmaceutical sales reps hone communication skills through interactive simulations. Architected a Copilot-like coaching assistant using LangChain orchestration and RAG for context-aware guidance. Implemented real-time conversation analysis with AWS Transcribe speaker diarization and the OpenAI Realtime API. Deployed a microservices architecture handling over 10,000 daily API calls with sub-200ms latency. Built a feedback loop that integrates user interactions for continuous model improvement.
  • Intel Information Risk Analyser (03.2022 - 06.2022, Remote): A data analytics project covering the analysis and presentation of risk profiles. Improved data processing efficiency by 30% by developing a web app for PDF file uploads. Built an interactive dashboard analyzing country risk profiles across risk categories, with early warning signals and trend analysis. Implemented an ML-based recommendation system for risk mitigation strategies.

Languages

  • ENGLISH, Native
  • FRENCH, Advanced
  • HINDI, Native
