
SPARSH DUTTA

Jaipur

Summary

Highly accomplished Machine Learning Engineer with 6+ years of experience developing and deploying large-scale AI/ML solutions, including cutting-edge LLM and MLOps platforms. Proven track record of delivering substantial business value, exemplified by 75% cost savings ($325K annually) and up to $1M in revenue generation, alongside a patented feature store. Expert in optimizing deep learning models for low-latency inference on GPUs, building scalable data pipelines, and leading technical initiatives across cloud environments (AWS, Azure).

Overview

6 years of professional experience

Work History

Lead Machine Learning Engineer

Matilda Cloud
06.2025 - Current
  • Developed intelligent algorithms that group applications for migration based on network topology and resource type
  • Business Impact: Simplified cloud service migration and accelerated lift-and-shift strategies reducing operational overhead
  • Technology Stack: AWS Bedrock, CrewAI, Graph Algorithms, Claude, ES, S3, MongoDB, RabbitMQ, FastAPI
  • Domain: AI/ML Engineering

Staff Data Scientist R&D

Innovaccer HQ
India
12.2024 - 05.2025
  • Company Overview: Innovaccer is a healthcare data platform that helps organizations unlock the potential of their data.
  • Delivered an OCR solution with a latency of 300-700 ms sustaining 120 RPS on A10G GPUs
  • Business Impact: Achieved 75% cost savings, reducing annual cloud expenses from $435k (Google Vision API) to $110k
  • Technology Stack: Nvidia Triton, TensorRT, ONNX, PyTorch, CUDA, Python, SageMaker (AWS), DocTR, Locust
  • Domain: AI/ML Engineering
  • Deployed and optimized Deepgram STT service using K8s with Prometheus-based monitoring and event-driven scaling
  • Business Impact: Enabled enterprise-level STT solution, powering voice intelligence across the full product ecosystem
  • Technology Stack: Deepgram, Shell Scripting, K8s, Helm Charts, GitLab CI/CD, Prometheus, Kibana, KEDA, Grafana
  • Domain: ML Engineering & MLOps
  • Developed a context-driven reasoning model for domain-specific search, leveraging Multi-Agentic RAG architecture
  • Business Impact: Enhanced search accuracy and relevance, leading to improved user engagement
  • Technology Stack: Python, CrewAI, OpenAI, RAGAS, LangChain, FAISS, ChromaDB, Slack SDK
  • Domain: AI/ML Engineering & Generative AI

Senior Data Scientist R&D

Innovaccer HQ
India
01.2023 - 12.2024
  • Company Overview: Innovaccer is a healthcare data platform that helps organizations unlock the potential of their data.
  • Built a scalable ML platform, enabling end-to-end analytics solutions across multiple client engagements
  • Cut model deployment time to 1 week per environment while ensuring high platform availability
  • Business Impact: Deployed 10 models to production, generating $500k-$1M in revenue
  • Technology Stack: Databricks, MLflow, PySpark, Kedro, Airflow, Great Expectations, Kubernetes, AWS, CI/CD
  • Domain: MLOps & Predictive Modelling
  • Implemented a novel schema mapping engine using LLMs, BERT embeddings, and probabilistic fuzzy matching
  • Performed instruction fine-tuning (IFT) of the Llama 3 8B model on 4xA10G machines using QLoRA and FSDP via supervised fine-tuning (SFT)
  • Used extensively across the organization as a part of Innovaccer's Data Activation Platform offering
  • Business Impact: Reduced new data source integration from 2 weeks to 1 day
  • Technology Stack: PyTorch, K8s, Airflow, Python, GitLab CI/CD, Azure, AWS, OpenAI, Llama 3
  • Domain: AI Engineering & Generative AI

Data Scientist R&D

Innovaccer HQ
India
09.2021 - 12.2022
  • Company Overview: Innovaccer is a healthcare data platform that helps organizations unlock the potential of their data.
  • Built an in-house feature store for creating, sharing, and discovering ML features across teams
  • Successfully filed a patent for a novel DAG-based Healthcare Feature Store Engine, enhancing feature engineering processes
  • Business Impact: Increased ability to build and scale healthcare feature processing pipelines
  • Technology Stack: Python, Fugue, PySpark, Docker, K8s, S3, Databricks Delta Lake
  • Domain: ML Engineering, Big Data & MLOps
  • Designed models that predict 30-day readmission risk and identify patients at high risk of avoidable ED visits
  • Built an end-to-end data science pipeline for data processing, feature engineering, model building, and tuning
  • Designed the model to handle a skewed dataset with a 2% target prevalence rate
  • Achieved an F1 score of 42%, beating the industry benchmark of 29% from LACE scores
  • Business Impact: Achieved 3-5% reduction in overall readmissions with savings ranging from 50-100K USD
  • Technology Stack: InnoML, Python, PySpark, MLflow, XGBoost, scikit-learn, Optuna, Bayesian Optimization
  • Domain: Predictive & Statistical Modelling

Data Science Associate Consultant

ZS Associates
Pune
07.2019 - 07.2021
  • Company Overview: ZS Associates is a global consulting firm focused on transforming the way companies operate.
  • Created Named Entity Recognition models using BioBERT to identify attribute information in medical text
  • Incorporated a Regex + Ontology + BERT ensemble-based approach to achieve >70% precision for all attributes
  • Beat the in-place vendor benchmark by doubling prediction coverage to 50% and created robust modules to automate
  • Business Impact: Successfully delivered extraction pipeline generating $100k in revenue
  • Technology Stack: Python, BERT, Fuzzy Matching, Ontologies, spaCy, Hugging Face
  • Domain: Deep Learning & NLP

Education

Masters - Machine Learning and Artificial Intelligence

BITS Pilani (Work Integrated Learning Program)
Online
10.2024

Bachelors - Electronics and Communication Engineering

The LNM Institute of Information Technology
India
07.2019

Skills

  • Python
  • RAG & Chatbots
  • Vector Stores
  • GenAI & LLMs
  • Agentic AI
  • PyTorch
  • Transformers
  • CrewAI
  • Milvus
  • LangChain
  • LangGraph
  • FastAPI
  • FAISS
  • Nvidia Triton
  • K8s
  • Postgres
  • Git
  • AWS
  • Azure
  • Claude
  • n8n
  • GitLab
  • Jenkins
  • CI/CD Pipeline
  • Databricks
  • KEDA
  • Software development
  • Machine learning

Additional Tasks and Responsibilities

  • Managed AWS account operations and optimized infrastructure costs for internal development environments
  • Led internal code review processes and coordinated timely resolution of bugs across departmental products
  • Directed departmental hiring initiatives, including internship programs and candidate evaluations
  • Planned and facilitated internal training programs to promote continuous learning and skill development

Open Source Contribution

  • Feast Feature Store Library Bug Fix
  • docTR PARSeq ONNX Conversion Bug Fix

Patent Contribution

  • METHOD AND SYSTEM FOR PROVIDING FAAS BASED FEATURE LIBRARY USING DAG

Data Science Competitions

  • Analytics Hackathon - Multi Agentic RAG (Innovaccer HQ), 1st of 20, 2024
  • Kaggle BMS - Molecular Translation (ZS Associates), 56th of 874, 2021
  • AWS DeepRacer Reinforcement Learning Challenge (ZS Associates Onshore), 1st of 22, 2020
  • Innoplexus Online Machine Learning Hackathon, 5th of 1998, 2019
  • ZS Associates Young Data Scientist Challenge, 22nd of 5546, 2018

Languages

English
First Language
Hindi
Proficient (C2)
Bengali
Upper Intermediate (B2)
