Sandeep Venugopal

Haslet

Summary

Highly skilled Senior Data Scientist with a background in developing innovative, data-driven solutions. Experience includes transforming complex data into actionable insights and designing predictive models to support business decisions. Strengths include strong analytical skills; knowledge of machine learning algorithms, deep learning, computer vision, and LLMs; and proficiency in programming languages such as Python and R. Previous work has improved efficiency, increased profitability, and supported better strategic planning through data analysis.

Overview

6 years of professional experience

Work History

Senior Data Scientist

Optum
08.2023 - Current
  • Built an ML-based prior authorization system by combining structured claims data (CPT/ICD codes, historical outcomes) with transformer-based embeddings from unstructured clinical notes to improve approval prediction for complex medical cases.
  • Established baseline models using Logistic Regression and Random Forest for transparency and interpretability, then improved performance by fine-tuning ClinicalBERT models on historical authorization data using Amazon SageMaker.
  • Used Claude via Amazon Bedrock to interpret payer policy language alongside clinical context, improving consistency and explainability for edge cases involving multiple comorbidities.
  • Implemented a Retrieval-Augmented Generation (RAG) setup using Amazon OpenSearch to ensure all LLM outputs were grounded in current payer policy documents and aligned with compliance requirements.
  • Deployed real-time inference using SageMaker Endpoints and monitored prediction accuracy, latency, and drift across providers, specialties, and regions.
  • Reduced authorization turnaround time by 38% and improved approval prediction accuracy to 81% by prioritizing high-confidence approvals and denials.
  • Built a GenAI-driven personalization engine to generate member-specific digital outreach based on health risk scores, prior engagement behavior, and language preferences.
  • Used gradient-boosted models to score engagement likelihood and paired them with Claude-based controlled text generation to produce compliant message variations at scale.
  • Designed prompt constraints and validation checks to ensure generated content met healthcare communication and regulatory standards.
  • Deployed batch and near real-time workflows on AWS to support high-volume outreach across mobile and web channels.
  • Increased digital engagement by 31% while reducing call-center volume by shifting routine interactions to personalized self-service flows.
  • Developed NLP pipelines to identify undocumented preventive care gaps by analyzing physician notes and encounter documentation not captured in structured datasets.
  • Used transformer embeddings and clinical entity extraction to map free-text observations to standardized care-gap definitions with confidence scoring.
  • Applied Claude reasoning to handle ambiguous documentation and reduce false positives in care-gap alerts.
  • Automated alert generation using AWS Lambda and integrated outputs into downstream care-management and reporting workflows.
  • Improved preventive care compliance by 22% and significantly reduced manual chart review effort.

Senior Data Scientist

Ford
08.2022 - 07.2023
  • Built supplier disruption risk models by combining internal procurement data with external signals such as weather events, logistics delays, and geopolitical indicators.
  • Improved baseline risk scoring by engineering graph-based features to capture supplier dependencies and cascading risk across multi-tier supply networks.
  • Trained and deployed models using BigQuery ML to support real-time risk monitoring and supply chain planning decisions.
  • Improved disruption forecasting accuracy by 33%, enabling earlier mitigation and reducing downstream production delays.
  • Developed computer vision models to detect manufacturing defects from factory inspection images, reducing reliance on manual classification and inconsistent human judgment.
  • Integrated LLM-based text generation to translate defect signals into clear, actionable explanations for quality engineers.
  • Built feedback loops to capture engineer corrections and retrain models as defect patterns evolved over time.
  • Reduced false defect alerts by 28% and shortened quality review cycles on production lines.

Data Scientist

Verizon
05.2020 - 12.2021
  • Built NLP pipelines to ingest, clean, and classify large volumes of unstructured customer complaints across multiple service channels.
  • Established baseline text classifiers using TF-IDF and Logistic Regression, then improved performance using neural models to better capture contextual patterns in customer narratives.
  • Applied sentiment analysis and urgency scoring to prioritize high-impact complaints and support faster issue resolution.
  • Improved complaint resolution SLA by 29% by enabling more accurate routing and proactive issue detection.
  • Developed time-series forecasting models to predict short-term and long-term network traffic demand for 5G capacity planning.
  • Started with statistical forecasting baselines and transitioned to deep learning models to capture non-linear usage patterns and seasonal spikes.
  • Integrated forecast outputs with Verizon Cloud Platform (VCP) infrastructure planning tools to support capacity optimization decisions.
  • Reduced infrastructure over-provisioning costs by 23% while improving peak-hour network performance.

Education

Master of Science - Information Technology

Clark University
Worcester, Massachusetts
12.2023

Skills

  • Programming & Data Engineering: Python, SQL, PySpark, Scala, R for large-scale data processing, feature engineering, and analytics
  • Machine Learning: Supervised models, gradient boosting, deep learning (LSTM, transformers), model evaluation, calibration, and performance monitoring
  • Natural Language Processing: Clinical and enterprise NLP, transformer embeddings, entity extraction, sentiment and intent classification
  • Generative AI & LLMs: Claude (Amazon Bedrock), Retrieval-Augmented Generation (RAG), prompt design with guardrails, explainability, and human-in-the-loop AI systems
  • Time Series & Forecasting: ARIMA-style models, LSTM-based forecasting, demand and capacity planning, peak-load analysis
  • Cloud & MLOps: AWS (SageMaker, Lambda, OpenSearch), GCP (BigQuery ML), on-prem platforms (VCP), CI/CD, model deployment and drift monitoring
