
Vivek Kawathalkar

Pune

Summary

Data Scientist with 3.8+ years of experience designing and delivering Generative AI solutions. Specializes in building end-to-end, scalable LLM systems, enterprise Retrieval-Augmented Generation (RAG) pipelines, and FastAPI endpoints using Claude 3.7 Sonnet, Azure OpenAI, and cloud services. Experienced in vector embeddings, prompt engineering, and LLM workflow orchestration. Proven ability to deliver high-performance, reliable Gen AI systems in fast-paced environments, with a strong focus on scalability, observability, and business impact.

Overview

4 years of professional experience
1 Certification

Work History

Associate Data Scientist

Cognizant
05.2022 - Current


  • Project1: Dosing and Competitive Pricing Intelligent Toolkit | Client: AbbVie | Life Science
  • Built a production-grade Gen AI-powered Data Modernization Engine to parse unstructured formulary data into structured, analytics-ready datasets, closing 75% of data gaps in commercial drug databases.
  • Engineered and deployed a Python-based automated web-scraping and RAG pipeline to extract dosing, efficacy, clinical study metrics, and drug attributes, reducing manual curation effort by 90% and enabling real-time analytics for 3000+ brands.
  • Developed scalable FastAPI endpoints with asynchronous processing and multithreading to support concurrent users and long-running jobs.
  • Orchestrated LLM-driven workflows using Claude 3.7 Sonnet, Python, LangGraph, and evaluation tools. Designed and implemented Trino DB schemas and deployed services on AWS EC2.
  • Reduced end-to-end data extraction and normalization turnaround time from 2–4 hours to under 3 minutes.
  • Achieved >99% pipeline uptime through robust error handling, retry mechanisms, audit logging, and monitoring.
  • Tech Stack: RAG, Chroma DB, OpenAI embeddings, Python, Async Processing, Multithreading, FastAPI, Pydantic, LangGraph, AWS EC2, Logging, Trino DB, Web Scraping, Prompt Engineering, Power BI


  • Project2: Gen AI Powered Data Stewardship Platform | Client: AbbVie | Life Science
  • Developed a Gen AI-driven platform to automate data stewardship for pharmaceutical companies struggling with fragmented HCP, HCO, patient, and product data across CRM, ERP, and clinical trial systems.
  • Raised data quality scores to 98%+ and reduced duplicate-record review time by 70–90%.
  • Reduced manual data stewardship effort by up to 70%, cutting operational costs significantly. Improved compliance readiness and reporting speed, achieving 180% ROI within the first year.
  • Developed real-time validation and compliance enforcement through rule-based and AI-driven checks, ensuring GDPR/HIPAA adherence.
  • Integrated automated web data enrichment using browser-use agents to extract external reference data and correct incomplete or inconsistent records.
  • Developed scalable FastAPI endpoints to support concurrent users and deployed them on an AWS EC2 server.
  • Tech Stack: browser-use, Python, Azure OpenAI GPT-4o LLM, FastAPI, Pydantic, Phoenix Monitoring, audit table, Prompt Engineering, Prompt design, ragas, Few-Shot Prompting.


  • Project3: Document Gen AI Bot | Client: MGM Resorts | Travel and Hospitality
  • Built a Document Gen AI bot that enhances customer interactions by delivering instant responses and reducing support workload. The solution handles unstructured PDF data and complex policy and customer support files.
  • Streamlined document search by processing unstructured PDF and text data using Azure OpenAI and LangChain with Retrieval-Augmented Generation (RAG).
  • Implemented a multi-layer LLM architecture: a Responsible AI layer (filtering harmful or irrelevant queries using Gen AI), a Classification layer (LLM-based query-type classification), and an Automated Response layer (RAG-based vector retrieval and answer generation). Running the classification and Responsible AI filters before answer generation improved response reliability and safety.
  • Empowered business and support teams to retrieve relevant data in real time, reducing manual document search and support resolution time by 90% and improving customer support response speed by 95% through document retrieval.
  • Developed scalable FastAPI endpoints to support concurrent users, containerized the service with Docker, and deployed it on Azure App Service. Orchestrated LLM workflows using GPT-4 16k, text-embedding-ada-002, Azure Blob Storage, Python, and LangChain.
  • Tech Stack: RAG (Retrieval-Augmented Generation), Chroma DB, text-embedding-ada-002 (1,536 dimensions), Python, Azure GPT-4 16k LLM, FastAPI, Pydantic, Phoenix Monitoring, audit table, Prompt Engineering, ragas, Few-Shot Prompting, Azure App Service, Docker, Cognitive Search, Azure Blob Storage

Education

B. Tech - ExTC

SGGSIE&T
Maharashtra, India
05-2022

HSC - Science

K.B.P Mahavidyalaya
Maharashtra, India
06-2018

SSC -

Kavthekar Prashala
Maharashtra, India
06-2016

Skills

    Programming Languages: Python, SQL
    Frameworks & Libraries: LangChain, LangGraph, FastAPI, TensorFlow
    Gen AI & LLM Technologies: Claude 3.7 Sonnet, Azure OpenAI (GPT-4), OpenAI Embeddings, LLaMA 2, RAG, Prompt Engineering
    Backend & Systems Engineering: Asynchronous Processing, Multithreading, Logging, Retry Mechanisms, OOP
    Databases & Storage: Trino DB, Azure Blob Storage, AWS S3, Chroma DB
    Tools & Cloud Platforms: Git, Azure DevOps, Jira, Azure, AWS, Docker, Power BI

Awards

Received the Cognizant AIA Annual Award 2023: “Shining Star | Learning – Ekalavya”.

Received the Cognizant Cheers Award.

Cognizant NA-RCGTH Generative AI Hackathon Certificate

Certification


AZ-900 Microsoft Azure Fundamentals Certificate

DP-900 Microsoft Azure Data Fundamentals Certificate
