Pooja Shinde

Mumbai

Summary

Data Scientist with 4.5 years of experience building AI and GenAI solutions across the Supply Chain and Banking & Finance domains. Specialized in LLM fine-tuning, Agentic AI systems, and Knowledge-Augmented Generation (KAG), with hands-on experience in Knowledge Graphs (Neo4j), multimodal pipelines, and LLM-powered applications.

Overview

5 years of professional experience

Work History

Data Scientist

Shapoorji Pallonji Finance Private Limited
Mumbai
06.2025 - Current

Client – Afcons Infrastructure
Project – AI Knowledge Management System (Agentic AI + KAG + Multimodal)

  • Built an end-to-end multimodal Knowledge Management System to process construction data (documents, images, audio, video) using LLaVA (image/video captioning), Whisper (speech-to-text), and PaddleOCR (text extraction) pipelines.
  • Developed an Agentic AI system using Google ADK, enabling natural language querying over Knowledge Graphs via LLM-generated Cypher queries and Neo4j-based graph retrieval.
  • Implemented Knowledge-Augmented Generation (KAG) using Microsoft GraphRAG and Neo4j, with Azure OpenAI (GPT-4o) generating grounded, citation-backed answers.
  • Fine-tuned a LLaMA 3.x (~3B) model using multi-task instruction tuning (Q&A, summarization, prompt refinement) on domain-specific Supply Chain Finance data to improve reasoning and domain accuracy.

Data Scientist

Capgemini
Navi Mumbai
11.2020 - 05.2024
  • Worked on early-stage GenAI solutions (2023–2024), including:
    ◦ A Text-to-SQL system that uses LLMs to convert natural-language questions into SQL for business users.
    ◦ A Retrieval-Augmented Generation (RAG) pipeline to extract insights from structured and unstructured financial data.
  • Developed and deployed machine-learning credit risk models (Logistic Regression, XGBoost, Random Forest) to improve lending risk assessment.
  • Built automated, scalable ML pipelines for data preprocessing, feature engineering, and model deployment.

Education

Bachelor of Engineering - Electronics Technology

Ramrao Adik Institute Of Technology
Navi Mumbai
10.2020

Skills

Agentic AI & Frameworks: Google ADK, LangChain, LLM Agents, Multi-Agent Systems
Large Language Models (LLMs): OpenAI (GPT-4o), LLaMA, DeepSeek
Fine-Tuning & LLM Adaptation: Instruction tuning (multi-task), domain-specific LLM customization
Knowledge-Augmented Generation (KAG): GraphRAG, context-aware retrieval, knowledge integration
Knowledge Graphs: Neo4j, Cypher querying, graph-based reasoning
Multimodal AI: LLaVA (vision-language), Whisper (speech-to-text), PaddleOCR
Vector Databases & Retrieval: LanceDB, ChromaDB, Pinecone, embeddings
Programming: Python
Data Handling: Pandas, NumPy
Prompt Engineering & NLP: Prompt design, embeddings, text processing
Databases: PostgreSQL
Cloud & Deployment: AWS, Docker
