SIDDHARTH AGARWAL

Summary

6+ years of experience building AI-powered solutions, specializing in LLM-based and NLP-driven products. Expertise includes designing agents, RAG systems, and retrieval engines backed by vector databases, as well as optimizing LLM costs. Extensive experience in fine-tuning LLMs, implementing context-aware learning techniques, and constructing robust ML-Ops infrastructure for seamless model deployment and integration.

Overview

6 years of professional experience

Work History

Senior Machine Learning Engineer

DOCKET AI
12.2024 - Current
  • Designed and implemented an LLM-based, RAG-powered retrieval engine to answer complex sales-related queries requiring data aggregation from multiple sources (web, Seismic, Jira, Slack, Salesforce, Notion, etc.).
  • Leveraged LightRAG and HippoRAG for indexing and retrieval, ensuring precise context generation. Engineered a high-precision Knowledge Graph (KG) by integrating chunked text, embeddings, and their entities and relations from 12+ data sources, achieving 90%+ query resolution accuracy.
  • Developed an LLM-as-Judge evaluation framework to assess query-context-answer alignment, guiding fine-tuning and graph pruning. Enhanced retrieval quality, enabling 40% more query resolutions than traditional embedding-based methods.

Senior NLP Engineer

ARINTRA
03.2023 - 12.2024
  • Led a team of 5 ML Engineers to build the DL solution powering Arintra's flagship product, the Autonomous Medical Coding Platform. This project involved building and deploying advanced NLP models, leveraging state-of-the-art language models such as Gemini, PaLM, and GPT-4 to enhance the platform's capabilities.
  • Addressed complex NLP challenges in named entity recognition (NER) and relation extraction (RE) by using generative AI to interpret clinical context in health charts (EHR), significantly boosting precision and recall for the generated clinical codes, which are critical in determining the direct-to-business (D2B) score. The D2B score measures the number of cases where claims from all clinical codes are completely marked without requiring human intervention; even minor accuracy improvements had a substantial impact, raising it from 52% to 71%. Consequently, turnaround time (TAT) improved from 3 weeks to 1 day as the need for human-in-the-loop (HITL) review and for clinical coders to rectify errors decreased.
  • Spearheaded ML-Ops initiatives, constructing robust data pipelines and building a suite of microservices for seamless interaction in a production environment. Addressed the high costs of managed LLMs by implementing Redis caching, token optimizations, and prompt compression (LLMLingua), along with other improvements, reducing LLM costs by 30%.
  • Designed the deployment framework for DL services, incorporating features such as centralized logging, Jenkins-based CI/CD, and a Cloud Run solution on GCP.
  • Managed model serving and inference, using techniques such as weight quantization and QLoRA to make fine-tuning in-house models feasible. Also introduced and built a Qdrant vector database (SapBERT-generated embeddings of UMLS healthcare concepts) to enable a RAG architecture and improve LLM integration.
  • Responsible for all DL product integration, including backend work in Java, Python, Spring Boot, FastAPI, Jenkins, etc. Leveraged MLOps tools such as Vertex AI with GPU systems, and used multithreading and concurrency management to optimize production efficiency.
  • Implemented and migrated centralized logging using OpenTelemetry and Open dashboard, resulting in considerable savings on cloud logging costs.
  • Led initiatives to automate the workflow of a clinical coding team, which involved streamlining the annotation of Ground Truth data and enhancing the process of model evaluation.

Lead, AJIO Search Implementation

COUTURE AI
08.2021 - 03.2023
  • Led the Search team of 8, consisting of data scientists, data engineers and business analysts, in building & upgrading the Search Algorithm for the e-commerce marketplace AJIO.
  • The improvements were significant: the solution started with a 10% A/B test group and has since been rolled out to 100% of AJIO users.
  • Led a POC for all AJIO stakeholders to build an NLP-based search query tool that supports users' search intent, improving search funnel conversion (SLC->ATC) by 21%. Worked with the design and product teams to develop it alongside Search Autocomplete, which suggests queries as users search the catalog, using BERT models with a cache of partial queries to support scale. The re-engineered design improved Autocomplete conversions by 3x.
  • Introduced and unified strategies for implementing Autosuggest as a central service supporting AJIO's search suggestions for user queries, enhancing the search experience. Regularly referred to product KPIs to implement fallback logic. As a result, the proportion of Autocomplete searches rose from 9% to 20%.
  • Drove the search team to develop a search architecture using deep learning methods to facilitate user intent-driven search. Managed the data flow for user interaction feedback by regularly synchronizing with stakeholders from AJIO and Couture, contributing significantly to AJIO's 7% revenue jump in 14 months. Designed the data flow and deployed modules within Search, including spell correction, entity identification, and contextual inference, using NLP, deep learning, and statistical methods. Comparisons against AJIO's existing solution led them to extend the contract for Personalized Search.

Data Scientist

INNOVACCER INC
06.2019 - 08.2021
  • Coordinated with a team of 8 SMEs (medical practitioners) to formulate standard clinical factors for suspecting the top ~100 chronic conditions. Designed the data flow architecture for a feedback cycle from downstream patient-facing applications to improve suspect accuracy. The standardization significantly reduced the manual effort required of the appointed clinical coders.
  • Introduced and strategized an analytical product to suspect chronic health conditions in ~3MM patients where those conditions were unreported in their documented history, using clinical factors from lab tests and medications data in patients' activity. Coordinated with product managers and designers to tailor BI dashboards and data flow. The solution was deployed in production in 4 applications (including patient-facing and physician-facing apps) with a precision of ~71%, well above the industry standard (~47%), leading to increased customer demand and requirements from over 5 hospital organizations (ACOs) across the US.
  • Awarded Champion of the Quarter for this work as a prime contributor to improving patient risk accuracy; modularized and refactored the risk management module, improving risk performance by reducing gaps between insurance and claims by 20%.

Education

B. Tech - Chemical Engineering

Indian Institute of Technology Guwahati
06.2019

SKILLS

Python, Java, NLP, ML-Ops, SQL, RAG, Vector DB, Predictive Modelling, Fine-tuning

LEADERSHIP

ENTREPRENEURSHIP CELL, IIT GUWAHATI · Convenor - Overall Head, E-Summit Apr. 2017 to Mar. 2018

  • Headed a three-tier team of 70+ students (Heads, Managers and Senior Executives) to conduct the largest E-Summit of North-East India.
  • Organized multiple series of talks, workshops and business competitions, and gave lessons to promote entrepreneurship among the student community.
  • Initiated and popularized Strategy Storm, the second-largest business case challenge in India, with an initial participation of 600+ teams from elite universities such as IIM A and IIT B.
  • Received a sponsored invitation to IdeaLabs 2018, one of Europe's largest E-Summits, held in Vallendar, Germany, and took part in its networking events.

EXTRA-CURRICULAR

  • Ranked 25th worldwide: Creative Shock 2016, Social Business Case Study Competition, ISM Lithuania - Europe
  • Winner, Parliamentary Debate: Inter-hostel Debating Challenge, Inter-Hostel Cultural Competition 2017, IIT Guwahati
  • National Service Scheme: Tutored young underprivileged students while visiting local public schools as a social volunteer.
  • Table Tennis: Participated in Inter-School Championship and won Silver in the doubles category.
