AYUSH SINHA

Summary

AI/ML Engineer specialising in production LLM and GenAI systems. At AventIQ, shipped a multimodal RAG support assistant over structured and unstructured enterprise data, cutting document retrieval effort by 35%, and built a business card scanner that reduced extraction latency from 16s to 5s. Fine-tuned and deployed domain-specific transformer models (BERT, GPT-2, T5, Pythia) using LoRA/QLoRA, and integrated Azure OpenAI to automate ticket resolution at scale. Experienced across the full GenAI stack: RAG pipelines, vector databases, multi-agent orchestration, and AWS inference deployment.

Overview

1 year of professional experience

4 years of post-secondary education

Work History

Independent AI/ML Researcher

Self-Employed

09.2025 - Current

Building a production E-commerce Product Recommendation RAG system using AWS S3, SageMaker, and OpenSearch Serverless as a portfolio project.
Deepening expertise through advanced coursework in quantization and multi-agent systems (deeplearning.ai, CrewAI).
Contributing to open-source projects and running applied GenAI experiments in LLM fine-tuning and retrieval pipeline optimisation.

AI/ML Engineer - LLM & GenAI Systems

AventIQ

New Delhi

10.2024 - 08.2025

Worked on a production multimodal RAG support assistant that queries PDFs, Excel, CSV, and image documents using PGVector, FAISS, and Hugging Face embeddings - reducing manual document search effort by 35% across enterprise teams.
Integrated Azure OpenAI APIs for automated ticket resolution, cutting average resolution time from ~8 min to ~3 min across the support pipeline.
Fine-tuned and quantized (FP16/8-bit LoRA/QLoRA) 12 domain-specific LLMs including BERT, GPT-2, T5, and Pythia on internal datasets - deploying 4 models to production on Hugging Face Hub with improved inference speed and reduced compute costs.
Built an AI business card scanner with AWS Textract and FastAPI, reducing extraction latency from 16s to 5s (a ~69% reduction) in production.
Designed an AI-driven resume parsing pipeline using AWS SQS for async message processing, improving ingestion throughput by 40% under peak load.
Evaluated models using Accuracy, Precision, Recall, F1, and BLEU metrics across 6 production use cases to validate deployment readiness.

Data Science Intern

Celebal Technologies

05.2023 - 07.2023

Built and evaluated ML/DL classification and object detection models in Python and TensorFlow; implemented full data preprocessing and feature engineering pipelines for 3 internal datasets.

Education

B.Tech - Computer Science & Engineering (AI & ML)

Galgotias University

08.2020 - 05.2024

Skills

Python
PyTorch
Transformers
FastAPI
Pythia
BERT
RoBERTa
T5
GPT-2
Whisper
RAG pipelines
LoRA/QLoRA fine-tuning
Agentic AI
Prompt Engineering
PGVector
FAISS
Pinecone
ChromaDB
AWS (Lambda, Bedrock, S3, SQS, Textract, IAM)
Azure OpenAI
Accuracy
Precision
Recall
F1-Score
BLEU
C
Node.js
TensorFlow
Scikit-learn
Git
Postman
VS Code

Leadership Experience

Led a 4-person engineering team delivering 3 AI application features to production, coordinating design, development, and cloud deployment across a 6-month cycle.
Mentored 2 junior developers in Python, LLM fine-tuning workflows, and AWS-based AI deployment - both now working independently on production systems.

Certifications & Awards

Employee of the Month, AventIQ, 06.2025
CrewAI Multi-Agent Systems, Course Credential
Fundamentals of Quantization, deeplearning.ai

Projects

E-Commerce Product Recommendation RAG System (Python, AWS S3, SageMaker, OpenSearch Serverless): End-to-end production RAG pipeline for personalised product recommendations; AWS-native data ingestion with SageMaker-hosted embeddings and OpenSearch Serverless for vector retrieval. Currently in active development.
AI Business Card Scanner Pro (Python, AWS Textract, FastAPI): Intelligent document extraction backend with an optimised inference pipeline; reduced extraction latency from 16s to 5s (a ~69% reduction) in production.
LLM Fine-Tuning & Quantization Pipeline (Python, PyTorch, Transformers, Hugging Face): Applied FP16 and 8-bit LoRA/QLoRA quantization to reduce model size for production deployment. Fine-tuned on domain-specific datasets; shipped 4 models to Hugging Face Hub.
AI-Driven Resume Parser (ATS) (Python, AWS SQS, NLP): Async resume parsing and document classification pipeline using AWS SQS for message queuing; improved ingestion throughput by 40% under concurrent load through a non-blocking architecture.
