Amar Mandal

Navi Mumbai

Summary

Senior Data Scientist with over five years of experience designing and scaling AI/ML solutions across LLMs, NLP, and computer vision. Awarded Innovation of the Year three times for impactful technology solutions. Proven leader in guiding teams to deliver high-quality, business-impacting results.

Overview

6 years of professional experience
1 Certification

Work History

Sr. Data Scientist

Intelligence Node
Mumbai
06.2021 - 04.2026
  • Architected a hybrid extraction platform combining regex rules, API-sourced logic, and LLM-generated fallback patterns to automate e-commerce product attribute extraction at scale.
  • Built multi-source data pipelines across CSV, Google Sheets, MongoDB, and APIs, with automated crawler-based backfilling to improve dataset completeness for training and validation.
  • Delivered production extraction workflows with 99.54% average validation accuracy and 99.72% average match rate on selected candidates.
  • Productionized AI systems using FastAPI, async background workers, run tracking, structured logging, and caching to improve scalability, observability, and cost efficiency.
  • Established LLM governance frameworks for baseline, multi-stage, and comparison pipelines, enabling effective cost tracking, robust retry strategies, and improved failure controls.
  • Designed and scaled AI-based retail catalog quality checks, supporting 500K–1M validations per month across ~250K items, which accelerated issue detection and remediation.
  • Improved product matching performance by 20% using transformer-based NLP models and by 8% using computer vision-based image color matching techniques.
  • Built automated test-and-self-correction workflows for extraction rule generation, streamlining manual setup and enhancing production data reliability.

Business Analyst

Reliance JIO
Navi Mumbai
08.2020 - 05.2021
  • Collaborated with JioCS (Video Bot Assistant) and HelloJio (Digital Voice Assistant) teams to enhance user experience.
  • Prepared and improved training data for the machine learning algorithms behind the voice assistant and video bots.
  • Analyzed data from voice assistants using visualization tools to derive insights and monitor key performance indicators.
  • Automated workflow processes using Python to improve operational efficiency.

Personal Project

05.2020 - 06.2020
  • Flight Fare Prediction: an application that predicts 2020 flight fares from 2019 data. Derived new features from existing ones to improve the model and reduce the number of features. Built the backend logic as a Flask API and deployed it on Heroku. The project was featured on YouTube with more than 200,000 views.
  • Challenge: first-time deployment to the cloud.

Education

BE - Mechanical Engineering

SIES Graduate School of Technology
11.2020

Skills

  • LLMs
  • AI orchestration
  • Code development
  • System productionization
  • Machine learning
  • Model deployment
  • Data pipeline development
  • Natural language processing
  • Statistical analysis
  • Team collaboration

Certification

  • Machine Learning Masters, iNeuron, 04/01/20 - 08/01/20
  • Deep Learning Masters, iNeuron, 08/01/20 - 03/01/21

Languages

  • English, Full Professional Proficiency
  • Hindi, Full Professional Proficiency

Accomplishments

  • Innovation of the Year: 3 times (2023, 2024, 2025)
  • Employee of the Year: 2024
  • Personal project demonstration on YouTube: 200,000+ views

Project Details

Crawler Chef to replace Jr. Developer

  • Architected a hybrid extraction platform combining historical regex recipes, API-driven rules, and LLM-generated fallback patterns to automate e-commerce product attribute extraction at scale.
  • Built a multi-source data ingestion pipeline across CSV, Google Sheets, MongoDB, and API payloads, with automated crawler-based backfilling for missing labels to improve training and validation coverage.
  • Delivered production-ready extraction recipes achieving 99.54% average validation accuracy and 99.72% average match rate on selected candidates.
  • Productionized the solution using FastAPI, asynchronous background workers, run-level tracking, and structured event logging to improve auditability, observability, and operational monitoring.
  • Established LLM pipeline governance by benchmarking baseline, multi-stage, and comparison workflows, while implementing request-level cost tracking, retry logic, and failure controls.

Parsing via Codex Skills in the Loop

  • Designed an AI-assisted rule generation workflow that created extraction logic directly from sample web pages, significantly reducing manual rule authoring effort.
  • Built automated validation tests to compare extracted outputs against expected results, improving pre-deployment quality assurance.
  • Implemented self-correcting feedback loops where failed test cases triggered automatic rule refinement and re-validation.
  • Improved data reliability and consistency for downstream analytics and machine learning pipelines through automated extraction quality controls.
  • Reduced debugging effort and increased confidence in production data pipelines by introducing repeatable, test-driven rule generation workflows.

Image Checks for E-commerce Website (Client: Kroger)

  • Defined and solved a retail catalog consistency problem by building AI-based validation checks to detect image-content mismatches and product quality gaps.
  • Scaled the validation pipeline to 500K–1M checks per month across ~250K items, aligned with a biweekly client refresh process.
  • Designed decisioning logic to flag non-compliant catalog items and generate AI-driven alternative recommendations to accelerate remediation.
  • Productionized the solution as a FastAPI service with asynchronous inference orchestration and MongoDB SHA-based caching, improving latency and reducing inference cost through deduplication and result reuse.
  • Enabled higher catalog quality and faster issue resolution for a large retail client through automated validation and recommendation workflows.

Product Matching using Textual Information

  • Improved the product matching engine by 20% using textual features such as product name, description, attributes, and price for fashion catalog matching.
  • Fine-tuned transformer-based NLP models including ALBERT, DistilBERT, RoBERTa, and XLM on custom retail datasets to optimize semantic product matching performance.
  • Took the solution from ideation to deployment, owning experimentation, model selection, API integration, and production rollout.
  • Built and deployed the matching service using PyTorch, Sentence-BERT, Hugging Face, FastAPI, Docker, MongoDB, and AWS EC2.
  • Strengthened multilingual and semantic matching capabilities for fashion products through custom model training and evaluation.

Visual Crawling

  • Built a hybrid visual crawling system to extract information from product PDFs, images, and videos using multi-stage object detection and OCR pipelines.
  • Reduced manual QA scraping effort by 70% by automating visual data extraction workflows.
  • Used YOLOv5 for object detection and EasyOCR for textual extraction on visual assets.
  • Deployed the solution on AWS EC2 to support scalable crawling and automated content extraction.
