
ALI NASIR

Data Scientist
Bengaluru

Summary

Experienced Data Scientist with expertise in machine learning, statistical modeling, and scalable AI system design. Demonstrated leadership in owning end-to-end projects, mentoring teams, and driving cross-functional collaboration to deliver measurable business impact. Passionate about state-of-the-art AI technologies including GenAI, LLMs, and Agentic systems, with a strong focus on translating cutting-edge research into production-ready solutions.

Overview

6 years of professional experience

Work History

Senior Data Scientist

Auxo AI
02.2025 - Current

Project: WinSupply - Agentic QTO Automation Platform

Impact: To automate Quantity Take-Off (QTO) and Bill of Quantities (BoQ) generation from construction drawings using agentic AI workflows, improving classification accuracy and reducing manual validation through a scalable, multi-stage architecture.

  • Took ownership of the MVP architecture; implemented a YOLO + Mask R-CNN baseline achieving 67% accuracy and establishing the first measurable benchmark.
  • Redesigned into a 2-stage pipeline (YOLO → Structured Preprocessing → Siamese Metric Learning), improving accuracy to ~80% and reducing false positives.
  • Architected an agentic AI-driven QTO platform using LangGraph multi-agent orchestration for task decomposition and structured BoQ generation.
  • Integrated LLM extraction, PyMuPDF parsing, and computer vision symbol detection into a unified inference workflow.
  • Implemented Kafka-based real-time feedback enabling interrupt/resume workflows, distributed consumers, confidence scoring, and decoupled ML retraining.

Project: PartsSource - Azure-based Intelligent Document Parsing

Impact: To reduce manual review and improve QC accuracy by building hybrid AI pipelines for MEL ID extraction and enterprise document validation using deterministic and generative methods.

  • Led end-to-end Azure-based document ingestion and MEL ID extraction using Azure Document Intelligence and GPT-based extraction.
  • Built a hybrid validation framework combining fuzzy matching, Excel-based deterministic checks, and RAG-driven contextual extraction.
  • Developed automated QC pipelines across 11 KPIs, improving accuracy to ~95% and reducing false positives across 3 REMI/RSA metrics.
  • Mentored 3 team members and drove weekly L1/L2 QC alignment calls with the client, ensuring data-driven corrective action tracking and transparent confidence reporting.

Data Scientist-2

Physics Wallah
07.2022 - 01.2023

Project: Smart Doubt Recommendations

Impact: To reduce the workload of SMEs by providing a funnel to answer doubts asked by students in video lectures.

  • Developed clusters of doubts based on contextual meaning for 200 chapters using Ada-003 embeddings.
  • Integrated a funnel of doubt classification models that achieved an overall F1 score of 90% and delivered it as an API.
  • Chunked NCERT data and inserted it into a vector store (AstraDB) using LangChain and Cassandra.
  • Integrated AstraDB by DataStax into the RAG application and built a prompt that uses subject, exam, and chapter metadata to retrieve the top 5 relevant documents using maximal marginal relevance (“mmr”) search.
  • Fed the retrieved documents to an LLM (Azure OpenAI GPT-3.5) as context to generate the answer.

Project: Personalized Test-series using Adaptive Learning: User-Engagement through ‘Infinite Practice’

Impact: To give every user personalized test questions in ratios of easy, medium, and hard categories based on their estimated knowledge, using statistical modelling and engagement.

  • Conceptualized a state-of-the-art solution based on item response theory (IRT) to estimate a person's ability level from their test performance using difficulty, discrimination, speed, and guessing parameters.
  • Formulated an algorithm from scratch to fit the parameters of the 3-Parameter Logistic (3PL) model, a statistical model.
  • Created 2 APIs using FastAPI to return questions and optimized the query to reduce latency from 40 s to 530 ms.

Project: Topper Identification Using Academic Score (PATENT PROJECT)

Impact: Personalized engagement scores to nurture potential toppers based on their test performance and level of engagement.

  • Wrote queries to generate 41 KPIs from 60 raw parameters and trained regression models for around 37K JEE users (RMSE 0.11) and 20K NEET users (RMSE 0.09).

Data Scientist-1

Vedantu Innovation Pvt. Ltd.
07.2021 - 06.2022

Project: VGyan (Doubt Solving through AI)

Impact: Built an automatic doubt-solving chatbot for AI Live sessions to reduce the need for teacher assistance.

  • Automated the process of classifying academic doubts and providing relevant answers for each subject and grade.
  • Created sentence embeddings using DistilRoBERTa and BERT transformers, and built a classifier with 92% accuracy.
  • Built a regressor with an R² score of 0.978 to predict the number of doubts asked in a given session.

Project: Post Session Comments

Impact: Developed an ETL pipeline to extract dispositions and sentiments from students’ comments, automating the Students’ Account Manager's work of designating student issues received after a session on the platform.

  • Performed multi-class classification on the text comments with an accuracy of 81%.
  • Created a cron job script from scratch and deployed it to production using Apache Airflow.

Summer Internship

Ai India II Computer Vision (R & D)
07.2020 - 09.2020
  • Developed a route optimization solution integrated with land suitability analysis for Pune district using geospatial data modeling.
  • Evaluated multiple ML models for supervised LULC classification (achieving 93% accuracy with XGBoost) and designed a CNN architecture outperforming baseline ImageNet-based classifiers.

Education

M. Tech - Geoinformatics and Natural Resource Image Processing

IIT Bombay
01-2021

B. Plan - Urban and Regional Planning

SPA Bhopal
01-2019

Skills

Languages & Core Libraries: Python, NumPy, Pandas, SciPy, PySpark

Machine Learning & Deep Learning: Predictive & Statistical Modeling, Transfer Learning, Few-shot Learning, NLP, Scikit-learn, XGBoost, TensorFlow, Keras, PyTorch, OpenCV, NLTK, spaCy, Word2Vec

AI / LLM & Agentic Systems: GPT-4, Claude, Gemini, LLMs, RAG, Agentic AI, Multi-Agent Orchestration (LangGraph), Hugging Face, LangChain, Azure OpenAI

Frameworks, Cloud & DevOps Tools: FastAPI, Docker, Apache Airflow, AWS, Azure, Git, Postman

Interests

Sketching, Painting, Music, Snooker, Hiking, Chess

NATIONAL ACHIEVEMENTS

  • Presented a paper titled ‘Optimal Location of Smart Water-Meters & Hotspot Analysis of Excessive Water Consumption’ at the National seminar on recent advances in geospatial technology at IRS (ISRO). https://iitb.academia.edu/AliNasirBandukwala
  • Won 1st place at the National NOSPLAN convention 2017, serving as team leader of 100+ delegates.

POSITIONS OF RESPONSIBILITY

  • Teaching Assistant - IITB [2019-08 to 2020-06]
  • PR and Logistics Coordinator - IITB [2019-09 to 2020-04]
  • President - Rotaract club - SPA Bhopal [2017-07 to 2018-02]
  • Unit Coordinator - NOSPlan - SPA Bhopal [2017-03 to 2018-03]
  • Lead Coordinator - Annual fest - SPA Bhopal [2017-09 to 2018-03]
