Data Scientist with 4 years of hands-on industry experience delivering scalable machine learning, deep learning, and Generative AI solutions that drive real business value.
Overview
5 years of professional experience
1 Certification
Work History
Data Scientist
OrBEX Technologies PVT LTD
07.2021 - Current
Designed and developed an enterprise-scale intelligent knowledge assistant using a hybrid RAG architecture to deliver accurate, context-aware question answering over large organizational document repositories.
Implemented hybrid search techniques to balance semantic relevance with exact-match accuracy for domain-specific terms, identifiers, and policy references.
Built metadata-aware retrieval mechanisms with role-based and contextual filters (department, document type, time range) to ensure secure and relevant responses.
Integrated LLMs using structured prompts, context windows, and citation logic to generate grounded, explainable answers with source attribution.
Developed scalable REST APIs using FastAPI with asynchronous processing to support low-latency query handling.
Delivered a reliable and scalable enterprise knowledge platform that significantly improved information accessibility and reduced reliance on manual document searches.
Improved answer relevance and factual accuracy through the combination of semantic and keyword-based retrieval techniques.
Developed an enterprise-scale analytics solution to predict disease progression risk for chronic patients using machine learning, combined with a retrieval-augmented generation (RAG) system to provide clinical explanations and guideline-based recommendations.
Collected and analyzed patient-level clinical data including demographics, diagnoses, lab results, vitals, and historical medical records.
Conducted exploratory data analysis (EDA) to identify trends, disease patterns, and key predictors of disease progression.
Built and evaluated machine learning models (Random Forest, XGBoost) to predict disease progression risk scores.
Optimized model performance using hyperparameter tuning and evaluated results using metrics such as ROC-AUC, precision, recall, and F1-score.
Implemented a RAG-based clinical knowledge support system to retrieve relevant treatment guidelines, clinical protocols, and research summaries from internal medical documents.
Integrated ML predictions with RAG outputs to provide contextual explanations such as contributing risk factors and recommended monitoring actions.
Developed APIs using FastAPI to expose prediction and clinical explanation services.
Collaborated with healthcare stakeholders to translate clinical requirements into data-driven solutions.
Created dashboards and reports to visualize risk distribution, model performance, and patient risk trends.
Enabled early identification of high-risk patients using machine learning–based disease progression risk scoring.
Improved clinical decision-making by providing guideline-backed explanations through a RAG-based clinical knowledge system.
Developed an end-to-end machine learning–based credit risk scoring system to assess the probability of loan default and support data-driven lending decisions.
Worked closely with business stakeholders to understand lending workflows, credit approval criteria, and regulatory constraints.
Defined the target variable as loan default (binary classification) and framed the problem as a probability-based risk scoring task rather than a hard accept/reject decision.
Identified key business objectives such as minimizing defaults, controlling false approvals, and maintaining regulatory compliance.
Collected and analyzed historical loan application data including customer demographics, income, employment details, credit history, and repayment behavior.
Performed extensive exploratory data analysis (EDA) to identify missing values, outliers, skewed distributions, and multicollinearity among features.
Analyzed default rate trends across customer segments to identify high-risk patterns.
Built baseline models using Logistic Regression to ensure interpretability and establish benchmark performance.
Converted predicted probabilities into interpretable credit risk scores and risk bands (Low / Medium / High).
Improved loan default detection by accurately identifying high-risk applicants before approval.
Reduced financial risk by limiting approvals of high-risk loans.
Education
Bachelor of Technology (B.Tech)
Siddhi Vinayak College Of Engineering
Alwar, Rajasthan, India
Skills
Supervised and Unsupervised Machine Learning techniques
Senior Executive – Data & Process Optimization at AAPC India Pvt. Ltd., Noida