Junaid Ahmed R

Bengaluru

Summary

Data Scientist with 7.8 years of hands-on experience in machine learning, NLP, and deep learning, having contributed to AI initiatives at Accenture, TietoEVRY, and, currently, Hero MotoCorp. Strong foundation in classical ML techniques and production-grade analytics, with a proven track record of adapting to and leading generative AI transformations. Skilled in fine-tuning LLMs (LLaMA2, Mistral) using QLoRA, building RAG pipelines, and deploying use cases such as contract redlining, clause-level risk analysis, market share analytics, and conversational AI. Experienced with cloud-native AI development on platforms such as Databricks, Azure OpenAI, and Hugging Face, integrating GenAI into real-world enterprise applications.

Work History

Lead Data Scientist

Hero MotoCorp

Bengaluru

11.2024 - Current

Tools & Frameworks: Python, Databricks, PyTorch, QLoRA, Hugging Face Transformers, FAISS, Streamlit, LangChain, Azure, SQL, Power BI.

Leading the development of a contract analysis engine using QLoRA-based fine-tuning on Databricks for legal clause classification, redlining, and risk detection.
Fine-tuned LLaMA2/Mistral models using legal domain data, with LoRA adapters and PEFT (Parameter-Efficient Fine-Tuning).
Built clause-specific training datasets for approximately 12 clause types (Confidentiality, Termination, Liability, etc.), with RAG-based template retrieval.
Integrated FAISS-based similarity search and LangChain pipelines for clause comparison and automated feedback against internal playbooks.
Deployed inference workflows using Databricks Unity Catalog and evaluated them using custom clause-level accuracy and F1 metrics.
Architected and implemented a market share analytics platform for 2-wheeler products: Developed an end-to-end, Python-based system to compute market share changes across categories and time periods (3M, 6M, 12M).
Integrated competitor benchmarking and delta trends, outlier detection by CC range, and visual alerts using Power BI and Streamlit.
Built modular SQL pipelines and batch scripts to automate monthly data ingestion from regional sales.
Mentored two interns and one junior data scientist: Trained interns on foundational NLP techniques, data preprocessing, and Databricks workflows.
Reviewed code, established Git-based workflows, and led pair-programming sessions for fine-tuning experiments.
Conducted weekly syncs and milestone reviews to ensure delivery alignment and learning progression.
Collaborated cross-functionally with legal, sales, and product teams to translate domain problems into scalable ML solutions.
Contributed to internal GenAI strategy by evaluating cost-performance trade-offs between base and fine-tuned LLMs in RAG systems.

Data Science Lead

Tietoevry Create

Bengaluru

11.2021 - 10.2024

Tools & Technologies: Python, Databricks, PySpark, Azure, Whisper, LLaMA2, MLflow, Node.js, Microsoft Bot Framework, LUIS, SQL, XGBoost, Spectral Clustering, Adaptive Cards.

Project: OpenAI Call Transcription and Summarization

Duration: Dec 2023 - Oct 2024.

Developed a GenAI pipeline for transcribing, summarizing, and diarizing call recordings using Whisper and LLaMA2.
Preprocessed over 1,000 hours of audio data and optimized latency via chunking and parallelism.
Implemented speaker separation using spectral clustering on Whisper embeddings.
Benchmarked inference performance of Whisper vs. commercial ASR on Databricks.

Project: Lead Pricing Model (ILC)

Duration: July 2023 - Oct 2024.

Built a pricing model using demographic, recency, and behavioral data to score ILC leads.
Developed real-time and batch scoring pipelines in Databricks and PySpark.
Validated pricing outputs with business stakeholders to ensure alignment with strategy.

Project: Moneyball Mailing Campaign Optimization.

Duration: July 2023 - Oct 2024.

Created a scoring engine for lead prioritization in mailing campaigns using logistic regression.
Applied L1 regularization, SMOTE, and engineered over 1,300 features to predict conversion probability.
Achieved $60K in mailing savings and $2M in up-sell revenue through precision targeting.

Project: Auto Triage for Clinical Document Review.

Duration: Oct 2022 - June 2023.

Automated reviewer assignment for high-volume clinical documents using an XGBoost classifier.
Reduced manual triage time by approximately 70% and integrated the ML pipeline into CI/CD using MLflow and Azure Pipelines.

Project: MLOps Internal Architecture.

Duration: July 2022 - Dec 2022.

Designed an internal MLOps framework for concept drift detection and automated retraining workflows.
Implemented the Confidence Distribution Batch Detection (CDBD) algorithm for label-free drift monitoring.
Deployed alerts and dashboards using MLflow and Azure Automation.

Project: Digital Assistant for Marketing Platform.

Duration: Nov 2021 - June 2022.

Developed the Integrity Digital Assistant (IDA) chatbot for dynamic user interaction with marketing dashboards.
Built multi-intent LUIS models and dialog workflows with Microsoft Bot Framework v4.
Designed Adaptive Cards for rich UI interactions and deployed them via Azure Bot Service.
Enhanced accuracy and intent handling through feedback-driven retraining and Azure Blob interaction logging.

Machine Learning Software Engineer

Accenture

Bengaluru

11.2017 - 10.2021

Project: SAS to PySpark Migration on Databricks.

Duration: Apr 2019 - 2021.

Tools and Technologies Used: Python, PySpark, SAS, Databricks, Azure.

Project Description: Led the migration of legacy SAS macro-based ETL pipelines to PySpark on Databricks for improved scalability, cost reduction, and maintainability.

Analyzed complex SAS macros and translated logic into optimized PySpark workflows.
Benchmarked outputs between SAS and PySpark for accuracy assurance.
Increased processing speed by over 40% and improved modularity through refactored ETL logic.
Developed a reusable testing framework to validate transformation integrity across modules.

Project: Sentiment Analysis, Topic Modeling, and Power BI Automation.

Duration: Nov 2017 - March 2019.

Tools and Technologies Used: Python, Flask, Power BI, NLP, DAX, AI/ML libraries.

Project Description: Developed an end-to-end NLP solution for extracting sentiment and topics from operational data to improve decision-making, paired with automated Power BI dashboards.

Built sentiment analysis models using TF-IDF, cosine similarity, and logistic regression.
Applied topic modeling (LDA/NMF) to identify pain points from text datasets.
Designed data quality similarity checks using embedding-based semantic comparisons.
Automated validation pipelines using Python scripts to feed directly into Power BI.
Completed automation of 15+ business reports using DAX and Power BI connectors.
Continuously monitored model drift and retrained NLP pipelines for improved accuracy.

Education

Bachelor of Engineering - Electronics and Communication Engineering

JSSATEB

Bengaluru

Skills

Technical skills

💡 Machine Learning & NLP
Supervised & Unsupervised Learning, Classification, Clustering, Feature Engineering, Text Preprocessing, Topic Modeling, Model Evaluation (Precision/Recall/F1)
NLP Techniques: Tokenization, Lemmatization, POS Tagging, TF-IDF, Word2Vec, Transformers, Named Entity Recognition (NER), Clause Classification

🧠 Generative AI & LLM Ecosystem
Fine-tuning with QLoRA, PEFT, LoRA Adapters
LLMs: OpenAI GPT, LLaMA2, Mistral, Gemini, Falcon
Frameworks: LangChain, LlamaIndex, Hugging Face Transformers
Embeddings: Sentence Transformers, OpenAI Embeddings, Azure Text Embeddings
RAG Pipelines: FAISS, Vector DB Integration, Clause-level Retrieval
Azure AI Services: Azure OpenAI, Azure AI Search, Azure AI Studio, AI Foundry
Conversational AI: Microsoft Bot Framework, LUIS, Prompt Engineering, Multi-turn Dialogs

🧰 Deep Learning & CV
Neural Networks: MLP, CNN, RNN, LSTM, Seq2Seq
Applications: Image Classification, Object Detection (YOLOv3), GANs, Autoencoders
Frameworks: TensorFlow, Keras, PyTorch

🧪 Tooling & Libraries
Python Ecosystem: NumPy, Pandas, Scikit-learn, Matplotlib, Seaborn, SciPy, OpenCV
GenAI Tooling: LangChain, Transformers (HF), Whisper, Streamlit, FAISS
MLOps: MLflow, Azure ML Pipelines, CI/CD for ML, Drift Monitoring

☁️ Cloud & Data Engineering
Cloud: Azure (Databricks, Blob, OpenAI), AWS (S3, EC2), GCP
Big Data: PySpark, Databricks Workflows
Databases: MySQL, SQL Server, Azure Data Lake, Cosmos DB
DataOps: Delta Lake, Data Factory, Azure Synapse

📊 Visualization & BI
BI Tools: Power BI, Tableau, Google Data Studio
Dashboarding: DAX, Drill-through, Automated Reporting

🌐 Web & Version Control
Web Frameworks: Flask, Django, FastAPI
Versioning & DevOps: Git, GitHub, Azure Repos, JIRA, Azure DevOps

Machine learning
Natural language processing
Cloud computing
Predictive analytics
Statistical modeling
Neural networks
Data visualization

Certification

Machine Learning A-Z: Python & R in Data Science – 2018-12
PGDM in Data Science from Imarticus – 2018-03
Deep Learning A-Z – 2019-03
Data Science Specialization – 2020-09
AI-900 – 2021-06
MLOps Specialization – 2022-01
Databricks Gen AI Associate – 2025

Languages

English

First Language

Urdu

Proficient (C2)

Kannada

Proficient (C2)
