Junaid Ahmed R

Bengaluru

Summary

Experienced Data Scientist with 7.8 years of hands-on expertise in machine learning, NLP, and deep learning, having contributed to AI initiatives at Accenture, TietoEVRY, and currently, Hero MotoCorp. Strong foundation in classical ML techniques and production-grade analytics, with a proven track record of adapting to and leading generative AI transformations. Skilled in fine-tuning LLMs (LLaMA2, Mistral) using QLoRA, building RAG pipelines, and deploying use cases such as contract redlining, clause-level risk analysis, market share analytics, and conversational AI. Experienced with cloud-native AI development on platforms like Databricks, Azure OpenAI, and Hugging Face, seamlessly integrating GenAI into real-world enterprise applications.

Overview

8 years of professional experience
1 Certification

Work History

Lead Data Scientist

Hero MotoCorp
Bengaluru
11.2024 - Current

Tools & Frameworks: Python, Databricks, PyTorch, QLoRA, Hugging Face Transformers, FAISS, Streamlit, LangChain, Azure, SQL, Power BI.

  • Leading the development of a contract analysis engine using QLoRA-based fine-tuning on Databricks for legal clause classification, redlining, and risk detection.
    Fine-tuned LLaMA2/Mistral models using legal domain data, with LoRA adapters and PEFT (Parameter-Efficient Fine-Tuning).
    Built clause-specific training datasets for approximately 12 clause types (Confidentiality, Termination, Liability, etc.), with RAG-based template retrieval.
    Integrated FAISS-based similarity search and LangChain pipelines for clause comparison and automated feedback against internal playbooks.
    Deployed inference workflows using Databricks Unity Catalog and evaluated them using custom clause-level accuracy and F1 metrics.
  • Architected and implemented a market share analytics platform for 2-wheeler products: Developed an end-to-end, Python-based system to compute market share changes across categories and time periods (3M, 6M, 12M).
    Integrated competitor benchmarking and delta trends, outlier detection by CC range, and visual alerts using Power BI and Streamlit.
    Built modular SQL pipelines and batch scripts to automate monthly data ingestion from regional sales.
  • Mentored two interns and one junior data scientist: Trained interns on foundational NLP techniques, data preprocessing, and Databricks workflows.
    Reviewed code, established Git-based workflows, and led pair-programming sessions for fine-tuning experiments.
    Conducted weekly syncs and milestone reviews to ensure delivery alignment and learning progression.
  • Collaborated cross-functionally with legal, sales, and product teams to translate domain problems into scalable ML solutions.
  • Contributed to internal GenAI strategy by evaluating cost-performance trade-offs between base and fine-tuned LLMs in RAG systems.

Data Science Lead

Tietoevry Create
Bengaluru
11.2021 - 10.2024

Tools & Technologies: Python, Databricks, PySpark, Azure, Whisper, LLaMA2, MLflow, Node.js, Microsoft Bot Framework, LUIS, SQL, XGBoost, Spectral Clustering, Adaptive Cards.

Project: OpenAI Call Transcription and Summarization

Duration: Dec 2023 - Oct 2024.

  • Developed a GenAI pipeline for transcribing, summarizing, and diarizing call recordings using Whisper and LLaMA2.
  • Preprocessed over 1,000 hours of audio data and optimized latency via chunking and parallelism.
  • Implemented speaker separation using spectral clustering on Whisper embeddings.
  • Benchmarked inference performance of Whisper vs. commercial ASR on Databricks.

Project: Lead Pricing Model (ILC)

Duration: July 2023 - Oct 2024.

  • Built a pricing model using demographic, recency, and behavioral data to score ILC leads.
  • Developed real-time and batch scoring pipelines in Databricks and PySpark.
  • Validated pricing outputs with business stakeholders to ensure alignment with strategy.

Project: Moneyball Mailing Campaign Optimization.

Duration: July 2023 - Oct 2024.

  • Created a scoring engine for lead prioritization in mailing campaigns using logistic regression.
  • Applied L1 regularization, SMOTE, and engineered over 1,300 features to predict conversion probability.
  • Achieved $60K in mailing savings and $2M in up-sell revenue through precision targeting.

Project: Auto Triage for Clinical Document Review.

Duration: Oct 2022 - June 2023.

  • Automated reviewer assignment for high-volume clinical documents using an XGBoost classifier.
  • Reduced manual triage time by approximately 70% and integrated the ML pipeline into CI/CD using MLflow and Azure Pipelines.

Project: MLOps Internal Architecture.

Duration: July 2022 - Dec 2022.

  • Designed an internal MLOps framework for concept drift detection and automated retraining workflows.
  • Implemented the Confidence Distribution Batch Detection (CDBD) algorithm for label-free drift monitoring.
  • Deployed alerts and dashboards using MLflow and Azure Automation.

Project: Digital Assistant for Marketing Platform.

Duration: Nov 2021 - June 2022.

  • Developed the Integrity Digital Assistant (IDA) chatbot for dynamic user interaction with marketing dashboards.
  • Built multi-intent LUIS models and dialog workflows with Microsoft Bot Framework v4.
  • Designed Adaptive Cards for rich UI interactions and deployed them via Azure Bot Services.
  • Enhanced accuracy and intent handling through feedback-driven retraining and Azure Blob interaction logging.

Machine Learning Software Engineer

Accenture
Bengaluru
11.2017 - 10.2021

Project: SAS to PySpark Migration on Databricks.

Duration: Apr 2019 - 2021.

Tools and Technologies Used: Python, PySpark, SAS, Databricks, Azure.

Project Description: Led the migration of legacy SAS macro-based ETL pipelines to PySpark on Databricks for improved scalability, cost reduction, and maintainability.

  • Analyzed complex SAS macros and translated logic into optimized PySpark workflows.
  • Benchmarked outputs between SAS and PySpark for accuracy assurance.
  • Increased processing speed and modularity by over 40% through refactored ETL logic.
  • Developed a reusable testing framework to validate transformation integrity across modules.

Project: Sentiment Analysis, Topic Modeling, and Power BI Automation.

Duration: Nov 2017 - March 2019.

Tools and Technologies Used: Python, Flask, Power BI, NLP, DAX, AI/ML libraries.

Project Description: Developed an end-to-end NLP solution for extracting sentiment and topics from operational data to improve decision-making, paired with automated Power BI dashboards.

  • Built sentiment analysis models using TF-IDF, cosine similarity, and logistic regression.
  • Applied topic modeling (LDA/NMF) to identify pain points from text datasets.
  • Designed data quality similarity checks using embedding-based semantic comparisons.
  • Automated validation pipelines using Python scripts to feed directly into Power BI.
  • Completed automation of 15+ business reports using DAX and Power BI connectors.
  • Continuously monitored model drift and retrained NLP pipelines for improved accuracy.

Education

Bachelor of Engineering - Electronics and Communication Engineering

JSSATEB
Bengaluru

Skills

    Technical skills

    💡 Machine Learning & NLP
    Supervised & Unsupervised Learning, Classification, Clustering, Feature Engineering, Text Preprocessing, Topic Modeling, Model Evaluation (Precision/Recall/F1)
    NLP Techniques: Tokenization, Lemmatization, POS Tagging, TF-IDF, Word2Vec, Transformers, Named Entity Recognition (NER), Clause Classification

    🧠 Generative AI & LLM Ecosystem
    Fine-tuning with QLoRA, PEFT, LoRA Adapters
    LLMs: OpenAI GPT, LLaMA2, Mistral, Gemini, Falcon
    Frameworks: LangChain, LlamaIndex, Hugging Face Transformers
    Embeddings: Sentence Transformers, OpenAI Embeddings, Azure Text Embeddings
    RAG Pipelines: FAISS, Vector DB Integration, Clause-level Retrieval
    Azure AI Services: Azure OpenAI, Azure AI Search, Azure AI Studio, AI Foundry
    Conversational AI: Microsoft Bot Framework, LUIS, Prompt Engineering, Multi-turn Dialogs

    🧰 Deep Learning & CV
    Neural Networks: MLP, CNN, RNN, LSTM, Seq2Seq
    Applications: Image Classification, Object Detection (YOLOv3), GANs, Autoencoders
    Frameworks: TensorFlow, Keras, PyTorch

    🧪 Tooling & Libraries
    Python Ecosystem: NumPy, Pandas, Scikit-learn, Matplotlib, Seaborn, SciPy, OpenCV
    GenAI Tooling: LangChain, Transformers (HF), Whisper, Streamlit, FAISS
    MLOps: MLflow, Azure ML Pipelines, CI/CD for ML, Drift Monitoring

    ☁️ Cloud & Data Engineering
    Cloud: Azure (Databricks, Blob, OpenAI), AWS (S3, EC2), GCP
    Big Data: PySpark, Databricks Workflows
    Databases: MySQL, SQL Server, Azure Data Lake, Cosmos DB
    DataOps: Delta Lake, Data Factory, Azure Synapse

    📊 Visualization & BI
    BI Tools: Power BI, Tableau, Google Data Studio
    Dashboarding: DAX, Drill-through, Automated Reporting

    🌐 Web & Version Control
    Web Frameworks: Flask, Django, FastAPI
    Versioning & DevOps: Git, GitHub, Azure Repos, JIRA, Azure DevOps

  • Machine learning
  • Natural language processing
  • Cloud computing
  • Predictive analytics
  • Statistical modeling
  • Neural networks
  • Data visualization

Certification

  • Machine Learning A-Z: Python & R in Data Science – 2018-12
  • PGDM in Data Science from Imarticus – 2018-03
  • Deep Learning A-Z – 2019-03
  • Data Science Specialization – 2020-09
  • AI 900 – 2021-06
  • MLOps Specialization – 2022-01
  • Databricks GEN AI Associate – 2025

Languages

English
First Language
Urdu
Proficient (C2)
Kannada
Proficient (C2)
