Shastri Vaishampayan Mohapatra

Pune

Summary

Senior research scientist with expertise in natural language processing and computational linguistics, focusing on multilingual NLP model optimization and data-centric methodologies. Successfully architected synthetic data generation pipelines that enhance intent classification and improve entity extraction across multiple languages. Skilled in designing annotation quality assurance frameworks and conducting linguistic analysis to support robust AI systems. Proven leadership in mentoring teams and collaborating across ASR, NLU, and product systems to optimize speech-to-intent pipelines.

Overview

12 years of professional experience

Work History

Senior Research Scientist

Cerence AI
Pune, India
02.2025 - Current
  • Architected CFG-driven synthetic data generation pipelines to scale training corpora for transformer-based NLU models, improving intent classification and NER generalisation across multilingual (Hindi, English) automotive voice assistant interactions.
  • Led data-centric model optimisation cycles through fine-grained error analysis and metric-driven evaluation (F1, accuracy), resulting in measurable improvements in intent disambiguation and entity extraction performance in production systems.
  • Designed and deployed annotation quality assurance frameworks using regex and rule-based validation, reducing schema violations and label noise and strengthening training data reliability for large-scale NLP pipelines.
  • Conducted linguistically informed morphological analysis for Hindi, enabling the development of robust normalisation and subword tokenisation strategies aligned with transformer tokenisation schemes.
  • Engineered tokenisation and normalisation grammars to improve handling of noisy, code-mixed, and morphologically rich inputs, leading to enhanced NLU robustness and downstream interpretability.
  • Contributed to improving end-to-end speech-to-intent understanding by optimising text normalisation layers bridging the ASR output and downstream NLU components, resulting in better alignment between spoken input variability and structured intent parsing.
  • Collaborated across ASR, NLU, and product systems to optimise the speech-to-intent pipeline, ensuring alignment with real-world conversational distributions, long-tail queries, and edge-case behaviours.

Computational Linguist

TechMahindra BPS
09.2023 - 01.2025
  • Collaborated with the Linguistic Project Manager and NLP researchers on various projects, including:
      • Reviewing and annotating linguistic datasets to ensure data integrity for machine learning models.
      • Summarising outputs from Large Language Models (LLMs) for enhanced usability.
      • Providing linguistic analysis in the semantic, syntactic, and morphological domains.
      • Consulting on the development of linguistic databases to support advanced NLP applications.
  • Conducted research and analysis to address language-related challenges, improving AI system performance.
  • Developed and implemented linguistic models and algorithms to optimise language generation and enhance AI product quality.
  • Analysed AI system performance through rigorous testing, offering actionable insights for improvement.
  • Collaborated with data engineering teams to collect and annotate linguistic data, ensuring accuracy in training datasets.
  • Mentored junior linguists and data scientists, fostering skill development across projects.
  • Executed prompt annotation and evaluation for LLM projects using ServiceNow APIs.
  • Conducted sensitive data masking by recasting PII and using text normalisation and semantic transformation to train LLMs for automated dialogue generation, in line with Ethical AI and Responsible AI principles.

Linguist

Saarthi.AI
11.2022 - 09.2023
  • Worked on building a phonemizer, a tool that maps Odia phonemes to IPA.
  • Responsible for localisation and quality data production for NLU, TTS, and ASR for an AI voice bot and a WhatsApp chatbot in English, Hindi, and Odia in the BFSI domain.
  • Created mind maps to design the bot response flow for the voice bot and the WhatsApp chatbot.
  • Worked on conversation design aimed at the best customer experience.
  • Domain: BFSI (Debt Collection, Sales & Onboarding)
  • Leadership role: domain-specific POC for quality analysis of the data produced by team members, and team management.

Translator/Editor/Reviewer

Knowledgeworks Innovative Linguistic Solutions Pvt. Ltd
07.2021 - 10.2022
  • Translation, Editing, and Review (English-Oriya, Hindi-English)
  • QC and QA for Translation and Voice Over (English, Hindi, Oriya)
  • Domain: Academic, Medical & Health, Industry, App, News, etc.

Research Assistant

International Institute of Information Technology-Hyderabad
Hyderabad
07.2014 - 05.2022

Development of a Rule-based Multi-lingual Parser and Machine Translation System


  • Description: Designed a multi-lingual parser and MT system for translating Hindi into Oriya, English, Tamil, German, and Japanese.
  • Key Contributions: Built essential resources, including a Transfer Grammar module, Mapper, and a Multilingual Concept Dictionary for accurate MT generation. Led semantic analysis, annotation, and disambiguation tasks to develop a disambiguated Universal Semantic Representation (USR) for improved translation quality.

Language Specialist (Freelancer)

Innoactive Intelligence LLP
11.2017 - 12.2018

Synthesis of Data for Conversational AI

  • Developed and curated high-quality synthetic Hindi language datasets to enhance the performance of AI-driven chatbots in the BFSI sector, ensuring accurate and contextually relevant customer interactions.

Gold Data Creation for Question Data Disambiguation

  • Curated gold-standard data to refine the disambiguation process of user queries, significantly enhancing the chatbot’s ability to accurately interpret and respond to complex inquiries in Hindi.

Creation of Synthesized and Gold Question-Answer Data for Banking Chatbots

  • Designed and implemented a robust question-answer (QA) dataset, combining both synthesized and gold data specifically for banking chatbots. This approach facilitated seamless customer engagement and efficient resolution of banking-related queries in the BFSI domain.

Freelancer

Appen
05.2016 - 06.2017
  • Transcription, disambiguation, transliteration, and quality evaluation for Oriya, Hindi, and Sanskrit.
  • Indian English pronunciation rating.

Education

Research Assistant (Ph.D. coursework) - Computational Linguistics

International Institute of Information Technology
Hyderabad, Telangana
05-2022

Sastri (Eq. to B.A.) (Sanskrit)

Rashtriya Sanskrit Vidyapeeth
Tirupati, Andhra Pradesh
05-2010

Intermediate

Rashtriya Sanskrit Vidyapeeth
Tirupati, Andhra Pradesh
05-2007

M.A. (Applied Linguistics)

University of Hyderabad
Hyderabad, Telangana
04-2012

Matriculation

Mathasahi High School
Mathasahi, Puri
03-2003

Skills

  • NLP, Computational Linguistics, NLU (Intent Classification, NER)
  • Transformer-based NLP, RAG, Machine Translation
  • Conversational AI, ASR–NLU Integration, Speech-to-Intent Systems
  • Data-centric AI: annotation, synthetic data (CFG), data curation, and QA
  • Linguistic Analysis, Multilingual NLP (Hindi, English, Odia), Code-Mixed Processing, Localisation, and Translation
  • Tokenisation & Text Normalisation (ASR handling, subword techniques)
  • Model Evaluation and Error Analysis (F1, Accuracy)
  • End-to-end pipelines (ASR → NLU), long-tail handling
  • Research Methodology

Technical Skills

Python, pandas, Bash, Regex, Git, JIRA, JavaScript, SQL

ServiceNow API, Translation memory tools (TDS, MemoQ)

Unix, MS Office

Projects

Complex Predicate Analysis in Oriya

Description: Conducted in-depth research based on Beth Levin's verb classification and Talmy's Lexicalization Patterns. Explored the syntactic and semantic properties of Psych and Motion verbs in Oriya.

Enhancement of Anusaaraka Machine Translation System (English-Hindi)

Description: Focused on identifying templates and patterns in Hindi and English using text simplification algorithms and tokenization. Developed linguistic rules using Python, NLTK, and CLIPS to enhance translation accuracy.

Development of Multi-lingual Machine Translation (MT) Tools

Languages Covered: English, Hindi, Oriya, Punjabi, Sanskrit, Marathi, Japanese, German

Description: Developed MT tools to handle multiple languages, addressing idiomatic expressions and semantic structures.

Mapped semantic representations across languages using morpho-syntactic and grammar transformation rules.

Utilized Grammatical Framework (GF), Python, HTML, and CLIPS to create robust language processing solutions.

Authoring Tool Development for Hindi-English Translation

Description: Designed and developed a tool to enhance Hindi-English translation by enabling users to verify semantically disambiguated data through an interactive question-answer system, improving machine translation output.

Tools used: Python, HTML, CLIPS.

Field Linguistics Research

Description: Led field research focusing on the Kui language (spoken near Vishakhapatnam, Andhra Pradesh), conducting comprehensive linguistic analysis covering morpho-syntactic, semantic, and phonological aspects.

Languages

Odia: First Language
English: Proficient (C2)
Hindi: Proficient (C2)
Bengali: Upper Intermediate (B2)
Sanskrit: Proficient (C2)

Personal Information

  • DOB: 09/03/1988
  • Nationality: Indian
  • Marital status: Married

Hobbies and Interests

Research (Computational Linguistics & NLP), learning new languages, reading, and creative writing

Participations

  • Attended 'Applying Paninian Framework for semantic analysis of Indo-European Languages' organized by Anusaaraka Lab, LTRC, International Institute of Information Technology, Hyderabad, from 5th - 7th April, 2018
  • Attended 'Faculty of Language: Design & Interfaces' organized by the Department of Humanities and Social Sciences, Indian Institute of Technology, Delhi, from 11th - 12th February, 2013
  • Attended 'International Workshop on Syntax' organized by CALTS, School of Humanities - University of Hyderabad and CIIL- Mysore, from 26th February - 4th March, 2012
  • Attended 'Research Methodology Workshop in Linguistics and Translation Studies' organized by CALTS, School of Humanities, University of Hyderabad, from 10th - 11th February, 2012
  • Attended 5th Students' Conference of Linguistics in India organized by CALTS, School of Humanities, University of Hyderabad, from 21st - 23rd February, 2011

Disclaimer

I hereby declare that all the details given above are true to the best of my knowledge and belief. I also affirm that I am deeply committed to my work.
