Shastri Vaishampayan Mohapatra

Summary

A Senior Research Scientist specializing in Natural Language Processing (NLP) and Computational Linguistics, with experience in multilingual NLU and conversational AI systems. Focused on data-centric model optimization, including synthetic data generation and annotation quality frameworks to improve intent classification and entity extraction. Strong foundation in linguistic analysis and language modeling, with experience collaborating across ASR–NLU pipelines to enhance end-to-end speech understanding.

Overview

12 years of professional experience

Work History

Senior Research Scientist

Cerence AI
Pune, India
02.2025 - Current
  • Architected CFG-driven synthetic data generation pipelines to scale training corpora for transformer-based NLU models, improving intent classification and NER generalisation across multilingual (Hindi, English) automotive voice assistant interactions.
  • Led data-centric model optimisation cycles through fine-grained error analysis and metric-driven evaluation (F1, accuracy), resulting in measurable improvements in intent disambiguation and entity extraction performance in production systems.
  • Designed and deployed annotation quality assurance frameworks using regex and rule-based validation, reducing schema violations and label noise, and strengthening training data reliability for large-scale NLP pipelines.
  • Conducted linguistically informed morphological analysis for Hindi, enabling the development of robust normalisation and subword tokenisation strategies aligned with transformer tokenisation schemes.
  • Engineered tokenisation and normalisation grammars to improve handling of noisy, code-mixed, and morphologically rich inputs, leading to enhanced NLU robustness and downstream interpretability.
  • Contributed to improving end-to-end speech-to-intent understanding by optimising text normalisation layers bridging the ASR output and downstream NLU components, resulting in better alignment between spoken input variability and structured intent parsing.
  • Collaborated across ASR, NLU, and product systems to optimise the speech-to-intent pipeline, ensuring alignment with real-world conversational distributions, long-tail queries, and edge-case behaviours.

Computational Linguist

TechMahindra BPS
09.2023 - 01.2025
  • Collaborated on LLM and Generative AI projects, focusing on dataset curation, annotation, and linguistic analysis for conversational systems.
  • Performed LLM evaluation, prompt annotation, and response summarisation to improve generation quality and conversational coherence.
  • Contributed to linguistic resources and data pipelines supporting conversational AI and language generation models.
  • Conducted error analysis and system evaluation, driving improvements in model performance and reliability.
  • Partnered on high-quality dataset creation and QA for GenAI systems, ensuring robust training data.
  • Mentored junior linguists on annotation standards and LLM evaluation practices.
  • Applied PII masking and text normalisation aligned with Responsible and Ethical AI principles.

Linguist

Saarthi.AI
Bengaluru, India
11.2022 - 09.2023
  • Developed a phonemiser for Odia, mapping native phonemes to IPA to support ASR and TTS accuracy across speech processing pipelines.
  • Contributed to conversational AI systems (voice bots and WhatsApp chatbots) by delivering high-quality multilingual data (English, Hindi, Odia) for NLU, ASR, and TTS in the BFSI domain.
  • Designed conversation flows and mind maps, optimising dialogue structure and user experience across voice and chat interfaces.
  • Applied conversation design principles to enhance customer experience in use cases such as debt collection, sales, and onboarding.
  • Served as domain-specific POC for data quality, overseeing quality analysis, team coordination, and output consistency across annotation workflows.

Translator/Editor/Reviewer

Knowledgeworks Innovative Linguistic Solutions Pvt. Ltd
07.2021 - 10.2022
  • Performed translation, editing, and review (English–Oriya, Hindi–English), delivering client-facing, production-quality content across academic, healthcare, industry, application, and news domains.
  • Conducted QC/QA and evaluation of translation and voice-over outputs, including MT/NMT systems, ensuring linguistic accuracy, domain adaptation, and consistency.
  • Guided reviewers and enforced quality standards and best practices, ensuring high-quality, domain-aligned deliverables across diverse content types.

Research Assistant

International Institute of Information Technology
Hyderabad
07.2014 - 05.2022

Development of a rule-based multi-lingual parser and machine translation system.

  • Designed a rule-based multilingual parsing and machine translation system for translating Hindi into Oriya, English, Tamil, German, and Japanese.
  • Developed core linguistic resources, including a transfer grammar module, semantic mapper, and multilingual concept lexicon, enabling structured cross-lingual generation.
  • Conducted semantic analysis, annotation, and disambiguation to build a Universal Semantic Representation (USR), improving translation consistency and accuracy.

Language Specialist (Freelancer)

Innoactive Intelligence LLP
11.2017 - 12.2018
  • Developed and curated high-quality synthetic Hindi datasets for Conversational AI (chatbots) in the BFSI domain, improving intent understanding and response relevance.
  • Created gold-standard datasets for query disambiguation, enhancing the chatbot’s ability to accurately interpret complex, ambiguous user inputs.
  • Designed hybrid (synthetic + gold) QA datasets for banking chatbots, enabling robust conversational performance and efficient query resolution.

Freelancer

Appen
05.2016 - 06.2017
  • Worked on transcription and transliteration for Oriya, Hindi, and Sanskrit, supporting accurate text representation and basic disambiguation tasks.
  • Performed quality evaluation of Indian English pronunciation, including phonetic analysis for consistency and correctness.

Education

Research Assistant (Ph.D. coursework) - Computational Linguistics

International Institute of Information Technology
05-2022

Sastri (Eq. to B.A.) (Sanskrit)

Rashtriya Sanskrit Vidyapeeth
05-2010

M.A. (Applied Linguistics)

University of Hyderabad
04-2012

Skills

  • NLP, Computational Linguistics, NLU (Intent Classification, NER)
  • Transformer-based NLP, RAG, Machine Translation
  • Conversational AI, ASR–NLU Integration, Speech-to-Intent Systems
  • Data-centric AI: annotation, synthetic data (CFG), data curation, and QA
  • Linguistic Analysis, Multilingual NLP (Hindi, English, Odia), Code-Mixed Processing, Localisation, and Translation
  • Tokenisation & Text Normalisation (ASR handling, subword techniques)
  • Model Evaluation and Error Analysis (F1, Accuracy)
  • End-to-end pipelines (ASR → NLU), long-tail handling
  • Research Methodology

Technical Skills

Python, Pandas, Bash, Regex, Git, JIRA, JavaScript, SQL

ServiceNow API, Translation memory tools (TDS, MemoQ)

Unix, MS Office

Projects

Complex Predicate Analysis in Oriya

  • Conducted linguistic analysis of complex predicates using verb classification and lexicalization frameworks, focusing on syntactic and semantic behavior of psych and motion verbs.

Enhancement of Anusaaraka Machine Translation System (English-Hindi)

  • Improved rule-based MT accuracy through cross-lingual pattern identification; developed text simplification, tokenization pipelines, and linguistic rules using Python, NLTK, and CLIPS.

Development of Multi-lingual Machine Translation (MT) Tools

  • Built multilingual MT solutions across 8+ languages, designing morpho-syntactic transformation rules and leveraging GF, Python, and CLIPS for semantic mapping and handling structural variations.

Authoring Tool Development for Hindi-English Translation

  • Developed an interactive tool for semantic disambiguation with user-driven validation workflows, improving translation reliability (Python, HTML, CLIPS).

Field Linguistics Research

  • Contributed to field research on the Kui language, analyzing morpho-syntactic, semantic, and phonological structures to support low-resource language documentation.

Personal Information

Languages

Odia: Native language
English: Bilingual or Proficient (C2)
Hindi: Bilingual or Proficient (C2)
Bengali: Upper intermediate (B2)
Sanskrit: Bilingual or Proficient (C2)

Hobbies and Interests

Research (Computational Linguistics & NLP), learning new languages, reading, and creative writing

Participations

  • Workshop on Applying Paninian Framework for Semantic Analysis of Indo-European Languages, Anusaaraka Lab, LTRC, IIIT Hyderabad (Apr 2018)
  • Workshop on Faculty of Language: Design & Interfaces, IIT Delhi (Feb 2013)
  • International Workshop on Syntax, CALTS – University of Hyderabad & CIIL Mysore (Feb–Mar 2012)
  • Workshop on Research Methodology in Linguistics & Translation Studies, CALTS – University of Hyderabad (Feb 2012)
  • 5th Students’ Conference of Linguistics in India, CALTS – University of Hyderabad (Feb 2011)

Disclaimer

I hereby declare that all the details given above are true to the best of my knowledge and belief. I also affirm that I am deeply committed to my work.
