Shastri Vaishampayan Mohapatra

Pune

Summary

Senior research scientist with expertise in natural language processing and computational linguistics, focusing on multilingual NLP model optimization and data-centric methodologies. Successfully architected synthetic data generation pipelines that enhance intent classification and improve entity extraction across multiple languages. Skilled in designing annotation quality assurance frameworks and conducting linguistic analysis to support robust AI systems. Proven leadership in mentoring teams and collaborating across ASR, NLU, and product systems to optimize speech-to-intent pipelines.

Work History

Senior Research Scientist

Cerence AI

Pune, India

02.2025 - Current

Architected CFG-driven synthetic data generation pipelines to scale training corpora for transformer-based NLU models, improving intent classification and NER generalisation across multilingual (Hindi, English) automotive voice assistant interactions.
Led data-centric model optimisation cycles through fine-grained error analysis and metric-driven evaluation (F1, accuracy), resulting in measurable improvements in intent disambiguation and entity extraction performance in production systems.
Designed and deployed annotation quality assurance frameworks using regex and rule-based validation, reducing schema violations and label noise and strengthening training data reliability for large-scale NLP pipelines.
Conducted linguistically informed morphological analysis for Hindi, enabling the development of robust normalisation and subword tokenisation strategies aligned with transformer tokenisation schemes.
Engineered tokenisation and normalisation grammars to improve handling of noisy, code-mixed, and morphologically rich inputs, leading to enhanced NLU robustness and downstream interpretability.
Contributed to improving end-to-end speech-to-intent understanding by optimising text normalisation layers bridging the ASR output and downstream NLU components, resulting in better alignment between spoken input variability and structured intent parsing.
Collaborated across ASR, NLU, and product systems to optimise the speech-to-intent pipeline, ensuring alignment with real-world conversational distributions, long-tail queries, and edge-case behaviours.

Computational Linguist

TechMahindra BPS

09.2023 - 01.2025

Collaborated with the Linguistic Project Manager and NLP researchers on various projects, including:
Reviewing and annotating linguistic datasets to ensure data integrity for machine learning models.
Summarising outputs from Large Language Models (LLMs) for enhanced usability.
Providing linguistic analysis in the semantic, syntactic, and morphological domains.
Consulting on the development of linguistic databases to support advanced NLP applications.
Conducted research and analysis to address language-related challenges, improving AI system performance.
Developed and implemented linguistic models and algorithms to optimise language generation and enhance AI product quality.
Analysed AI system performance through rigorous testing, offering actionable insights for improvement.
Collaborated with data engineering teams to collect and annotate linguistic data, ensuring accuracy in training datasets.
Mentored junior linguists and data scientists, fostering skill development across projects.
Executed prompt annotation and evaluation for LLM projects using ServiceNow APIs.
Conducted sensitive data masking by recasting PII and using text normalisation and semantic transformation to train LLMs for automated dialogue generation, in line with Ethical AI and Responsible AI principles.

Linguist

Saarthi.AI

11.2022 - 09.2023

Built a phonemizer (a tool that maps Odia phonemes to IPA) for the Odia language.
Responsible for localization and quality data production for NLU, TTS, and ASR for an AI voice bot and a WhatsApp chatbot in English, Hindi, and Odia in the BFSI domain.
Created mind maps to design the bot response flow for the voice bot and the WhatsApp chatbot.
Worked on conversation design aimed at the best customer experience.
Domain: BFSI (Debt Collection, Sales & Onboarding)
Leadership role: domain-specific point of contact (POC) for quality analysis of the data produced by team members, and team management.

Translator/Editor/Reviewer

Knowledgeworks Innovative Linguistic Solutions Pvt. Ltd

07.2021 - 10.2022

Translation, Editing, and Review (English-Oriya, Hindi-English)
QC and QA for Translation and Voice Over (English, Hindi, Oriya)
Domain: Academic, Medical & Health, Industry, App, News, etc.

Research Assistant

International Institute of Information Technology-Hyderabad

Hyderabad

07.2014 - 05.2022

Development of a Rule-based Multi-lingual Parser and Machine Translation System

Description: Designed a multi-lingual parser and MT system for translating Hindi into Oriya, English, Tamil, German, and Japanese.
Key Contributions: Built essential resources, including a Transfer Grammar module, Mapper, and a Multilingual Concept Dictionary for accurate MT generation. Led semantic analysis, annotation, and disambiguation tasks to develop a disambiguated Universal Semantic Representation (USR) for improved translation quality.

Language Specialist (Freelancer)

Innoactive Intelligence LLP

11.2017 - 12.2018

Synthesis of Data for Conversational AI

Developed and curated high-quality synthetic Hindi language datasets to enhance the performance of AI-driven chatbots in the BFSI sector, ensuring accurate and contextually relevant customer interactions.

Gold Data Creation for Question Data Disambiguation

Curated gold-standard data to refine the disambiguation process of user queries, significantly enhancing the chatbot’s ability to accurately interpret and respond to complex inquiries in Hindi.

Creation of Synthesized and Gold Question-Answer Data for Banking Chatbots

Designed and implemented a robust question-answer (QA) dataset, combining both synthesized and gold data specifically for banking chatbots. This approach facilitated seamless customer engagement and efficient resolution of banking-related queries in the BFSI domain.

Freelancer

Appen

05.2016 - 06.2017

Transcription, disambiguation, transliteration, and quality evaluation for Oriya, Hindi, and Sanskrit.
Indian English pronunciation rating.

Education

Research Assistant (Ph.D. coursework) - Computational Linguistics

International Institute of Information Technology

Hyderabad, Telangana

05-2022

Sastri (Eq. to B.A.) (Sanskrit)

Rashtriya Sanskrit Vidyapeeth

Tirupati, Andhra Pradesh

05-2010

Intermediate

Rashtriya Sanskrit Vidyapeeth

Tirupati, Andhra Pradesh

05-2007

M.A. (Applied Linguistics)

University of Hyderabad

Hyderabad, Telangana

04-2012

Matriculation

Mathasahi High School

Mathasahi, Puri

03-2003

Skills

NLP, Computational Linguistics, NLU (Intent Classification, NER)
Transformer-based NLP, RAG, Machine Translation
Conversational AI, ASR–NLU Integration, Speech-to-Intent Systems
Data-centric AI: annotation, synthetic data (CFG), data curation, and QA
Linguistic Analysis, Multilingual NLP (Hindi, English, Odia), Code-Mixed Processing, Localisation, and Translation

Tokenisation & Text Normalisation (ASR handling, subword techniques)
Model Evaluation and Error Analysis (F1, Accuracy)
End-to-end pipelines (ASR → NLU), long-tail handling
Research Methodology

Custom Section

Technical Skills

Python, pandas, Bash, Regex, Git, JIRA, JavaScript, SQL

ServiceNow API, Translation memory tools (TDS, MemoQ)

Unix, MS Office

Projects

Complex Predicate Analysis in Oriya

Description: Conducted in-depth research based on Beth Levin's verb classification and Talmy's Lexicalization Patterns. Explored the syntactic and semantic properties of Psych and Motion verbs in Oriya.

Enhancement of Anusaaraka Machine Translation System (English-Hindi)

Description: Focused on identifying templates and patterns in Hindi and English using text simplification algorithms and tokenization. Developed linguistic rules using Python, NLTK, and CLIPS to enhance translation accuracy.

Development of Multi-lingual Machine Translation (MT) Tools

Languages Covered: English, Hindi, Oriya, Punjabi, Sanskrit, Marathi, Japanese, German

Description: Developed MT tools to handle multiple languages, addressing idiomatic expressions and semantic structures.

Mapped semantic representations across languages using morpho-syntactic and grammar transformation rules.

Utilized Grammatical Framework (GF), Python, HTML, and CLIPS to create robust language processing solutions.

Authoring Tool Development for Hindi-English Translation

Description: Designed and developed a tool to enhance Hindi-English translation by enabling users to verify semantically disambiguated data through an interactive question-answer system, improving machine translation output.

Tools used: Python, HTML, CLIPS.

Field Linguistics Research

Description: Led field research focusing on the Kui language (spoken near Vishakhapatnam, Andhra Pradesh), conducting comprehensive linguistic analysis covering morpho-syntactic, semantic, and phonological aspects.

Languages

Odia

First Language

English

Proficient

Hindi

Proficient

Bengali

Upper Intermediate

Sanskrit

Proficient

Personal Information

DOB: 09/03/1988
Nationality: Indian
Marital status: Married

Hobbies and Interests

Research (Computational Linguistics & NLP), learning new languages, reading, and creative writing

Participations

Attended 'Applying Paninian Framework for semantic analysis of Indo-European Languages' organized by Anusaaraka Lab, LTRC, International Institute of Information Technology, Hyderabad, from 5th - 7th April, 2018
Attended 'Faculty of Language: Design & Interfaces' organized by the Department of Humanities and Social Sciences, Indian Institute of Technology, Delhi, from 11th - 12th February, 2013
Attended 'International Workshop on Syntax' organized by CALTS, School of Humanities - University of Hyderabad and CIIL- Mysore, from 26th February - 4th March, 2012
Attended 'Research Methodology Workshop in Linguistics and Translation Studies' organized by CALTS, School of Humanities, University of Hyderabad, from 10th - 11th February, 2012
Attended 5th Students' Conference of Linguistics in India organized by CALTS, School of Humanities, University of Hyderabad, from 21st - 23rd February, 2011

Disclaimer

I hereby declare that all the details given above are true to the best of my knowledge and belief. I also affirm that I am very keen on my work.
