Machine Translation Evaluation for Low-Resource Indian Languages (Jul 2023 – May 2025)
Analyzed linguistic diversity and data gaps for Assamese, Kannada, Maithili, and Punjabi, and developed a benchmark dataset with 1,000 MQM annotations for zero-shot MT evaluation.
Enhanced COMET models by fine-tuning on Indic data and replacing the XLM-RoBERTa encoder with IndicBERTv2, improving correlation with human judgments.
Generated 100k+ synthetic sentences to support multi-stage training and improve model performance on low-resource languages.
Adapted GEMBA-MQM (GPT-4) and used LLMs such as Gemma and LLaMA for reference-free, zero-shot evaluation, creating diverse synthetic data to improve error detection and correction in MT outputs.
Positions of Responsibility
PG Placement Coordinator, Training and Placement Cell, IIT Madras (2022–25)
Sports Secretary, Tunga Hostel, IIT Madras (2023–24)
Publications
How Good is Zero-Shot MT Evaluation for Low-Resource Indian Languages? (ACL 2024). Anushka Singh, Ananya B. Sai, Raj Dabre, Ratish Puduppully, Anoop K., Mitesh Khapra
Quality Estimation and Post-Editing Using LLMs For Indic Languages: How Good Is It? (MT Summit 2025). Anushka Singh, Aarya Pakhale, Mitesh Khapra, Raj Dabre
Timeline
Data Scientist-I
SIXT Research and Development
06.2025 - Current
Data Science Intern
SIXT Research and Development
03.2025 - 05.2025
Bachelor of Technology - Computer Science and Engineering
Madhav Institute of Technology and Science, Gwalior
Master of Science by Research - Computer Science and Engineering