Anushka Saha

Tech Mahindra

12.2023 - Current

AI Agent & GenAI Development

Architected and deployed production-ready AI agents using ServiceNow AI Agent Studio, designing multi-platform prompt engineering solutions across NowLLM, Azure OpenAI, Google Gemini, and Anthropic Claude (via AWS) for enterprise compliance automation
Led end-to-end development of GenAI skills including AI Control Objective Updater and Citation Impact Analyzer, implementing advanced prompt techniques (role priming, chain-of-thought reasoning, JSON schema enforcement) with cross-platform compatibility
Designed agentic workflows and conversational process flows, defining agent behaviors, tool orchestration, and context management for autonomous compliance document processing
Integrated and configured GenAI tools with multi-step reasoning capabilities, enabling chained operations within AI agents
Managed full development lifecycle of AI-powered products from conceptualization to production readiness, serving as primary liaison among the prompt engineering, AI agent configuration, and domain expert teams

Large Language Model Development & Optimization

Engineered and refined LLMs through fine-tuning and prompt engineering for text summarization, knowledge base generation, and conversational AI applications
Developed comprehensive evaluation frameworks implementing 5-point scale metrics for faithfulness, completeness, and hallucination detection across multiple summarization tasks
Created golden benchmark datasets for training NowLLM's summarization capabilities, covering KB generation, chat summarization, case resolution notes, and QnA systems
Conducted systematic hallucination analysis, categorizing errors into extrinsic, intrinsic, and misattribution types to improve model factuality and reduce confabulations
Designed and executed iterative prompt engineering strategies to optimize performance for regulatory alert summarization and sensitive data recasting

Document Intelligence & NLP Engineering

Spearheaded document intelligence initiatives as Lead Linguist, creating annotation schemas for domain-specific entity extraction (billing periods, consumption units, payer/payee identification) from invoices and utility bills
Trained and optimized OCR models through gold-standard corpora development, implementing token boundary detection, POS tagging, and morphological feature annotation for improved document understanding
Applied semantic segmentation techniques to classify document structures (headers, paragraphs, footnotes) in scanned PDFs, legal documents, and historical manuscripts using NLP-based inference methods
Implemented advanced NLP techniques for text classification, named entity recognition (NER), and document structure inference to enhance document intelligence platform performance

AI Model Evaluation & Quality Assurance

Led evaluation of LLM-generated content against key metrics including faithfulness, completeness, and correctness using both automated and manual analysis to identify and mitigate model hallucinations
Designed structured evaluation rubrics for assessing AI-generated summaries and conversational responses, providing detailed linguistic rationales for improvements
Identified and rectified bugs within automated evaluation metrics through fine-grained error analysis, significantly enhancing model self-assessment accuracy and reliability
Verified AI model performance by comparing outputs with human-labeled gold-standard benchmarks, ensuring models met performance and safety standards before deployment
Implemented feedback-driven continuous learning pipelines, improving predictive accuracy by 35%

Data Annotation & Quality Management

Led cross-functional annotation teams of 10+ linguists, maintaining inter-annotator agreement above 85% through standardized guidelines and continuous training programs
Annotated 10,000+ documents for entity extraction, text classification, and semantic role labeling, creating high-quality training data for DocIntel and IRM products
Implemented rigorous PII redaction protocols using centralized dictionary systems, paraphrasing sensitive data while preserving semantic integrity for GDPR-compliant model training
Authored comprehensive annotation schemas and guidelines for diverse datasets including OCR text corpora, domain-specific entities, and multi-turn dialogues

Prompt Engineering & Optimization

Engineered domain-specific prompts for regulatory compliance automation, processing 200+ regulatory alerts with optimized faithfulness and completeness scores
Developed and refined master prompts for auto-recast models, achieving parity with manual recasting through iterative testing and linguistic guardrail implementation
Created prompt schemas incorporating user personas, output format requirements, and regulatory context for enhanced summarization accuracy
Spearheaded development of a client-facing auto-recasting model that applies NLU and NLG methods to mask PII while preserving semantic integrity

Conversational AI & Dialog Systems

Evaluated multi-turn conversation quality across four key metrics (groundedness, helpfulness, fluency, forward movement) for enterprise chat agents in HR, IT helpdesk, and healthcare domains
Assessed customer satisfaction (CSAT) for bot-human interactions on a 5-point scale, identifying optimization opportunities in issue resolution efficiency and agent empathy
Designed discourse coherence frameworks for maintaining context across dialog turns, improving semantic entailment and pragmatic intent understanding

Knowledge Management Systems

Built similarity-based KB recommendation systems and compared AI Search against predictive models to improve ITSM first-call resolution
Identified and labeled KB-able cases from resolved incidents, creating golden knowledge articles for automated KB generation
Validated KB article quality through hallucination and completeness metrics, reducing incident investigation time by 40%

Cross-functional Collaboration & Innovation

Collaborated with engineering and product teams to refine ServiceNow's auto-evaluation models, identifying and resolving false-negative bugs in judge prompts
Served as linguistic consultant for NLU/NLG method implementation, validating model performance through systematic data analysis
Coordinated with compliance teams to ensure AI outputs met regulatory standards and audit requirements
Pioneered automated evaluation metrics using faithfulness, correctness, and linguistic precision scoring for scalable quality assessment
Created technical documentation bridging implementation with business requirements, and authored client-facing guidelines for auto-recast models and AI evaluation tools

Summary