AI Agent & GenAI Development
- Architected and deployed production-ready AI agents using ServiceNow AI Agent Studio, designing multi-platform prompt engineering solutions across NowLLM, Azure OpenAI, Google Gemini, and Anthropic Claude on Amazon Bedrock for enterprise compliance automation
- Led end-to-end development of GenAI skills, including the AI Control Objective Updater and the Citation Impact Analyzer, applying advanced prompting techniques (role priming, chain-of-thought reasoning, JSON schema enforcement) while ensuring cross-platform compatibility
- Designed agentic workflows and conversational process flows, defining agent behaviors, tool orchestration, and context management for autonomous compliance document processing
- Integrated and configured GenAI tools supporting multi-step reasoning, enabling AI agents to chain operations within a single workflow
- Managed the full development lifecycle of AI-powered products from concept to production readiness, serving as primary liaison among prompt engineering, AI agent configuration, and domain experts
Large Language Model Development & Optimization
- Adapted and refined LLMs through fine-tuning and prompt engineering for text summarization, knowledge base generation, and conversational AI applications
- Developed evaluation frameworks using 5-point scale metrics for faithfulness, completeness, and hallucination detection across multiple summarization tasks
- Created golden benchmark datasets for training and evaluating NowLLM's summarization capabilities, covering KB generation, chat summarization, case resolution notes, and Q&A systems
- Conducted systematic hallucination analysis, categorizing errors as extrinsic, intrinsic, or misattribution to improve model factuality
- Designed and executed iterative prompt engineering strategies to optimize performance for regulatory alert summarization and sensitive data recasting
Document Intelligence & NLP Engineering
- Spearheaded document intelligence initiatives as Lead Linguist, creating annotation schemas for domain-specific entity extraction (billing periods, consumption units, payer/payee identification) from invoices and utility bills
- Trained and optimized OCR models by developing gold-standard corpora annotated for token boundaries, POS tags, and morphological features, improving document understanding
- Applied semantic segmentation techniques to classify document structures (headers, paragraphs, footnotes) in scanned PDFs, legal documents, and historical manuscripts using NLP-based inference methods
- Implemented advanced NLP techniques for text classification, named entity recognition (NER), and document structure inference to enhance document intelligence platform performance
AI Model Evaluation & Quality Assurance
- Led evaluation of LLM-generated content against key metrics including faithfulness, completeness, and correctness using both automated and manual analysis to identify and mitigate model hallucinations
- Designed structured evaluation rubrics for assessing AI-generated summaries and conversational responses, providing detailed linguistic rationales for improvements
- Identified and fixed bugs in automated evaluation metrics through fine-grained error analysis, improving the accuracy and reliability of model self-assessment
- Verified AI model performance by comparing outputs against human-labeled gold-standard benchmarks, ensuring models met quality and safety standards before deployment
- Implemented feedback-driven continuous learning pipelines, improving predictive accuracy by 35%
Data Annotation & Quality Management
- Led cross-functional annotation teams of 10+ linguists, maintaining inter-annotator agreement above 85% through standardized guidelines and continuous training programs
- Annotated 10,000+ documents for entity extraction, text classification, and semantic role labeling, creating high-quality training data for DocIntel and IRM products
- Implemented rigorous PII redaction protocols using centralized dictionary systems, paraphrasing sensitive data while preserving semantic integrity for GDPR-compliant model training
- Authored comprehensive annotation schemas and guidelines for diverse datasets including OCR text corpora, domain-specific entities, and multi-turn dialogues
Prompt Engineering & Optimization
- Engineered domain-specific prompts for regulatory compliance automation, processing 200+ regulatory alerts and improving faithfulness and completeness scores
- Developed and refined master prompts for auto-recast models, achieving parity with manual recasting through iterative testing and linguistic guardrails
- Created prompt schemas incorporating user personas, output format requirements, and regulatory context to improve summarization accuracy
- Spearheaded development of a client-facing auto-recasting model that applies NLU and NLG methods to mask PII while preserving semantic integrity
Conversational AI & Dialog Systems
- Evaluated multi-turn conversation quality across four key metrics (groundedness, helpfulness, fluency, forward movement) for enterprise chat agents in HR, IT helpdesk, and healthcare domains
- Assessed customer satisfaction (CSAT) for bot-human interactions on a 5-point scale, identifying optimization opportunities in issue resolution efficiency and agent empathy
- Designed discourse coherence frameworks for maintaining context across dialog turns, improving semantic entailment and pragmatic intent understanding
Knowledge Management Systems
- Built similarity-based KB recommendation systems, comparing AI Search against predictive models, to improve ITSM first-call resolution
- Identified and labeled KB-able cases from resolved incidents, creating golden knowledge articles for automated KB generation
- Validated KB article quality through hallucination and completeness metrics, reducing incident investigation time by 40%
Cross-functional Collaboration & Innovation
- Collaborated with engineering and product teams to refine ServiceNow's auto-evaluation models, identifying and resolving false-negative bugs in judge prompts
- Served as linguistic consultant for NLU/NLG method implementation, validating model performance through systematic data analysis
- Coordinated with compliance teams to ensure AI outputs met regulatory standards and audit requirements
- Pioneered automated evaluation metrics using faithfulness, correctness, and linguistic precision scoring for scalable quality assessment
- Created technical documentation bridging implementation and business requirements, and authored client-facing guidelines for auto-recast models and AI evaluation tools