

GenAI Platform Engineer with 6+ years of experience designing and operating enterprise-scale Retrieval-Augmented Generation (RAG) systems, distributed ingestion pipelines, and LLM orchestration frameworks. Strong expertise in Python backend development, extensible platform architectures, and multi-cloud LLM integrations supporting global enterprise clients.
RAG Platform & Core APIs
• Designed and built a production-grade Retrieval-Augmented Generation (RAG) platform powering Search, Chat Completion, and Search & Generate APIs for global enterprise clients.
• Implemented a hybrid semantic retrieval pipeline using Redis Vector Store and RediSearch (KNN similarity, keyword filtering, metadata constraints, and file-level scoring), improving search relevance by ~36% while achieving sub-100ms query latency at scale.
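A minimal sketch of the hybrid retrieval query pattern described above, using redis-py against RediSearch. The index name (docs_idx), field names (embedding, content, source), and score alias are illustrative assumptions, not the production schema.

```python
# Hybrid KNN + keyword/metadata query against RediSearch (sketch only).
import numpy as np
import redis
from redis.commands.search.query import Query

r = redis.Redis(host="localhost", port=6379)

def hybrid_search(query_vector: list[float], keyword: str, source: str, k: int = 10):
    # Pre-filter by TAG/TEXT fields, then let RediSearch rank the filtered
    # candidates by vector distance, exposed as "vector_score".
    q = (
        Query(f"(@source:{{{source}}} @content:{keyword})=>[KNN {k} @embedding $vec AS vector_score]")
        .sort_by("vector_score")
        .return_fields("doc_id", "content", "vector_score")
        .dialect(2)
    )
    params = {"vec": np.asarray(query_vector, dtype=np.float32).tobytes()}
    return r.ft("docs_idx").search(q, query_params=params)
```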
Chat Completion & LLM-Orchestrated Generation
• Engineered a multi-turn Chat Completion system with encrypted, session-aware conversation history, dynamic context injection, and automated summarization, reducing token usage by 40–70% and significantly lowering LLM costs without accuracy degradation.
• Designed a Search & Generate workflow combining retrieved context with LLM-driven prompt orchestration to generate structured outputs (SOPs, test cases, impact assessments), reducing manual client effort by up to ~65%.
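An illustrative sketch of the history-compaction and context-injection flow behind the two bullets above. The provider-agnostic llm_complete callable, the token budget, and the message shapes are hypothetical stand-ins, not the platform's actual interfaces.

```python
# History summarization + retrieved-context injection for multi-turn chat (sketch).
SUMMARY_TRIGGER_TOKENS = 3000   # assumed budget before older turns are summarized

def approx_tokens(messages):
    # Crude estimate: roughly 4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4

def compact_history(history, llm_complete):
    """Replace older turns with a single summary message once the budget is exceeded."""
    if approx_tokens(history) < SUMMARY_TRIGGER_TOKENS:
        return history
    older, recent = history[:-4], history[-4:]   # keep the last two exchanges verbatim
    summary = llm_complete(
        [{"role": "system", "content": "Summarize this conversation in under 150 words."},
         {"role": "user", "content": "\n".join(m["content"] for m in older)}]
    )
    return [{"role": "system", "content": f"Conversation summary: {summary}"}] + recent

def answer(question, history, retrieved_chunks, llm_complete):
    """Inject retrieved context plus compacted history into the final completion call."""
    context = "\n\n".join(retrieved_chunks)
    messages = compact_history(history, llm_complete) + [
        {"role": "system", "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": question},
    ]
    return llm_complete(messages)
```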
Multi-Source Enterprise Ingestion
• Built a scalable multi-source ingestion framework integrating ServiceNow, Confluence, and SharePoint, supporting delta-based synchronization, pagination, retries, and rate-limit handling for high-volume enterprise datasets.
• Designed deep ServiceNow ingestion pipelines extracting KB articles, attachments, work notes, and related metadata, enabling richer semantic search and cross-document reasoning.
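A hedged sketch of the delta-sync loop with pagination, bounded retry/backoff, and 429 handling, using the ServiceNow Table API as the example connector. The watermark column (sys_updated_on), page size, and retry policy are assumptions.

```python
# Delta-based sync against the ServiceNow Table API (sketch only).
import time
import requests

def fetch_updated_records(base_url, table, auth, last_sync_iso, page_size=200):
    offset, session = 0, requests.Session()
    while True:
        params = {
            "sysparm_query": f"sys_updated_on>{last_sync_iso}^ORDERBYsys_updated_on",
            "sysparm_limit": page_size,
            "sysparm_offset": offset,
        }
        for attempt in range(5):                       # bounded retries with backoff
            resp = session.get(f"{base_url}/api/now/table/{table}",
                               params=params, auth=auth, timeout=30)
            if resp.status_code == 429:                # rate limited: honor Retry-After if present
                time.sleep(int(resp.headers.get("Retry-After", 2 ** attempt)))
                continue
            if resp.status_code >= 500:                # transient server error: back off and retry
                time.sleep(2 ** attempt)
                continue
            resp.raise_for_status()
            break
        else:
            raise RuntimeError("Exhausted retries while calling ServiceNow")
        records = resp.json().get("result", [])
        if not records:                                # no more pages for this delta window
            return
        yield from records
        offset += page_size
```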
Upload, Preprocessing & Embedding Pipelines
• Developed a high-throughput Upload API supporting diverse enterprise document formats (HTML, JSON, TXT, PDF, DOCX, PPTX, CSV, XLSX, ZIP), with checksum-based deduplication, metadata enrichment, and asynchronous preprocessing via Celery for document cleanup, table extraction, OCR, and semantic chunking.
• Architected a decoupled embedding pipeline using message queues (Redis → RabbitMQ) and dedicated embedding services, enabling parallel processing and independent scaling of ingestion and vectorization workloads.
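A simplified sketch of the Celery-based preprocessing task and its handoff to a dedicated embedding queue, assuming a RabbitMQ broker. Task names, the queue name, and the naive chunking stand-in are illustrative, not the platform's actual implementation.

```python
# Async preprocessing -> embedding handoff via Celery task routing (sketch only).
import hashlib
from celery import Celery

app = Celery("ingestion", broker="amqp://guest:guest@localhost:5672//")
app.conf.task_routes = {"ingestion.embed_chunks": {"queue": "embeddings"}}

def checksum(content: bytes) -> str:
    # Content hash used at upload time to skip documents already ingested.
    return hashlib.sha256(content).hexdigest()

@app.task(name="ingestion.preprocess_document", bind=True, max_retries=3)
def preprocess_document(self, doc_id: str, raw_text: str, metadata: dict):
    try:
        chunks = [raw_text[i:i + 1000] for i in range(0, len(raw_text), 1000)]  # naive chunking stand-in
        embed_chunks.delay(doc_id, chunks, metadata)   # hand off to the embedding workers
    except Exception as exc:
        raise self.retry(exc=exc, countdown=30)

@app.task(name="ingestion.embed_chunks")
def embed_chunks(doc_id: str, chunks: list, metadata: dict):
    # Runs on separate workers bound to the "embeddings" queue, so vectorization
    # scales independently of upload/preprocessing throughput.
    ...
```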
Platform Extensibility & Plugin Architecture
• Designed a modular, plugin-based architecture using abstract base contracts and runtime module loading, enabling extensible authentication, LLM integration, credential management, and upload customization without core code changes.
• Abstracted LLM providers behind a unified interface supporting Azure OpenAI, Vertex AI, AWS Bedrock, and client-specific models, with configuration-driven authentication strategies (OAuth2, token-based, basic) for enterprise integrations.
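A minimal sketch of the plugin pattern behind both bullets: an abstract provider contract plus configuration-driven runtime loading via importlib. Class, module, and config key names are assumptions.

```python
# Abstract contract + runtime module loading for pluggable LLM providers (sketch).
import importlib
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """Contract every provider plugin (Azure OpenAI, Vertex AI, Bedrock, ...) must satisfy."""

    @abstractmethod
    def complete(self, prompt: str, **params) -> str: ...

    @abstractmethod
    def embed(self, texts: list[str]) -> list[list[float]]: ...

def load_provider(config: dict) -> LLMProvider:
    # config example: {"module": "plugins.azure_openai", "class": "AzureOpenAIProvider", "options": {...}}
    module = importlib.import_module(config["module"])
    cls = getattr(module, config["class"])
    if not issubclass(cls, LLMProvider):
        raise TypeError(f"{cls.__name__} does not implement the LLMProvider contract")
    return cls(**config.get("options", {}))
```

With this shape, new providers or client-specific models ship as standalone modules and are activated purely through configuration, without touching core code.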
Security, Governance & Access Control
• Integrated centralized IAM-based authentication issuing JWT access and refresh tokens, implemented fine-grained RBAC across APIs, features, and UI layers, and securely managed secrets via vault integration for LLM providers and external connectors.
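A hedged sketch of route-level RBAC enforcement on top of JWT claims in Flask, using PyJWT. The roles claim, header scheme, and inline secret are illustrative; in the platform, tokens are issued by the central IAM service and keys are vault-managed.

```python
# JWT validation + role-based route protection in Flask (sketch only).
from functools import wraps
import jwt
from flask import Flask, jsonify, request

app = Flask(__name__)
JWT_SECRET = "replace-with-vault-managed-secret"   # illustrative only

def require_roles(*allowed_roles):
    def decorator(view):
        @wraps(view)
        def wrapper(*args, **kwargs):
            auth = request.headers.get("Authorization", "")
            if not auth.startswith("Bearer "):
                return jsonify(error="missing token"), 401
            try:
                claims = jwt.decode(auth.removeprefix("Bearer "), JWT_SECRET, algorithms=["HS256"])
            except jwt.InvalidTokenError:
                return jsonify(error="invalid or expired token"), 401
            if not set(claims.get("roles", [])) & set(allowed_roles):
                return jsonify(error="forbidden"), 403
            return view(*args, **kwargs)
        return wrapper
    return decorator

@app.route("/api/search")
@require_roles("platform_admin", "search_user")
def search():
    return jsonify(results=[])
```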
Client Delivery & Enterprise Impact
• Partnered directly with global enterprise clients—including Vodafone, Novartis, ADNOC, ADECCO, Kemper, GSK plc, Moderna, Telia, Bath & Body Works (BBW), and TRP—to deliver client-specific customizations using plug-and-play plugins, accelerating onboarding while maintaining core platform stability.
Observability & Production Support
• Implemented comprehensive observability across ingestion, retrieval, and LLM orchestration layers using structured logging and request-level tracing, enabling efficient debugging of asynchronous pipelines and reducing mean time to resolution (MTTR) during production issue triage.
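A small sketch of request-scoped structured logging: a correlation ID travels with every log record so asynchronous pipeline stages can be traced end to end. Field names and the header carrying the ID are assumptions; only the standard library is used.

```python
# JSON-structured logging with a per-request correlation ID (sketch only).
import json
import logging
import uuid
from contextvars import ContextVar

request_id: ContextVar[str] = ContextVar("request_id", default="-")

class JsonFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "request_id": request_id.get(),
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])
log = logging.getLogger("rag.retrieval")

def handle_request(incoming_headers: dict):
    # Reuse the caller's correlation ID when present; otherwise mint one.
    request_id.set(incoming_headers.get("X-Request-ID", str(uuid.uuid4())))
    log.info("retrieval started")
```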
• Built supervised machine learning models to predict payment delays and revenue leakage using historical transaction and customer payment data.
• Performed data preprocessing, feature engineering, and exploratory analysis on large-scale financial datasets from Warner Bros.
• Trained and evaluated models such as Logistic Regression, Random Forest, and Gradient Boosting to classify high-risk delayed payments.
• Improved prediction accuracy by identifying behavioral patterns (late payment frequency, amount variance, seasonality).
• Collaborated with finance stakeholders to translate model outputs into actionable risk flags and dashboards.
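An illustrative sketch of the payment-delay classifier using scikit-learn. The engineered feature columns and the binary delayed label are hypothetical names for the kinds of signals listed above.

```python
# Train/evaluate a gradient-boosting classifier for delayed-payment risk (sketch only).
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

def train_delay_model(df: pd.DataFrame):
    features = ["late_payment_frequency", "amount_variance", "seasonality_index", "invoice_amount"]
    X, y = df[features], df["delayed"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )
    model = GradientBoostingClassifier()   # also compared against LogisticRegression / RandomForest
    model.fit(X_train, y_train)
    print(classification_report(y_test, model.predict(X_test)))   # per-class precision/recall for risk flags
    return model
```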
Python (Backend Development with Flask APIs)
Retrieval-Augmented Generation (RAG)
Embeddings & Vectorization
Chunking & Retrieval Pipelines
LLM Integration (OpenAI, AWS Bedrock, Google Vertex AI)
Redis Vector Store (Embeddings & Metadata)
MongoDB
Celery (Asynchronous Task Processing)
RabbitMQ (Message Queues)
JWT Authentication
Role-Based Access Control (RBAC)