End-to-End NL to SQL and Deep Analysis Pipeline for Business Intelligence (BI) teams
• Objective:
Simplify database interactions for CXO-level reporting and decision-making by enabling natural language querying of structured data.
• Leadership:
Led a cross-functional team of Data Scientists and ML Engineers to deliver a scalable, high-performance system.
• Key Contributions:
• Multi-layered NL to SQL Conversion System:
• Implemented a query decomposition module to break down complex natural language questions into sub-queries for parallel and accurate processing.
• Built a query classification mechanism using a fine-tuned BERT model to route simple vs. complex queries along optimized paths.
• Enhanced Named Entity Recognition (NER) with fine-tuned BERT for precise entity detection, enabling granular Redis-based searches.
• Performance Optimization:
• Integrated a Redis caching layer to accelerate query response times and reduce database access costs.
• System Architecture:
• Delivered a microservices-based architecture that maintains session context for multi-turn interactions and follow-ups.
• Deployed services using FastAPI, Redis, and RabbitMQ, containerized with Docker, and hosted on VM infrastructure.
• Outcome:
Streamlined BI workflows with a robust, scalable, and fault-tolerant NL to SQL system that significantly improved access to structured data and enabled faster CXO-level insights.
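The caching step in this project can be illustrated with a minimal cache-aside sketch. It is not the production code: `TTLCache` is a hypothetical in-memory stand-in for Redis, and `cache_key`, `answer`, and the `generate_sql` callback are illustrative names that stand in for the NL to SQL conversion step.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for a Redis cache (assumption: real code would call Redis)."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self.ttl_seconds = ttl_seconds
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = (time.monotonic() + self.ttl_seconds, value)


def cache_key(question: str) -> str:
    """Normalize case and whitespace so trivially different phrasings share a key."""
    normalized = " ".join(question.lower().split())
    return "nl2sql:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def answer(
    question: str,
    cache: TTLCache,
    generate_sql: Callable[[str], str],
) -> Tuple[str, bool]:
    """Cache-aside lookup: return (sql, served_from_cache)."""
    key = cache_key(question)
    cached = cache.get(key)
    if cached is not None:
        return cached, True
    sql = generate_sql(question)  # the expensive conversion step, run only on a miss
    cache.set(key, sql)
    return sql, False
```

Keying on a normalized form of the question is one simple design choice; a real deployment might instead key on extracted entities or on the generated SQL itself.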
LLM-Powered Retrieval-Augmented Generation (RAG) System for Logistics, HR and Supply
• Objective:
Enhance unstructured document processing and retrieval for operational teams by leveraging advanced ML and LLM techniques.
• Leadership:
Led a team of Data Scientists and ML Engineers in the design, development, and deployment of the system.
• Key Contributions:
• Document Classification & Parsing:
• Developed a YOLO-based document classifier to identify and separate simple vs. complex pages.
• Applied regular parsers for simple pages; leveraged GPT-4 for complex table and chart extraction.
• Integrated LayoutLM for accurate table extraction in structured document layouts.
• Data Storage & Hybrid Retrieval:
• Generated vector embeddings from documents and indexed them in Redis Vector DB (RedisVL) for hybrid semantic search.
• Employed ColBERT for re-ranking results using fine-grained, token-level late-interaction scoring.
• Query Handling & User Interaction:
• Enabled NER-driven search refinement and GPT-based summarization chains for in-depth query responses.
• Maintained session-based context to support multi-turn queries and dynamic follow-ups.
• Scalability & Feedback Loop:
• Used the Ray framework for parallelized embedding generation.
• Integrated a real-time feedback loop for model fine-tuning and iterative performance enhancement.
• System Deployment:
• Delivered as a microservices-based architecture deployed via FastAPI, Redis, and RabbitMQ, containerized using Docker and hosted on VMs.
• Outcome:
Delivered a robust, efficient, and user-friendly system that significantly improved data access, operational decision-making, and productivity across Logistics and HR domains.
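The ColBERT re-ranking step used in this project relies on late-interaction (MaxSim) scoring, which can be sketched in a few lines. This is a toy illustration in plain Python, assuming token embeddings have already been computed and normalized elsewhere; `maxsim_score` and `rerank` are illustrative names, not part of any library.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late-interaction score: for each query token, take the best-matching
    document token, then sum those maxima over all query tokens."""
    if not doc_tokens:
        return 0.0
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def rerank(
    query_tokens: List[Vector],
    candidates: List[Tuple[str, List[Vector]]],
) -> List[Tuple[str, float]]:
    """Score each (doc_id, token_embeddings) candidate and sort best-first."""
    scored = [(doc_id, maxsim_score(query_tokens, toks)) for doc_id, toks in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because the score is computed per token rather than on one pooled vector, a document that covers every query term tends to outrank one that matches only part of the query.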
Agentic Framework for Multi-Tool Orchestration using LangGraph
• Objective:
Enable dynamic, context-aware query resolution and multi-hop task execution using an agentic architecture across multiple tools and data modalities.
• Key Contributions:
• Developed an agentic framework with LangGraph to support:
• Multi-hop query resolution via dynamic state transitions.
• Adaptive workflows for query refinement, sub-task chaining, and contextual task switching across API-bound tools.
• Implemented Agentic RAG (Retrieval-Augmented Generation) with automatic query regeneration for ambiguous or under-specified queries.
• Built a Corrective RAG pipeline to improve retrieval precision, calling external tools (e.g., calculators, search APIs, summarizers) for tool-assisted answering.
• Leveraged MCP (Model Context Protocol) to streamline and standardize structured tool invocation across agents.
• Introduced an Agentic Supervisor layer to manage complex workflows involving both structured (SQL, APIs) and unstructured (PDFs, text) sources, enabling unified handling across modalities.
• Ensured session-based state management for real-time, contextual continuity and seamless user interactions.
• Outcome:
Delivered a powerful, extensible agentic system capable of intelligent tool orchestration, contextual multi-step reasoning, and adaptive problem-solving across both structured and unstructured data domains.
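The corrective retrieval loop with automatic query regeneration can be sketched as a small control loop. This is plain Python and does not use LangGraph; the `retrieve`, `grade`, and `rewrite` callbacks, the threshold, and the attempt limit are all hypothetical placeholders for the retriever, relevance grader, and LLM-based query rewriter.

```python
from typing import Callable, List, Tuple


def corrective_retrieve(
    query: str,
    retrieve: Callable[[str], List[str]],
    grade: Callable[[str, str], float],
    rewrite: Callable[[str], str],
    min_score: float = 0.5,
    max_attempts: int = 3,
) -> Tuple[List[str], List[str]]:
    """Retrieve, keep only documents graded as relevant, and regenerate the
    query when nothing relevant comes back. Returns (documents, queries_tried)."""
    attempts: List[str] = []
    current = query
    for _ in range(max_attempts):
        attempts.append(current)
        relevant = [d for d in retrieve(current) if grade(current, d) >= min_score]
        if relevant:
            return relevant, attempts
        # Nothing passed the relevance check: regenerate the query and retry.
        current = rewrite(current)
    return [], attempts
```

Returning the list of queries tried makes each regeneration step visible to the caller, which helps when debugging ambiguous or under-specified questions.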
Deployment & Infrastructure Engineering
• Objective:
Deliver scalable, modular, and production-ready ML and application services through standardized, efficient deployment practices.
• Key Contributions:
• Modular Service Architecture:
• Architected each major module as an independent framework with sub-services exposed via FastAPI-based APIs, ensuring clean separation of concerns and reusability across projects.
• VM-Based Deployment & Docker Compose:
• Conducted isolated service testing and end-to-end integration in VM environments using Docker Compose, enabling reproducible local and staging deployments.
• Load Handling via Queueing Mechanisms:
• Integrated RabbitMQ and Celery to decouple components and efficiently handle high-throughput workloads across multiple services.
• Scaled Orchestration with Kubernetes:
• Deployed containerized services in Kubernetes clusters for horizontal scaling, high availability, and fault-tolerant service orchestration with environment-specific Helm charts and autoscaling policies.
• CI/CD with GitHub Actions:
• Managed deployment and release pipelines using GitHub Actions for automated testing, container builds, image publishing, and versioned deployments to staging and production environments.
Invoice Duplicate Detection Solution Development
• Objective:
Detect duplicate invoices reliably to reduce financial risk and improve operational efficiency.
• Leadership:
Led the team that designed and delivered the solution.
• Key Contributions:
• Semantic/Text-Matching Engine Development:
• Built an advanced semantic/text-matching engine to identify duplicate invoices, improving fraud prevention and financial integrity.
• Integrated a machine learning layer to enhance predictive accuracy and reduce false positives, streamlining invoice processing.
• Solution Architecture and Development:
• Designed a scalable solution on Azure ML Studio, adhering to best coding and architectural practices.
• Implemented DVC for data versioning, MLflow for model lifecycle management, and Deepchecks/EvidentlyAI for data quality assessments.
• Enhanced model interpretability using the SHAP (SHapley Additive exPlanations) framework.
• CI/CD Pipeline and Performance Monitoring:
• Architected CI/CD pipelines on Azure DevOps/GitHub Actions for automated Azure ML pipeline deployment, accelerating delivery.
• Instituted Azure Application Insights for performance monitoring, enabling proactive bottleneck resolution.
• Outcome:
Delivered a solution that improved duplicate-invoice detection and fraud prevention, reducing financial risk and improving operational efficiency.
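The text-matching idea behind duplicate detection can be illustrated with a small sketch. It substitutes character-level similarity from Python's standard `difflib` for the semantic engine and ML layer described above, and the `Invoice` fields, the amount tolerance, and the 0.85 threshold are assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class Invoice:
    invoice_id: str
    vendor: str
    number: str
    amount: float


def normalize(text: str) -> str:
    """Lowercase and strip everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] on the normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_duplicates(invoices: List[Invoice], threshold: float = 0.85) -> List[Tuple[str, str]]:
    """Flag pairs whose amounts match and whose vendor and number both look alike."""
    pairs: List[Tuple[str, str]] = []
    for left, right in combinations(invoices, 2):
        if abs(left.amount - right.amount) > 0.01:
            continue  # different amounts: not treated as a duplicate in this sketch
        score = min(
            similarity(left.vendor, right.vendor),
            similarity(left.number, right.number),
        )
        if score >= threshold:
            pairs.append((left.invoice_id, right.invoice_id))
    return pairs
```

Comparing every pair is quadratic in the number of invoices, so a larger dataset would need a blocking step (for example, grouping by amount or vendor) before pairwise scoring.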
Forecasting Framework for Accurate Predictions
• Objective:
Provide a versatile forecasting framework that integrates models such as ARIMA, Prophet, and LSTM for accurate predictions.
• Key Components:
• Anomaly Detection & Correction:
• Developed an anomaly detection and correction framework using Prophet + ADTK and regularized ARIMA, improving forecast reliability.
• Scalable Forecasting with Parallel Computing:
• Implemented parallelized and vectorized computing on Databricks clusters for scalable, high-performance forecasting.
• Temporal Regression Framework:
• Architected a temporal regression framework leveraging LightGBM and XGBoost, addressing large-scale forecasting challenges with intelligent grouping and parallelized training.
• Automated Feature Extraction:
• Automated feature extraction pipelines, optimizing precision for product clusters.
• Platform Flexibility:
• Designed the solution for seamless operation on Azure ML Studio or Databricks, providing platform flexibility.
• Outcome:
Enabled precise, high-performance forecasting, enhancing decision-making across the organization.
Additional Contributions & Tools:
• Managed development and deployment workflows via Azure DevOps, ensuring efficiency and scalability.
• Delivered multiple POC projects showcasing innovative solutions.
• Proficient in Power BI and Tableau for impactful data visualizations.
• Created an open-source framework (Docker, GitHub Actions, MLflow, PostgreSQL) to mimic the Azure ML ecosystem for on-premises clients.
• Developed FastAPI templates for quick plug-and-play analytics solutions.
I am passionate about history and constantly explore different eras and historical events. I enjoy researching the past to gain insight into the present.