

Applied AI researcher and technical leader with 10+ years of experience developing advanced AI systems across computer vision, multimodal AI, generative models, and LLM-based architectures. My work focuses on exploring research ideas through rapid experimentation and translating promising approaches into reliable real-world AI systems operating at production scale.
I have led the design of multimodal AI platforms, generative vision systems, conversational AI architectures, and real-time perception pipelines at the intersection of vision, language, and generative AI. This work often involves investigating new model architectures, designing scalable inference systems, and carrying experimental research through to deployable AI systems.
Inventor on multiple patents related to computer vision, mixed reality rendering, and spatio-temporal video understanding.
Led the design and deployment of multimodal AI systems spanning computer vision, generative AI, and LLM-based architectures for mixed-reality and real-time environments.
• Led a 10-person applied AI team, driving projects from research prototypes to production AI systems.
• Architected a multimodal conversational avatar platform combining ASR, FAISS retrieval, LLM reasoning, TTS, and lip-sync video synthesis.
• Designed and implemented RAG infrastructure, including document chunking strategies, vector indexing, retrieval pipelines, and reranking mechanisms.
• Built a generative Virtual Try-On system combining DensePose, segmentation, garment parsing, and diffusion synthesis.
• Developed ultra-lightweight neural architectures (~20K-parameter nano-UNets) enabling real-time inference in mobile browsers using ONNX Runtime and WebGPU.
• Led development of real-time poster/TV tracking pipelines, achieving ~50 FPS tracking inside browser environments.
• Designed GPU inference systems across H100, L40, and T4 environments, implementing efficient batching, automatic mixed precision (AMP), and memory-safe inference pipelines.
• Established architecture patterns for edge vs. server AI inference, balancing latency, compute cost, and model performance.
Senior individual contributor driving R&D in generative AI, talking-head synthesis, and multimodal video generation systems.
• Led R&D on lip-sync and talking-head generation models, exploring GAN/VAE and 3D-aware architectures.
• Investigated diffusion-based generative pipelines for voice cloning, image generation, and video synthesis.
• Developed visual speech recognition (lip-reading) pipelines, achieving word error rates (WER) competitive with baseline systems.
• Built video enhancement pipelines using GFPGAN/GPEN-class models to improve the perceptual quality of generated content.
• Designed multilingual TTS and voice cloning pipelines for single-speaker and multi-speaker scenarios.
• Contributed across the company’s generative roadmap, including Text→Speech, Image→Video, and Digital Avatar systems.
Led machine learning strategy and deployment across industrial AI and IoT analytics systems.
• Led data science strategy across 10+ ML projects, translating business goals into deployable ML systems.
• Developed reinforcement learning control systems (DQN, PPO, SAC) for industrial process optimization.
• Built time-series forecasting models for equipment monitoring and predictive maintenance.
• Designed production ML pipelines across AWS, Azure, and GCP, integrating IoT sensor data.
• Delivered industrial deployments, including HVAC optimization (~52% power reduction) and ammonia recovery systems.
Generative Models: LDM, DDPM, DDIM, GAN (StyleGAN, Pix2Pix), VAE, ControlNet-style conditioning, classifier-free guidance, inpainting pipelines
Computer Vision: UNet / nano-UNet, CBAM attention, DensePose, Mask R-CNN, YOLO, optical flow (RAFT, PWC-Net), image registration, FPN, ViT, hybrid CNN-transformer
Multimodal & Generative Video: Talking-head synthesis, lip-sync alignment, visual speech recognition, Image-to-Video generation, audio-driven animation, GFPGAN/GPEN video enhancement
LLMs & RAG: Prompt engineering, chain-of-thought, RAG pipeline design, FAISS, BM25 hybrid retrieval, reranking, query routing, LangChain, LlamaIndex, agent architectures
Speech & Audio: Streaming ASR integration, multilingual TTS, voice cloning (single & multi-speaker), neural vocoder integration
Reinforcement Learning: DQN, PPO, SAC — continuous control, industrial optimization, reward shaping, safe deployment under distributional shift
Efficient & Edge ML: Sub-25K-parameter model design, knowledge distillation, ONNX quantization, INT8/FP16 inference, TensorRT, WebGPU, ONNX Runtime, mobile inference
GPU & Inference Infrastructure: CUDA, AMP mixed precision, dynamic batching, multi-worker isolation, TensorRT, H100/L40/T4 optimization, Kubernetes, Ray
Frameworks & Tools: PyTorch, TensorFlow, Hugging Face Transformers/Diffusers, OpenCV, FAISS, LangChain, LlamaIndex, ONNX Runtime, TensorRT, Ray, FastAPI, Docker
5 Internal Patent Applications | Flam (Mixed Reality & Vision) · Vectorly (Video Processing & CV)
Efficient Neural Inference for Browser-Based Vision (Flam, 2024): Designed sub-25K-parameter nano-UNet architectures with CBAM attention for real-time vision inference inside mobile browsers via WebGPU and ONNX Runtime.
Multimodal Conversational Avatar Architecture (Flam, 2024): Designed an end-to-end pipeline integrating streaming ASR, FAISS retrieval, LLM reasoning, TTS, and lip-sync video synthesis for digital avatars grounded in enterprise knowledge.
Generative Virtual Try-On via Diffusion Pipeline (Flam, 2024): Built a multi-stage clothing-transfer system that combines DensePose, human parsing, garment segmentation, and latent diffusion synthesis to work from single images.
Spatio-Temporal Video Segmentation via 3D Region Growing (Vectorly, 2020; patented): Developed a novel segmentation method that combines image registration with temporal region propagation to track object boundaries consistently across video frames.
RAG Pipeline Design for Enterprise Documents (Flam, 2024): Investigated chunking strategies, vector indexing, and reranking architectures for retrieval-augmented generation over heterogeneous enterprise knowledge bases.
Talking-Head & Lip-Sync Generation R&D (Unscript.ai, 2022–2024): Systematically explored GAN/VAE, 3D-aware, and diffusion-based architectures for neural lip-sync and talking-head video generation.
Reinforcement Learning for Industrial Process Control (Bert Labs, 2021–2022): Applied DQN, PPO, and SAC to continuous industrial optimization under sensor noise, partial observability, and non-stationary real-world dynamics.