Avijit Kundal

Applied AI Research Engineer | Multimodal AI Systems | Vision + Generative AI | LLM Platforms
Bengaluru

Summary

Applied AI researcher and technical leader with 10+ years of experience developing advanced AI systems across computer vision, multimodal AI, generative models, and LLM-based architectures. My work focuses on exploring research ideas through rapid experimentation and translating promising approaches into reliable real-world AI systems operating at production scale.

I have led the design of multimodal AI platforms, generative vision systems, conversational AI architectures, and real-time perception pipelines, working at the intersection of vision, language, and generative AI. My work frequently involves investigating new model architectures, designing scalable inference systems, and bridging the gap between experimental research and deployable AI systems.

Inventor on multiple patents related to computer vision, mixed reality rendering, and spatio-temporal video understanding.

Overview

10 years of professional experience

Work History

Senior AI Engineer & AI Tech Lead

Flam
08.2024 - Current

Led the design and deployment of multimodal AI systems spanning computer vision, generative AI, and LLM-based architectures for mixed-reality and real-time environments.

• Led a 10-person applied AI team, driving projects from research prototypes to production AI systems.
• Architected a multimodal conversational avatar platform combining ASR, FAISS retrieval, LLM reasoning, TTS, and lip-sync video synthesis.
• Designed and implemented RAG infrastructure including document chunking strategies, vector indexing, retrieval pipelines, and reranking mechanisms.
• Built a generative Virtual Try-On system combining DensePose, segmentation, garment parsing, and diffusion synthesis.
• Developed ultra-lightweight neural architectures (~20K parameter nano-UNets) enabling real-time inference in mobile browsers using ONNX Runtime and WebGPU.
• Led development of real-time poster/TV tracking pipelines, achieving ~50 FPS tracking inside browser environments.
• Designed GPU inference systems across H100, L40, and T4 environments, implementing efficient batching, AMP, and memory-safe inference pipelines.
• Established architecture patterns for edge vs server AI inference, balancing latency, compute cost, and model performance.

Deep Learning Scientist (Senior/Staff-level IC)

Unscript.ai
11.2022 - 08.2024

Senior individual contributor driving R&D in generative AI, talking-head synthesis, and multimodal video generation systems.

• Led R&D on lip-sync and talking-head generation models, exploring GAN/VAE and 3D-aware architectures.
• Investigated diffusion-based generative pipelines for voice cloning, image generation, and video synthesis.
• Developed visual speech recognition (lip-reading) pipelines, reaching competitive word error rates (WER) against baseline systems.
• Built video enhancement pipelines using GFPGAN/GPEN-class models to improve the perceptual quality of generated content.
• Designed multilingual TTS and voice cloning pipelines for single-speaker and multi-speaker scenarios.
• Contributed across the company’s generative roadmap, including Text→Speech, Image→Video, and Digital Avatar systems.

Head of Data Science

Bert Labs Pvt Ltd
08.2021 - 10.2022

Led machine learning strategy and deployment across industrial AI and IoT analytics systems.

• Led data science strategy across 10+ ML projects, translating business goals into deployable ML systems.
• Developed reinforcement learning control systems (DQN, PPO, SAC) for industrial process optimization.
• Built time-series forecasting models for equipment monitoring and predictive maintenance.
• Designed production ML pipelines across AWS, Azure, and GCP, integrating IoT sensor data.
• Delivered industrial deployments, including HVAC optimization (~52% power reduction) and ammonia recovery systems.

AI Lead

Xplorazzi
10.2020 - 04.2021
  • Achieved 70%+ accuracy in SKU detection for 50+ Reckitt Benckiser products in low-quality store images using deep learning models.
  • Generated synthetic training data for diverse SKUs using Blender 3D renders, improving model robustness.
  • Designed, developed, and deployed a tooth detection and numbering system for panoramic and periapical X-ray images, using a custom dental X-ray dataset.
  • Built a foot size measurement app that calculates foot size from top-view and side-view foot images.

Co-Founder & AI Systems Lead

Vectorly
05.2018 - 09.2020
  • Built and operated distributed GPU pipelines on Kubernetes (Ray), prioritizing reliability, throughput, and debuggability for large-scale CV workloads across local + cloud environments.
  • Co-founded / drove core platform R&D for patented video vectorization (pixel → vector) technology; owned key algorithmic and system decisions from prototype to production-ready pipeline.
  • Developed and patented a 3D region-growing segmentation approach across frames by combining image registration + temporal segmentation, improving consistency on animated/video content.
  • Implemented GPU-accelerated CV components and optimized critical paths for performance (profiling, batching, memory-aware execution) to meet production throughput targets.
  • Led research-to-product work to improve segmentation/detection on low-quality inputs, translating findings into pragmatic changes that improved real-world robustness.
  • Stayed close to SOTA and engineering trade-offs by continuously reviewing papers and rapidly validating what’s worth shipping vs. what’s fragile.

Social Media Data Analyst

D Techmonkey Sols. Pvt.Ltd
04.2016 - 04.2018
  • Implemented CNNs and NLP for content filtering on various Facebook pages, enhancing recommendation systems.
  • Built time-series forecasting models to predict content views and identify seasonal trends in reading preferences throughout the year.
  • Managed multiple Facebook pages, achieving a total of 400,000 followers within a span of less than two years.
  • Authored over 20 science and technology articles for the engaged followers.

Education

B.Tech (program discontinued 2013) - Textile Technology (Major)

Indian Institute of Technology
Delhi
04.2014

Integrated BSc–MSc in Data Science And Programming - Data Science And Programming

Indian Institute of Technology
Madras
05.2028

Level 6 Diploma in Economics - Economics

Manukau Institute of Technology
Auckland, New Zealand
09.2015

Methods, Architectures & Technical Frameworks

Generative Models: LDM, DDPM, DDIM, GAN (StyleGAN, Pix2Pix), VAE, ControlNet-style conditioning, classifier-free guidance, inpainting pipelines

Computer Vision: UNet / nano-UNet, CBAM attention, DensePose, Mask R-CNN, YOLO, optical flow (RAFT, PWCNet), image registration, FPN, ViT, hybrid CNN-transformer

Multimodal & Generative Video: Talking-head synthesis, lip-sync alignment, visual speech recognition, Image-to-Video generation, audio-driven animation, GFPGAN/GPEN video enhancement

LLMs & RAG: Prompt engineering, chain-of-thought, RAG pipeline design, FAISS, BM25 hybrid retrieval, reranking, query routing, LangChain, LlamaIndex, agent architectures

Speech & Audio: Streaming ASR integration, multilingual TTS, voice cloning (single & multi-speaker), neural vocoder integration

Reinforcement Learning: DQN, PPO, SAC — continuous control, industrial optimization, reward shaping, safe deployment under distributional shift

Efficient & Edge ML: Sub-25K parameter design, knowledge distillation, ONNX quantization, INT8/FP16 inference, TensorRT, WebGPU, ONNX Runtime, mobile inference

GPU & Inference Infrastructure: CUDA, AMP mixed precision, dynamic batching, multi-worker isolation, TensorRT, H100/L40/T4 optimization, Kubernetes, Ray

Frameworks & Tools: PyTorch, TensorFlow, HuggingFace Transformers/Diffusers, OpenCV, FAISS, LangChain, LlamaIndex, ONNX Runtime, TensorRT, Ray, FastAPI, Docker

Patents & Intellectual Property

5 Internal Patent Applications | Flam (Mixed Reality & Vision) · Vectorly (Video Processing & CV)

  • Real-Time Harmonization of Digital Objects for 3D Mixed Reality Environments (Lead Inventor, Flam)
  • Spatio-Temporal Video Segmentation using 3D Region Growing Across Frames (Lead Inventor, Vectorly)
  • Context-Aware Object Placement in Mixed Reality Systems (Co-Inventor, Flam)
  • Shadow Detection and Scene Classification via Light-Source Identification (Co-Inventor, Flam)
  • Video-to-Vector Representation for Scalable Animation and Editing (Co-Inventor, Vectorly)

Research Contributions

Efficient Neural Inference for Browser-Based Vision (Flam, 2024): Designed sub-25K parameter nano-UNet architectures with CBAM attention for real-time vision inference inside mobile browsers via WebGPU and ONNX Runtime.

  • Achieved ~50 FPS tracking performance entirely client-side — no server round-trip
  • Demonstrated viable segmentation at ~20K parameters vs. standard UNets at 1M+
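
As a rough illustration of the parameter gap cited above, the sketch below counts convolution weights for a simplified encoder-decoder. The channel widths, the one-convolution-per-level layout, and the helper names `conv_params` and `unet_params` are assumptions chosen for illustration; they are not the architecture deployed at Flam.

```python
def conv_params(c_in: int, c_out: int, k: int = 3, bias: bool = True) -> int:
    """Parameter count of one 2D convolution with a k x k kernel."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def unet_params(widths: list[int], in_ch: int = 3, out_ch: int = 1) -> int:
    """Count parameters of a simplified UNet: one 3x3 conv per encoder level,
    one 3x3 conv per decoder level (fed by skip + upsampled features), and a
    1x1 output head. Real UNets typically use two convs per level."""
    total = 0
    prev = in_ch
    for w in widths:                          # encoder path
        total += conv_params(prev, w)
        prev = w
    for w in reversed(widths[:-1]):           # decoder path: concat(upsampled, skip)
        total += conv_params(prev + w, w)
        prev = w
    total += conv_params(prev, out_ch, k=1)   # 1x1 segmentation head
    return total


nano = unet_params([8, 16, 32])               # hypothetical "nano" widths
standard = unet_params([64, 128, 256, 512])   # typical full-size widths
print(f"nano: {nano:,} params, standard: {standard:,} params")
```

With these assumed widths the small network stays under 25K parameters while the full-size layout lands in the millions, which is the order-of-magnitude gap the bullet describes.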

Multimodal Conversational Avatar Architecture (Flam, 2024): Designed end-to-end pipeline integrating streaming ASR, FAISS retrieval, LLM reasoning, TTS, and lip-sync video synthesis for enterprise-grounded digital avatars.

  • Addressed latency budget allocation across 5 modalities in a single real-time pipeline
  • Retrieval system handled heterogeneous enterprise document corpora with multi-stage reranking

Generative Virtual Try-On via Diffusion Pipeline (Flam, 2024): Built multi-stage clothing transfer system combining DensePose, human parsing, garment segmentation, and latent diffusion synthesis from single images.

  • Maintained garment texture fidelity across diverse poses and lighting conditions
  • Deployed as modular GPU inference services across containerized production environments

Spatio-Temporal Video Segmentation via 3D Region Growing (Vectorly, 2020 — Patented): Developed novel segmentation method combining image registration and temporal region propagation for consistent object boundary tracking across video frames.

  • Reduced boundary drift on animated and live-action video vs. frame-by-frame approaches
  • Formalized as granted internal patent; deployed in production video vectorization pipeline
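
For readers unfamiliar with the idea, here is a minimal sketch of textbook 3D region growing over a video volume. It is not the patented method: it omits the image registration step, uses a fixed intensity tolerance, and the function name `region_grow_3d` and the toy clip are assumptions made for illustration.

```python
from collections import deque


def region_grow_3d(video, seed, tol):
    """Grow a region from seed = (t, y, x) over video[t][y][x] intensities.

    Neighbours are 6-connected: four within the frame and two in the
    adjacent frames. A voxel joins if its value is within `tol` of the seed.
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    t0, y0, x0 = seed
    ref = video[t0][y0][x0]
    region = {seed}
    queue = deque([seed])
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while queue:
        t, y, x = queue.popleft()
        for dt, dy, dx in steps:
            nt, ny, nx = t + dt, y + dy, x + dx
            if not (0 <= nt < T and 0 <= ny < H and 0 <= nx < W):
                continue                      # outside the video volume
            n = (nt, ny, nx)
            if n in region or abs(video[nt][ny][nx] - ref) > tol:
                continue                      # already visited or too different
            region.add(n)
            queue.append(n)
    return region


# Toy clip: a 2-pixel-wide bright object that shifts right by one pixel.
clip = [
    [[0, 9, 9, 0, 0]],   # frame 0
    [[0, 0, 9, 9, 0]],   # frame 1
]
segment = region_grow_3d(clip, seed=(0, 0, 1), tol=1)
print(sorted(segment))
```

Because the region grows through the time axis as well as within each frame, the object keeps one label across frames as long as consecutive positions overlap. That is the consistency benefit over segmenting each frame independently.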

RAG Pipeline Design for Enterprise Documents (Flam, 2024): Investigated chunking strategies, vector indexing, and reranking architectures for retrieval-augmented generation over heterogeneous enterprise knowledge bases.

  • Designed incremental index update system for live document corpora
  • Built query-routing logic spanning structured, semi-structured, and unstructured sources
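
The chunk, retrieve, and rerank stages named above can be sketched with the standard library alone. This is a toy under stated assumptions: bag-of-words counts stand in for learned embeddings, a brute-force cosine scan stands in for a FAISS index, term coverage stands in for a learned reranker, and all function names and the sample text are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_words(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def bow(text: str) -> Counter:
    """Toy 'embedding': lowercase bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Stage 1: rank every chunk by cosine similarity and keep the top k."""
    q = bow(query)
    return sorted(chunks, key=lambda c: cosine(q, bow(c)), reverse=True)[:k]


def rerank(query: str, candidates: list[str]) -> list[str]:
    """Stage 2: reorder candidates by the share of distinct query terms they contain."""
    q_terms = set(bow(query))
    coverage = lambda c: len(q_terms & set(bow(c))) / max(len(q_terms), 1)
    return sorted(candidates, key=coverage, reverse=True)


doc = (
    "Refunds are processed within five business days. "
    "Shipping is free for orders above fifty dollars. "
    "Support is available by email on weekdays."
)
chunks = chunk_words(doc)
top = rerank("free shipping orders", retrieve("free shipping orders", chunks))[0]
print(top)
```

The overlap between windows keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of some duplicated text in the index.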

Talking-Head & Lip-Sync Generation R&D (Unscript.ai, 2022–2024): Systematically explored GAN/VAE, 3D-aware, and diffusion-based architectures for neural lip-sync and talking-head video generation.

  • Benchmarked visual speech recognition (VSR) pipeline against baseline WER metrics
  • Applied GFPGAN/GPEN-class enhancement to reduce perceptual artifacts in generated video

Reinforcement Learning for Industrial Process Control (Bert Labs, 2021–2022): Applied DQN, PPO, and SAC to continuous industrial optimization under sensor noise, partial observability, and non-stationary real-world dynamics.

  • HVAC control deployment at Unilever Mumbai → ~52% reduction in power consumption
  • Ammonia recovery process automation deployed at GHCL in live industrial environment
