Museum Q&A System, Developed a multimodal Q&A system tailored to the MET museum that accepts an image and a question in a regional language and answers in the same language. Collected image & text data, implemented a language translation pipeline, and fine-tuned a Vision Language Model on the collected data., Python
Image Captioning, Designed & developed an image-captioning application using pre-trained CNN and LSTM models, based on the Flickr8k dataset. Studied models such as CLIP, GPT-2, GPT-J, and GPT-3., Python, Keras, TensorFlow
Crowd Counting, Implemented the crowd-counting algorithm from the CSRNet paper (CVPR 2018). Surveyed literature on different crowd-counting models., Keras, TensorFlow, SciPy, NumPy, Pillow, OpenCV
License Plate Recognition, Implemented a traditional computer-vision pipeline on 1,000 Indian number-plate images to retrieve & store plate data efficiently, handling real-world conditions such as noise, low illumination, and non-standard fonts., Python, OpenCV, NumPy, Pillow, Pytesseract