I am currently working at a startup specializing in Computer Vision using Deep Learning.
My contributions include:
1. Deep Learning Model Training and Optimization: Led the training and fine-tuning of deep learning models for image selection, image editing, and image sharing, improving model accuracy and overall application functionality.
2. Backend Development and Architecture: Spearheaded the design and implementation, from inception, of the Python backend for desktop applications for image culling, editing, and sharing, ensuring robust performance, scalability, and maintainability across three platforms: Windows, Apple Silicon Mac, and Intel Mac. The desktop applications are available at https://algomage.com/downloads.
This involved:
One Shot Video Object Segmentation (OSVOS)
Implemented a video object segmentation model, OSVOS, using the PyTorch library in Python. The model takes the first frame as a reference for the object whose segmentation mask must be generated in all subsequent frames. The main challenges are that the object of interest may undergo deformation, occlusion, or changes in appearance.
https://github.com/nitishsaDire/osvos
Visual Object Tracking (VOT)
This project tracks an object, marked with a bounding box in the first frame, throughout the rest of the video. A Siamese architecture finds the patch in each input frame that is most similar to the template of the object of interest taken from the first frame. Once a similar patch has been found, it is passed to a Region Proposal Network.
https://github.com/nitishsaDire/VOT SIAMRPN
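The matching step described above can be illustrated with a minimal sketch. It is not the code from the repository: it assumes raw 2D grids of numbers in place of the learned feature embeddings a Siamese network would produce, and the helper names `cross_correlate` and `best_match` are hypothetical.

```python
def cross_correlate(search, template):
    """Slide `template` over `search` and return the response map.

    Each response value is the dot product between the template and the
    patch of `search` at that offset (valid positions only).
    """
    th, tw = len(template), len(template[0])
    sh, sw = len(search), len(search[0])
    response = []
    for y in range(sh - th + 1):
        row = []
        for x in range(sw - tw + 1):
            score = sum(
                search[y + i][x + j] * template[i][j]
                for i in range(th)
                for j in range(tw)
            )
            row.append(score)
        response.append(row)
    return response


def best_match(response):
    """Return the (row, col) offset with the highest response."""
    positions = (
        (y, x) for y in range(len(response)) for x in range(len(response[0]))
    )
    return max(positions, key=lambda p: response[p[0]][p[1]])


# Toy data (assumed): the template pattern sits at row 1, column 2.
template = [[1, -1],
            [-1, 1]]
search = [[0, 0, 0, 0],
          [0, 0, 1, -1],
          [0, 0, -1, 1],
          [0, 0, 0, 0]]

response = cross_correlate(search, template)
location = best_match(response)  # (1, 2)
```

In a real tracker the same correlation would typically run on feature maps rather than pixels, and the peak location would only be a starting point for the bounding-box stage.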
Image Segmentation
This project generates an object mask for an input image over a predefined set of object categories. Image segmentation assigns a label to each pixel of the input image. A U-Net architecture with a ResNet-34 backbone is used for the implementation.
https://github.com/nitishsaDire/imagesegmentation
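To make "a label for each pixel" concrete, here is a minimal sketch of the final step of a segmentation pipeline: turning per-class score maps into a label mask. The score values and the function name `scores_to_mask` are illustrative assumptions; in practice the scores would come from the network's output layer.

```python
def scores_to_mask(scores):
    """Convert scores[c][y][x] into mask[y][x] = index of the best class."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [
            max(range(num_classes), key=lambda c: scores[c][y][x])
            for x in range(width)
        ]
        for y in range(height)
    ]


# Assumed toy scores for 2 classes on a 2x2 image.
scores = [
    [[0.9, 0.2],
     [0.4, 0.1]],  # class 0 (e.g. background)
    [[0.1, 0.8],
     [0.6, 0.9]],  # class 1 (e.g. object)
]

mask = scores_to_mask(scores)  # [[0, 1], [1, 1]]
```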
Video Classification
This project classifies a video into one of a set of predefined categories. A ResNet extracts features from each frame, and an LSTM captures temporal information across frames. The datasets used are UCF-50 and UCF-11.
https://github.com/nitishsaDire/videoClassification
M.Tech project
I worked on a blockchain project. Blockchain addresses the problem of building consensus among a group of agents without any central authority, so the underlying network is peer-to-peer. The work focused on the Ethereum blockchain: implementing smart contracts for multi-armed bandit problems and deploying them on the blockchain.
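For readers unfamiliar with multi-armed bandits, the following is a minimal off-chain sketch of one common strategy, epsilon-greedy. It is an illustration only, not the smart-contract implementation from the project: the choice of epsilon-greedy, the Bernoulli rewards, and all names and parameter values are assumptions.

```python
import random


def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on Bernoulli arms.

    Returns the number of pulls per arm and the estimated mean reward
    of each arm after `steps` rounds.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates


# Assumed success probabilities for three hypothetical arms.
counts, estimates = epsilon_greedy([0.2, 0.5, 0.8])
```

With enough rounds, the agent is expected to concentrate most of its pulls on the arm with the highest success probability, while the small exploration rate keeps the other estimates from going stale.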
I hereby declare that the information written above is correct to the best of my knowledge.