AI is transforming the way we live, in ways never imagined possible, and it will continue to do so. It is my pleasure to apply AI toward wellbeing and comfort, and it is fun too.
Overview
6 years of professional experience
5 years of professional experience in Deep Learning
Work History
Senior Computer Vision Engineer
Ixana
03.2021 - Current
Real-time barcode and QR code detection
a. The model needed to be deployed on mobile devices and used in a warehouse.
b. I first did a literature study and finalized a model architecture and training technique with edge deployment in mind.
c. I made the architecture shallower by carefully removing layers to reduce computation while retaining performance, making the model 30% more efficient at the same average precision through hyperparameter tuning.
d. I first trained the model on available open-source data, then collected and annotated additional data from the internet; training on the combined data improved performance, and a POC was built that detects barcodes with 96% accuracy from a 250 ms input video.
e. A demo application was built in Android Studio with an ESP32 IP cam, and the model was deployed in it using TFLite.
f. With its help, our team collected real-world data.
g. We then annotated that dataset and retrained the model on it, reaching 99.6% accuracy on 250 ms videos.
h. The model successfully detected and localized barcodes in the environment.
i. A series of augmentation techniques, hyperparameter sets, and training strategies were used to make the model deployment-ready.
j. Time profiling with TFLite and the GPU delegate showed that even the unquantized model met the latency target; execution time was under 90 ms.
k. I used accuracy over a 250 ms video as the clip-level metric and average precision (AP) to evaluate the model on single images.
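The clip-level accuracy metric above can be sketched in plain Python. The aggregation rule used here (a clip counts as a detection if at least one frame has a correct hit) is an illustrative assumption, not necessarily the rule used in the project:

```python
# Hypothetical sketch: clip-level accuracy from per-frame detections.
# The any-frame aggregation rule is an assumption for illustration.

def clip_detected(frame_hits, min_hits=1):
    """A clip counts as a detection if at least `min_hits`
    frames contain a correct barcode detection (1 = hit, 0 = miss)."""
    return sum(frame_hits) >= min_hits

def clip_accuracy(clips):
    """clips: list of (frame_hits, ground_truth_has_barcode) pairs."""
    correct = 0
    for frame_hits, has_barcode in clips:
        predicted = clip_detected(frame_hits)
        correct += (predicted == has_barcode)
    return correct / len(clips)

# Example: three 250 ms clips, roughly 8 frames each at ~30 fps.
clips = [
    ([0, 0, 1, 1, 1, 0, 0, 0], True),   # detected  -> correct
    ([0, 0, 0, 0, 0, 0, 0, 0], True),   # missed    -> wrong
    ([0, 0, 0, 0, 0, 0, 0, 0], False),  # no barcode, none found -> correct
]
```

Per-frame AP would then complement this by measuring localization quality on single images.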
Warehouse object detection with one-shot classification (POC)
a. This project required detecting objects during checkout in a warehouse.
b. Since there is no limit on the types of objects that can appear in a warehouse, a one-shot classification model was required.
c. I used VGG as the backbone network to create a one-shot classifier,
d. and trained it with a triplet loss based on cosine similarity.
e. The app to support the model is still in development.
f. Used accuracy (over a 250 ms video) as the metric.
g. The model's execution time was 180 ms, which was acceptable for the POC.
h. The model's accuracy was 88%, which should improve as more data becomes available.
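A minimal sketch of a cosine-similarity triplet loss of the kind described in item d (plain Python; the margin value and embedding details are illustrative, as they are not specified above):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Push anchor-positive similarity above anchor-negative
    similarity by at least `margin`; zero loss once satisfied."""
    return max(0.0, cosine(anchor, negative) - cosine(anchor, positive) + margin)
```

At inference, a query image is classified by embedding it and comparing cosine similarity against one reference embedding per class, which is what makes the approach one-shot.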
Hand tracking with key points
a. For AR use cases, we needed to build a hand-tracking system to support human-computer interaction.
b. This problem was already well solved in the AI community, so I found good models with more than 96% detection accuracy.
c. We built an API for our SDK users and deployed the hand-tracking system.
Application to determine how close people are to the camera and then recognize them
a. This is another AR application in which I combined face detection, face depth estimation, and face recognition to solve the problem.
b. I used a FaceNet-based face detection system (accuracy 99%) and VGGFace (accuracy 98.78%) for face recognition.
c. The FaceNet model was also made to output six facial landmarks, from which I estimated the depth of the face.
d. I then created a deployment framework in Java and Kotlin for use on Android.
e. The app supports tagging new faces, storing tagged faces in a database, and more.
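Depth-from-landmarks as in item c can be estimated with a pinhole-camera model. This is a hypothetical sketch: the focal length, the choice of eye landmarks, and the assumed interpupillary distance are illustrative values, not the project's actual parameters:

```python
# Hypothetical depth-from-landmarks sketch under a pinhole camera model.
# FOCAL_PX and EYE_DIST_M are illustrative, not from the original system.

FOCAL_PX = 800.0    # assumed camera focal length in pixels
EYE_DIST_M = 0.063  # average human interpupillary distance in metres

def face_depth(left_eye, right_eye):
    """Estimate camera-to-face distance in metres from two eye
    landmarks in pixel coordinates: depth = f * real_size / pixel_size."""
    px = ((left_eye[0] - right_eye[0]) ** 2 +
          (left_eye[1] - right_eye[1]) ** 2) ** 0.5
    return FOCAL_PX * EYE_DIST_M / px
```

The farther the face, the smaller the pixel distance between the landmarks, so depth scales inversely with the measured landmark spacing.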
Model deployment frameworks in TFLite and SNPE using Android Studio
a. I create, and help my team create, ML SDKs and APIs for our customers.
b. To that end, I built generic inference frameworks using TFLite that can load and run any model.
c. I also integrated them with the larger image-processing SDK.
d. I have also written code to process video streamed from our AR camera.
MLOps
a. Alongside model development and deployment, I design MLOps best practices so the team can work effortlessly.
b. I introduced consistent naming conventions for projects, datasets, and models, and maintain a central storage space for them.
c. I created training guidelines so that the training process is faster.
d. We also have coding guidelines and an in-house coding framework that help us design models faster.
e. Finally, we have a CI/CD framework, so we can deploy our best models without modifying and rebuilding the codebase.
Video summarization
a. I'm currently working on this project, and it is at a very early stage.
b. I'm using an image-captioning model and GPT to summarize what the user's head-mounted camera sees in its environment.
c. The solution is currently deployed on AWS using NVIDIA GPUs.
d. We have built an Android app to go with the backend, but we have yet to define the scope of this project.
Product Engineer - AI and Deep Learning
Saket Mohanty
03.2019 - 03.2021
Realtime Edge Super Resolution to enhance OTT experience (Product):
a. I worked on this project from research through deployment and release. We deployed this model with Disney+ Hotstar.
b. I conducted the initial-phase research, evaluated various contemporary deep-learning SR models, and established myself as the subject-matter expert in the domain within the company.
c. Using these insights, I built the company's first in-house SR model as a POC with open-source data; with the DIV2K dataset, we achieved 32 dB PSNR.
d. Using the demo, we then collected relevant data from OTT service providers and used it to define the scope of the model.
e. I also researched an appropriate metric to evaluate the model and settled on VMAF, proposed by Netflix.
f. We then collected a large amount of video data across various genres and analyzed it thoroughly with the help of a domain expert in video encoding, decoding, and streaming.
g. I created a production-quality model through multiple iterations of training, modifying the architecture by analyzing bias vs. variance, performing error analysis, and studying the effect of each layer in the network.
h. My team and I built multiple models to support both low-end (low-power) and high-end devices. The qualifying VMAF for the high-end models was 75, set by the OTT service provider, and our models achieved it.
i. The solution was to be deployed on Qualcomm-powered mobile devices, so I optimized the model for low-power hardware using deformable convolutions to decrease the number of parameters, atrous convolutions to increase the receptive field, and cheaper substitutes for expensive layers such as tanh.
j. I also ran many experiments, guided by monitoring the model's performance, and developed intuitions and directions for creating better datasets.
k. I used the Snapdragon Neural Processing Engine (SNPE) to port the model to mobile devices.
l. I worked with the Qualcomm Hyderabad team through multiple training and collaboration sessions to create a C++ deployment framework for the model.
m. I deployed the model on Qualcomm's DSP with integer quantization to increase speed and decrease power consumption.
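The integer quantization mentioned in the last step can be sketched as symmetric per-tensor int8 quantization (plain Python, illustrative only; the actual SNPE toolchain performs this conversion internally):

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization:
    scale = max|w| / 127, q = round(w / scale), clamped to [-128, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [qi * scale for qi in q]

w = [0.5, -1.27, 0.0, 1.0]
q, s = quantize_int8(w)
```

Storing 8-bit integers instead of 32-bit floats shrinks the model roughly 4x and lets the DSP use fast integer arithmetic, at the cost of a small rounding error bounded by half the scale per weight.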
VFX use cases (POC):
1. Worked with an in-house VFX expert to understand the VFX workflow in the movie industry and analyze where deep learning could help.
2. Proposed two POCs after thoroughly understanding the industry:
a. Rotoscopy using AI
b. 2D-to-3D conversion using AI
3. Accomplished both using the then state of the art, building demos with the potential to collect more data for better models.
4. Rotoscopy was tackled with a combination of semantic segmentation and image-matting techniques (evaluation metric: SAD = 45.8).
5. 2D-to-3D conversion was done with a graph-convolution technique to create a 3D graph from the 2D image; this technique was similar in spirit to current diffusion models.
6. I led a team of four to carry out this work.
Restoration (Service) :
1. While working in content restoration, the task was to restore old movies and TV series to current standards so that OTT providers could offer them in their apps.
2. I helped the team research and identify the best restoration algorithms available, including:
a. Image Inpainting (FID = 22.3)
b. Image/Video denoising (PSNR = 38)
c. Image/Video Super Resolution (PSNR = 38.3)
d. Image/Video Colorisation
Low Light Enhancement :
1. The goal was to create a low-light enhancement model using deep learning for use in cars.
2. I conducted research and finalized the architecture, training method, and loss function.
3. I then evaluated the models on available data and modified them to run in real time.
4. We ultimately chose an unsupervised approach called Zero-DCE (PSNR = 18.7).
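PSNR, the metric quoted for Zero-DCE above, is straightforward to compute. A minimal sketch over flat lists of pixel values, assuming 8-bit images with peak value 255:

```python
import math

def psnr(reference, restored, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images,
    given here as flat lists of pixel values."""
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * math.log10(max_val ** 2 / mse)
```

Higher is better: around 18-19 dB (as for Zero-DCE here) indicates a visible but tolerable deviation from the reference, while 38 dB (as in the restoration work above) is close to visually lossless.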
Assistant System Engineer
Tata Consultancy Services
02.2017 - 11.2018
Education
Bachelor's degree - Computer Science
IGIT
01.2012 - 04.2016
Accomplishments
Neural Networks and Deep Learning - DeepLearning.AI (credential ID: 8WZUN77YRU6U)