Development of rocAL, an open-source image, video, and audio processing library
- Developed and optimized high-performance data augmentation operations for the library, writing custom kernels with SIMD intrinsics and HIP/CUDA
- Designed and implemented distributed processing across multiple GPUs and threads, improving throughput on large-scale multimedia datasets
- Improved the training efficiency and accuracy of machine learning models (SSD, image classification, RNN-T, OpenCLIP) by optimizing data pre-processing and integrating the rocAL data loader
- Implemented robust metadata handling for labels, bounding boxes, and masks by parsing JSON and XML annotation files, supporting data analysis and model training
- Integrated diverse dataset formats (COCO, ImageNet, TFRecord, Caffe and Caffe2 records, WebDataset, and audio datasets), developing custom readers to ensure seamless data flow and compatibility
- Engineered a Python API using pybind11, enabling efficient data transfer and integration between C++ and Python components
- Developed and maintained comprehensive Python unit tests, ensuring code quality and reliability throughout the development lifecycle
- Used debugging tools (gdb, Valgrind) to identify and resolve critical issues such as segmentation faults and memory corruption
- Implemented Docker containerization for streamlined development and deployment processes
- Created and deployed custom Python packages (.egg, .whl), improving library distribution and usability
Proof of concept for automatic radiology report generation using LLMs and VLMs
- Worked with the Vicuna and llava-1.5-7b-hf models
- Optimized model performance through TensorRT conversion and INT8/INT4 quantization, and validated results on chest X-ray images