Arvind Devaraj

Bangalore

Summary

Experienced Data Scientist with 10+ years in machine learning, NLP, and data-driven product development. Skilled in building scalable ML systems and in designing and optimizing end-to-end data science workflows, from data ingestion and preprocessing to model deployment and monitoring. Currently leading Generative AI initiatives that apply large language models (LLMs) to real-world business problems. Strong academic foundation with a Master’s in Computer Science from IISc Bangalore (AIR 7, GATE CS). Past experience includes GPU programming at NVIDIA and deploying ML models in production at Embibe.

Overview

18 years of professional experience

Work History

Machine Learning & Generative AI Consultant

RPALabs.com
12.2020 - Current
  • Spearheaded Generative AI initiatives for enterprise document understanding using large language models (LLMs) and vector search (Qdrant).
  • Developed NLP pipelines for entity recognition and keyword extraction using Transformer models.
  • Built customer-facing chatbots using Rasa and explored LangChain-based retrieval-augmented generation (RAG) pipelines.
  • Experimented with embedding models and prompt engineering to optimize document-based QA systems.

Senior Data Scientist – NLP & ML

Embibe.com
06.2018 - 08.2020
  • Designed and deployed Transformer-based NER models for educational content extraction.
  • Built semantic similarity and content deduplication tools using Sentence-BERT.
  • Led the full ML lifecycle: data cleaning, feature engineering, training, validation, and production deployment.
  • Mentored a 5-member data science team and contributed to scalable architecture for NLP pipelines.

Data Science Consultant – Predictive Analytics

05.2016 - 06.2018
  • Delivered forecasting models and dashboards for retail clients, including Firstcry.com.
  • Built time-series models and recommendation systems that replaced costly third-party solutions.
  • Designed data products using Python, Pandas, and open-source ML frameworks.

Lead Engineer – Analytics & Infrastructure

Limitless
06.2013 - 05.2016
  • Developed scalable Python-based analytics systems supporting over 100K users.
  • Led backend and data pipeline development; implemented early data lake and batch-processing systems.
  • Contributed to product strategy by aligning data science outputs with user engagement goals.

Senior Software Engineer – GPU & HPC

NVIDIA
06.2011 - 05.2013
  • Optimized GPU kernels for real-time image processing and computer vision applications.
  • Gained hands-on experience in CUDA, parallelization, and applied linear algebra.
  • Contributed to the SDK used in NVIDIA’s imaging solutions.

Software Engineer – High Performance Computing

Synfora (Synopsys), Symantec, TRDDC, Allgo Systems
06.2007 - 05.2011
  • Worked on research and production-grade tools for performance optimization and embedded systems.
  • Developed tools for parallel code generation, profiling, and compiler-level optimizations.

Education

Master of Science - Computer Science

Indian Institute of Science (IISc)

B.Tech - Information Technology

University of Madras

Skills

  • Machine Learning: supervised learning, unsupervised learning, model creation, model evaluation, model deployment
  • Deep Learning: neural networks, Transformers, PyTorch
  • Probability and Statistics: distributions, Bayesian methods
  • Database Systems: relational databases, SQL, NoSQL databases, query optimization
  • Data Warehousing: OLAP, ETL pipelines, dimensional modeling
  • Systems Programming: low-level programming, memory management, process management, concurrency

Accomplishments

  • All India Rank 7 in GATE Computer Science
  • Tata Research Grant recipient for compiler and program optimization research
  • Invited speaker at DroidCon, HasGeek, and leading institutions (IISc, BITS, VIT) on NLP and AI topics
