Sanjay Kumar

Bhubaneswar

Summary

Seasoned Applied Data Scientist with 16 years of experience, including over 6 years of leading high-impact AI initiatives. Skilled in aligning data product strategies with business objectives, managing cross-functional teams, and driving enterprise-wide transformation through Generative AI. Demonstrates strong expertise in project execution, stakeholder engagement, and talent development, with a proven record of scaling data science capabilities across organizations.

Overview

18 years of professional experience
1 Certification

Work History

AI Analyst Consultant

Qualfon
New Delhi
12.2024 - 02.2025
  • Led the design and development of an AI-driven Training Recommendation System, leveraging the Llama 3.1 model for personalized course suggestions.
  • Orchestrated data pipelines with Azure Data Fabric to ingest data from SSMS and external application sources into the Medallion Bronze layer in Fabric Lakehouse.
  • Handled EQ and application flow design, collaborating with a cross-functional team to ensure smooth design, development, and deployment.

Tech Stack: Python, SSMS, Llama 3.1, Azure Fabric, Azure AI Foundry, Azure AI Search, Fabric Data Factory, OneLake, and Model Serving as a REST endpoint in a WebApp.

Data Scientist

CSM Technology
Bhubaneswar
06.2021 - 05.2024

MedhaK:

Spearheaded the development of MedhaK, a generative AI-powered virtual assistant fine-tuned on proprietary enterprise data to deliver contextual insights and improve employee decision-making efficiency across departments.

Tech Stack: Python, PyMuPDF, ChromaDB, RAG, Llama 2, Prompt Techniques, React, Ubuntu 22.04 LTS, Celery, REST API.

iLMS: Integrated Legal Monitoring System

  • Developed a generative AI solution for iLMS to analyze and summarize court case documents automatically.
  • The system extracts petitioner/opponent details, synopsis, and subject matter; classifies petitioner type (Govt./Non-Govt.); and performs semantic tagging of similar cases.

Tech Stack: Python, LangChain, RAG, UniNER, FastAPI, Ollama, and Vector DBs for retrieval.

e-Despatch 3.0:

  • Spearheaded the upgrade of the e-Despatch application by integrating RPA to streamline the monitoring of official correspondence between the state headquarters and field offices.
  • The solution reduced delays, minimized communication gaps, and enhanced transparency in government communication workflows, all within existing Records Manual regulations.

Tech Stack: .NET Core, Python, Pytesseract, Poppler, PySimpleGUI, NLP, OpenCV, NER entity extraction, Selenium for RPA, MySQL.

Resume Parser & Ranker:

  • Developed an NLP-based system to automatically retrieve and parse resumes from a backend application, extracting named entities using a custom NER model deployed on AWS. Enabled streamlined candidate data extraction, supporting scalable and intelligent resume processing for HR analytics.

Tech Stack: EasyOCR, NLP, spaCy NER, fuzzy string matching, Label Studio, AWS Elastic Beanstalk, and REST API.

Classification of Mineral Ore (Lumps and Fines):

  • Built an object detection model focused on classifying ore types in mining logistics, identifying lumps, fines, and empty trucks from images. This AI solution helped optimize material tracking and improved operational efficiency.

Tech Stack: Python, Label Studio, OpenCV, YOLOv8, GitHub Actions, ECR, EC2, Docker Container.

Mineral ASP Forecasting:

  • Built a predictive analytics model for forecasting quarterly average sales prices of iron ore based on mineral grade. This enabled data-driven insights for pricing strategies and improved market forecasting accuracy.

Tech Stack: Python, GitHub, EDA, Statistical Analysis, Feature Engineering, Outlier Detection, Parametric/Non-parametric ML Algorithms, MLflow, Container Registry, WebApp.

Data Scientist

Circle HD
Chennai
07.2018 - 04.2021

Client: OKTA, Recommendation System

  • Built an AI-powered training recommendation engine using collaborative filtering and NLP to personalize learning paths for employees based on historical participation and interest areas.

Tech Stack: Python, NLP, TfidfVectorizer, NLTK, SVD.

Senior Software Engineer

CGI
Chennai
03.2014 - 06.2018

Client: Northern Power Grid (NPG)

  • Handled Change Request (CR) and Incident Management (IM) tickets, resolving high-priority (P1) application failures within the NPG environment. Investigated recurring issues through problem ticket analysis to identify root causes. Served as the application owner for key BC1 systems, including MPRS, CNDB, Timeseries, and OMS.

Tech Stack: Oracle 10g, SQL, PL/SQL, UNIX, CosBatch Job Scheduler, EQ design, Application Management, Service Delivery.

Senior Software Engineer

Logica
Chennai
12.2008 - 03.2014

Client: National Bank of Paris (BNPP)

  • Defined and modified the backend Oracle database according to the credit risk management rules established by the bank to uphold financial stability.

Tech Stack: Oracle 10g, UNIX, HP Quality Center, SQL, PL/SQL, HLD/LLD design, impact assessment, and price estimation.

Software Engineer

3i Infotech
Chennai
02.2007 - 12.2008

Client: Hallmark National Insurance Company, ALAMANCE

  • Worked extensively in the U.S. General Insurance domain, focusing on core business areas such as underwriting, quotation, billing, and claims processing. Developed PL/SQL functions, cursors, and packages to implement List of Values (LOVs) and external validations based on client requirements.

Tech Stack: Oracle 9i, Forms 9i, SQL, PL/SQL, UNIX.

Education

'O' Level

DOEACC

'A' Level

DOEACC

Bachelor of Computer Applications

IGNOU

Master of Computer Applications

IGNOU

Skills

Programming Languages and Tools: Python (Scikit-learn, Pandas, NumPy, Matplotlib, SciPy, TensorFlow), PL/SQL

Statistical Tests: Statistical Analysis, EDA, Central Limit Theorem, Anomaly Detection, Parameter Estimation, PCA, SVD, Z-stats, T-stats, Confusion Matrix, Tolerance, VIF, Z-score, Rank Correlation, AUC-ROC Curve, Hypothesis Testing, A/B Testing, Box-Cox Transformation

Machine Learning Algorithms: Linear and Logistic Regression, SVM, Decision Trees, Random Forest, KNN, Naïve Bayes, K-Means, DBSCAN, Gradient Boosting, XGBoost

Deep Learning: ANN, CNN, RNN, TensorFlow

Generative AI: LangChain, RAG, Prompt Design, Fine-Tuning (LoRA, QLoRA, RAFT), BLEU

Image processing: PIL, Pillow, YOLO

Database: Oracle, MySQL, MongoDB, ChromaDB, Weaviate, Pinecone, Neo4j

Text Analytics: NLP, NLTK, SpaCy

Tools: GitHub, GitLab, Docker Hub, DVC, and MLflow

Cloud Platform: Azure, AWS, EBS, ECR, Container Registry, Azure Data Fabric

Certification

  • Certification in Machine Learning (Coursera)
  • Oracle Certified Associate
  • PMP - PMI Aspirant

Work Availability

Days: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday
Times: Morning, Afternoon, Evening
