
SIDDHARTH JAIN

Data Scientist
Bengaluru

Summary

Results-driven Data Scientist with over 4 years of experience designing and implementing scalable machine learning, NLP, and analytics solutions using Python, PySpark, SQL, and cloud technologies. Experienced in developing enterprise-scale data pipelines, customer analytics platforms, and machine learning models, and in supporting production deployment workflows. Brings a solid foundation in distributed data processing, feature engineering, statistical analysis, model validation, and end-to-end ML lifecycle management, and collaborates with Data Science, Engineering, and Business teams to deliver AI solutions. Currently building skills in Azure, Databricks, MLOps, CI/CD automation, and GenAI applications.

Overview

3 Certifications
4 years of professional experience

Work History

Data Scientist

Allstate India Private Limited
04.2022 - Current

KEY PROJECTS

Customer Effort Score (CES)

  • Designed and developed enterprise-scale Customer Effort Score (CES) solutions to measure customer effort across IVR, Voice, Web, and Mobile channels.
  • Built scalable PySpark pipelines for processing large volumes of customer interaction data.
  • Developed standardized effort-scoring methodologies using interaction duration, transfers, and journey-level metrics.
  • Created reusable data products supporting customer experience analytics and business reporting.
  • Partnered with stakeholders to translate business requirements into scalable analytical solutions.
  • Developed interaction-level and journey-level CES datasets used for downstream analytics and reporting.

  • Conducted analysis to evaluate relationships between customer effort metrics and payment success outcomes.

Tech Stack: Python, PySpark, SQL

  • Developed an end-to-end framework for measuring customer effort across multiple service channels.
  • Created interaction-level and journey-level effort datasets supporting enterprise analytics.
  • Implemented duration standardization methodologies and effort signal engineering techniques.
  • Analyzed relationships between customer effort and payment success outcomes.

Impact

  • Enabled business teams to identify high-effort customer journeys and prioritize customer experience improvements.
  • Delivered standardized CES datasets for downstream reporting and analytics consumption.

Named Entity Recognition (NER) Model in Customer Identification

  • Built machine learning models to extract authentication-related entities from unstructured customer conversations.
  • Collaborated with senior data scientists and engineering teams on model packaging and deployment activities.
  • Participated in model validation, testing, deployment readiness reviews, and performance evaluation.
  • Supported deployment workflows in cloud-based environments.
  • Improved structured information extraction capabilities for downstream analytics and reporting systems.

Tech Stack: Python, PySpark, NLP

  • Designed scalable text-processing pipelines for customer interaction analysis.
  • Implemented transcript segmentation and context-aware information extraction workflows.
  • Built reusable authentication detection logic across multiple interaction channels.

Impact

  • Improved automation of authentication analysis workflows.
  • Reduced dependency on manual transcript review processes.

ASC JOURNy – Customer Defection Prediction

Tech Stack: Python, Machine Learning, Random Forest, Deep Learning

Business Objective: Predict customers likely to defect (churn) at the end of their six-month policy term and identify early defectors to support proactive retention campaigns.

  • Developed a customer defection prediction framework supporting customer retention initiatives.
  • Built a baseline Random Forest model using the top 30 predictive features identified through feature importance analysis.
  • Generated baseline customer risk probabilities ("Knowledge Score") used as the foundational churn risk indicator.
  • Developed the JOURNy deep learning model to capture customer lifecycle behavior through event-driven modeling.
  • Modeled sequential customer events such as vehicle additions, driver additions, and policy-related activities occurring during the customer lifecycle.
  • Combined event history with baseline risk scores to dynamically update customer defection probabilities after each significant event.
  • Implemented threshold-based risk identification logic to target high-risk customers before policy expiration.
  • Generated weekly lead files used by business teams to execute customer retention campaigns.
  • Enabled identification of both end-of-term defectors and early defectors (3–4 months prior to policy expiration).

ML Analytics & Data Science Initiatives

  • Conducted statistical analysis and hypothesis testing to evaluate relationships between customer behavior, effort metrics, and business outcomes.
  • Developed automated PySpark workflows that reduced manual analysis effort and improved reproducibility.
  • Worked closely with business stakeholders, data scientists, and engineering teams throughout project lifecycles.

  • Presented analytical findings and technical recommendations to technical and non-technical audiences.

Education

Bachelor of Engineering - Electrical and Electronics Engineering

RV College of Engineering
Bengaluru, India
06-2020

Skills

Python

PySpark

SQL

Feature Engineering

Data Analysis

Deep Learning

XGBoost Model in ML

Named Entity Recognition (NER) in NLP

Statistical Analysis

Generative AI Fundamentals

MLOps & Cloud

GitHub

Certification

Data Science & Machine Learning Program (DSML)

AWARDS & ACHIEVEMENTS

  • Star Performer Award – Q3 2024 for delivering high-impact Data Science initiatives.
  • Rising Star Award – Q1 2023 for early contributions to enterprise analytics projects.
  • Recognized for delivering impactful Data Science and Analytics solutions supporting business decision-making.
  • Secured 3rd Rank in Nagaland State Board Examination (Class 10) and felicitated by the Governor.

RELEVANT MLOPS EXPOSURE

  • Experience supporting machine learning deployment workflows in cloud environments.
  • Collaborated across Data Science, Engineering, and Business teams throughout the ML lifecycle.
  • Participated in model validation, deployment readiness reviews, and production support activities.
  • Built scalable feature engineering and data processing pipelines using PySpark.
  • Understanding of model monitoring, retraining workflows, model governance, and production AI system requirements.
  • Familiarity with CI/CD concepts, containerization principles, and deployment best practices.
  • Working knowledge of data drift, model drift, feature engineering pipelines, and MLOps best practices.