Prabhat Shukla

Bhopal

Summary

Data science expert with over 9 years of dedicated data science experience, within more than 12 years of overall professional experience. Specializing in machine learning, NLP, and predictive modeling, I excel at converting complex datasets into actionable insights that improve decision-making and business strategy. Proficient in Python, R, SQL, and leading data visualization tools, I have a proven track record of improving operational efficiency and driving revenue growth. Skilled in stakeholder management and in leading cross-functional teams, with experience collaborating directly with third-party partners to integrate cutting-edge AI solutions.

Overview

13 years of professional experience
1 Certification

Work History

Sr. Data Scientist

Yash Technologies Pvt Ltd
Pune
02.2020 - Current

Client: John Deere

  • Automated Warranty Responsibility Code Assignment: Designed and deployed a hybrid ML-driven solution leveraging Large Language Models (LLMs) to automatically assign responsibility codes to warranty claims (Supplier/Deere liability). The LLMs analyzed the complaint, cause, and correction text along with the entire warranty worksheet data to accurately predict the responsible party.
    Impact: Recovered over $55 million from previously rejected or pending claims. Accurate, automated responsibility assignment significantly increased the claim acceptance rate and reduced rejections.
  • Machine Translation Quality Evaluation Using LLMs: Developed an automated system leveraging LLMs (ChatGPT) and Galileo evaluation to assess machine-translated text quality. The system's prompts were enriched with metadata from a vector database, including approved terminology and Deere-specific style guides, ensuring adherence to brand and domain standards. Evaluated translations across fluency, accuracy, terminology, style, and local conventions, categorizing quality and severity (major, minor, critical, neutral).
    Impact: Phase 1 achieved a 20% reduction in manual effort from linguistics experts; the target is a reduction of at least 50%, yielding significant cost savings in translation quality assessment.
  • Enhancing Extended Warranty Data in Palantir Foundry: Improved the quality and accessibility of extended warranty data, and the actionable insights drawn from it, using Palantir Foundry's capabilities. Designed and implemented robust data ingestion (ETL) pipelines using Palantir Foundry and PySpark for complex data enrichment.
    Impact: Enhanced data quality and accessibility for key stakeholders, leading to more informed decision-making and a 25% reduction in time-to-insight for warranty claims analysis.
  • Attachment Parts Forecasting (Time Series): Implemented an ML-driven time series forecasting mechanism for more than 2,477 attachment parts. The models (Prophet, ARIMA, Holt-Winters) were developed considering factors such as historical sales and base-coded and non-base-coded attachments, with performance evaluated using RMSE and MAPE. This initiative aimed to improve overall forecast accuracy and develop a new strategy to feed these forecasts into the Advanced Planning and Optimization (APO) system, adapting to changing business scenarios.
    Impact: Improved attachment forecast accuracy to 90%, leading to a 15% reduction in inventory discrepancies, and established robust new strategies for seamlessly integrating forecasts into the APO system.
  • Automated Reporting and Data Pipelines (Power BI & Power Apps): Built automated data collection and cleaning pipelines using Databricks to streamline various reporting workflows across operations.
    Impact: Reduced manual reporting efforts by 40% and improved data freshness, enabling quicker access to critical business insights through timely and accurate dashboards.
    Key Initiatives under this program include:
    Automated Order Response and Execution Report: Developed an automated weekly Power BI report to monitor the "hit and miss" of invoice orders, providing insights through multiple filter types (slicers, dropdowns, multiple tabs).
    Impact: Saved ~450 human hours annually and ensured on-time delivery of critical data for proactive decision-making and action.
    Plant Site Logistics Shipping Report: Automated a Power BI report for plant site logistics, identifying machines to be picked and shipped. Features include on-demand CSV export and flagging orders without serial numbers or shipping modes.
    Impact: Automated reporting saved ~350 human hours annually by streamlining logistics monitoring.
    Automated Factory Delivery Date Report of Finished Goods: Created an automated Power BI report with scheduled refresh capabilities, serving as a Key Performance Indicator (KPI) for factory performance. This report helps monitor factory delivery date volatility.
    Impact: Saved ~300 human hours annually and provided crucial insights for monitoring and managing factory delivery timelines.

Client: PALL Corporation

  • Flowstar (Filter Integrity Test Instrument): Visual and Predictive Analytics: Created data pipelines to collect, clean, and feature-engineer XML log files from Flowstar machines. Developed visual reporting and predictive analytics solutions, including an ML model (Scikit-learn), to predict integrity test outcomes at an early stage.
    Impact: Improved operational insights, enabled proactive issue identification for medicinal drug manufacturing, and reduced potential production downtime by 10% by predicting integrity test failures.

Sr. Data Scientist

Accenture Solutions Pvt Ltd
Gurgaon
07.2018 - 02.2020

Client: Google, Inc.

  • Overall Contribution (Google Retail): Applied machine learning and statistical modeling to solve complex business problems in Google Retail, translating insights into actionable recommendations. Built and deployed prototype solutions using Google Cloud applications and Python scripting.
  • Market Basket Analysis: Generated association rules (Apriori algorithm) to link products, identifying high-confidence relationships in customer purchasing patterns.
    Impact: Provided actionable insights for cross-selling strategies, product bundling, and optimizing product placements to enhance sales.
  • Email Send Time Optimization: Developed and deployed regression and classification models to predict optimal email send times.
    Impact: Maximized email open rates and click-through rates, improving campaign effectiveness and customer engagement.
  • Segmentation of Site Visitors Using Spark: Developed K-means clustering models in PySpark to segment website visitor data, identifying distinct and natural user types.
    Impact: Enabled the creation of highly targeted marketing campaigns and personalized user experiences based on identified visitor behaviors and preferences.

Data Scientist

Northout Solutions
Indore
12.2017 - 07.2018

Client: John Hancock Financial

Spending Habits Analysis

  • Description: Analyzed user spending habits and transactional behavior from bank and credit card data, integrating demographic and lifestyle information. Created visualizations and predicted user transactions for upcoming months using ARIMA and regression models.
  • Impact: Provided clients with deeper insights into their financial behavior, enabling more effective budgeting, personalized financial advice, and improved long-term financial planning.

Machine Learning Engineer

Bonsmat Group
Ludhiana
09.2017 - 11.2017

ChatBot Assistant (bgpay.in)

  • Description: Developed a conversational dialogue system for mobile recharge offers and services, accessible via REST API. Incorporated advanced features such as weather forecasting, news summarization, and a comprehensive question-answering system. The solution involved building and deploying ML models for robust text classification and Named Entity Recognition (NER).
  • Impact: Enhanced user engagement and provided instant access to information and services, streamlining the user experience for mobile recharge offers and beyond.

Data Engineer

Constalytics
Mohali
04.2017 - 08.2017

Knowledge Graph Platform

  • Description: Designed and developed an unstructured text data processing platform capable of named entity extraction, topic modeling, and sentiment analysis. The platform integrated Neo4j for robust knowledge graph creation, enabling the extraction and visualization of complex relationships between entities. Custom ML models were developed for advanced text classification and Named Entity Recognition (NER).
  • Impact: Transformed raw, unstructured text into actionable, interconnected insights, significantly improving data discovery and relationship analysis and enabling more informed decision-making from large volumes of textual information.

Machine Learning Researcher

Data Science Research Institute
Bengaluru
08.2016 - 03.2017
  • Cluster Analysis Using Spark: Performed cluster analysis on weather data for California (2011-2014) using the K-means algorithm in Spark to identify significant patterns and groupings.
  • Cricket Prediction and Analysis: Scraped extensive player data and developed a predictive model using the cricketr package in R to analyze and forecast player performance.

Software Developer

Predictive Research
Bengaluru
03.2015 - 08.2016

Contributed to diverse software development projects, including coding, testing, and feature implementation to achieve overall project objectives.

Web Developer

Freelancer
Bhopal
01.2012 - 02.2015

Designed, built, and maintained websites as a freelance web developer.

Education

PGD - Big Data Analytics

Siddaganga Institute of Technology
Tumkur
01.2017

B.E. - Computer Science

RKDF College of Engineering
Bhopal
01.2011

Skills

Programming & Fundamentals

  • Python and R Programming
  • Python Libraries: Pandas, NumPy
  • SQL (MySQL, DB2)
  • Version Control: Git, GitHub

Data Engineering & Big Data

  • Big Data Processing (PySpark)
  • Distributed Computing (Dask)
  • Real-time Data Streaming
  • Databricks/Delta Lake

Core AI/ML & Analytics

  • Statistical Analysis
  • Data Mining Techniques
  • Machine Learning Techniques
  • Deep Learning & Neural Networks
  • Natural Language Processing
  • Time Series Analysis

Generative AI & AI Agents

  • Generative AI Applications
  • Large Language Model (LLM) APIs & Models (GPT series, Gemini)
  • Vector Databases (Qdrant)
  • Hugging Face Transformers
  • Prompt Engineering
  • Model Fine-tuning
  • Retrieval Augmented Generation (RAG)

Enterprise AI Platforms & MLOps

  • C3 AI (C3 AI Platform, C3 AI Applications)
  • Palantir (Foundry, AIP)

Data Visualization & BI

  • BI Tools: Power BI, Tableau
  • Interactive Dashboarding: Streamlit
  • Python Visualization Libraries: Matplotlib, Seaborn, Plotly/Dash

Cloud Platforms & Deployment

  • Cloud Providers: AWS, GCP
  • Containerization & Orchestration: Docker, Kubernetes
  • Infrastructure as Code (IaC): Terraform
  • CI/CD for MLOps (e.g., GitHub Actions)
  • Cloud Services (AWS): EC2, S3, Lambda

Certification

  • C3 AI V8 Data Science
  • Deep Learning Specialization (deeplearning.ai, Andrew Ng)
  • Machine Learning with Big Data (University of California, San Diego - Coursera)
  • Graph Analytics for Big Data (University of California, San Diego - Coursera)

Languages

Hindi
First Language
English
Proficient (C2)

Accomplishments

  • Star Achiever Award
  • Prime Player Award
