Kailas Chandak

Lead Data Scientist
Pune

Summary

Dynamic and results-driven Data Scientist with over 11 years of IT experience, specializing in Generative AI, Machine Learning, Natural Language Processing (NLP), and Data Quality Engineering. Strong hands-on expertise in Python, machine learning model building, Generative AI development, data governance, and seamless production deployments. Proven track record of leading end-to-end data science solutions that drive innovation and efficiency across the banking, telecommunications, and manufacturing sectors. Skilled in transforming complex data into actionable insights to support strategic decision-making and enhance business performance.

Overview

11 years of professional experience
6 certifications

Work History

Lead Data Scientist

Citicorp Services India Private Limited
12.2022 - Current

Project: NLP-Based Multi-Agent Chatbot for Project Intelligence.

Project Description: The NLP-Based Project Intelligence Chatbot is an advanced, agent-driven conversational system designed to answer complex natural language queries related to enterprise program and project management data. The chatbot integrates with PTS, JIRA, and CSI systems, enabling unified access to large volumes of project, task, sprint, epic, and application-level information. Using a combination of NLP, generative AI, and multi-agent orchestration, the chatbot interprets user questions, routes them to specialized retrieval agents, and fetches accurate and real-time insights. It can provide high-level summaries, detailed status reports, sprint progress, epic breakdowns, dependency analysis, resource allocation insights, and issue tracking information. The system leverages intelligent summarization models and context-aware response generation to transform raw operational data into meaningful insights. With capabilities such as semantic search, automated aggregation, cross-system mapping, and contextual summarization, the chatbot significantly reduces manual reporting efforts, and helps teams quickly understand project health, risks, and dependencies across multiple tools.

Responsibilities:

  • Architected and developed an LLM-powered multi-agent chatbot capable of answering complex natural-language queries by orchestrating specialized retrieval and reasoning agents across PTS, JIRA, and CSI enterprise systems.
  • Designed an LLM-based intent classification and entity extraction pipeline using transformer models to accurately interpret user queries related to projects, epics, sprints, and application performance.
  • Implemented a Retrieval-Augmented Generation (RAG) pipeline that indexed large volumes of unstructured and structured metadata using vector embeddings (OpenAI, Sentence Transformers) to support deep semantic search across project documentation and operational data.
  • Developed specialized agents (JIRA Agent, PTS Agent, CSI Agent, Summary Agent, Reasoning Agent) using a coordination framework that autonomously determines data sources, retrieves information, performs validation, and composes final responses.
  • Built scalable data ingestion and ETL pipelines for continuous synchronization of enterprise datasets into the vector store and metadata lake, supporting millions of records across tasks, incidents, epics, stories, and application logs.
  • Used transformer-based summarization models (e.g., BART, Pegasus, and LLMs with custom prompting) to generate concise project summaries, cross-system insights, sprint health reports, backlog overviews, and risk assessments.
  • Integrated agentic reasoning workflows to break complex questions into sub-tasks, such as dependency analysis, issue clustering, and incident trend detection, and to aggregate multi-source insights into accurate, human-readable outputs.
  • Developed semantic alignment and data fusion logic to merge entities across PTS, JIRA, and CSI (e.g., mapping stories to incidents, linking epics to feature IDs), enabling unified project intelligence.
  • Implemented a real-time response engine with caching, hybrid search (keyword + embeddings), and fallback logic to ensure low-latency answers with high retrieval precision.
  • Built comprehensive observability via agent performance analytics, prompt-level monitoring, retrieval accuracy metrics, and LLM hallucination detection safeguards.
  • Ensured enterprise-grade security, authentication, and RBAC for controlled access to project metadata and sensitive operational information.
  • Authored detailed technical documentation covering LLM prompt chaining, agent responsibilities, vector schema design, API integration layers, and system architecture.

Lead Data Scientist

Citicorp Services India Private Limited
12.2022 - Current

Project: Data Lake Attribute Recommender.

Project Description: The Data Lake Attribute Recommender is an intelligent system designed to automatically suggest relevant attributes for analytics, reporting, and machine learning use cases by analyzing the structure, metadata, and semantic patterns of datasets stored in the data lake. Using advanced techniques such as metadata profiling, semantic similarity matching, statistical correlation, and embeddings-based similarity (for text-heavy attributes), the tool identifies relationships between datasets and recommends the most meaningful attributes to users. This reduces manual effort in attribute discovery, improves data usability, and accelerates the development of analytical pipelines.

Responsibilities:

  • Architected an enterprise-grade, LLM-powered Attribute Recommender Engine for data lakes using Transformer-based models (BERT, Sentence-BERT, MiniLM, LLaMA-based embeddings) to detect semantic similarity across heterogeneous metadata attributes.
  • Implemented a Retrieval-Augmented Generation (RAG) pipeline to enrich attribute recommendations by combining metadata embeddings, schema context, business glossary definitions, lineage, and historical usage patterns.
  • Designed scalable metadata extraction and processing pipelines using Python, PySpark, Airflow, and Delta Lake to aggregate schema metadata, data profiles, column statistics, and lineage information from structured and semi-structured sources.
  • Engineered a high-performance vector store using FAISS, Milvus, and Pinecone to enable low-latency Approximate Nearest Neighbor (ANN) search for attribute embeddings across millions of data lake attributes.
  • Built a multi-layer recommendation framework integrating semantic similarity (LLM embeddings), statistical correlation analysis, metadata quality indicators (null %, cardinality, monotonicity), business-term alignment from the glossary/ontology, and usage frequency and model feature-importance signals.
  • Developed microservices for real-time attribute recommendation using FastAPI/Flask, enabling seamless integration with internal data catalog and analytics platforms.
  • Integrated domain-specific LLM prompting and fine-tuning to align attribute suggestions with enterprise data governance terminology, naming conventions, and semantic rules.
  • Implemented automated metadata agents capable of interpreting schema changes, detecting new attributes, and auto-generating updated embeddings without manual intervention.
  • Employed MLOps best practices, including CI/CD for models, embedding refresh jobs, model versioning with MLflow, and automated deployment pipelines.
  • Conducted iterative validation with SMEs and implemented reinforcement strategies to incorporate user feedback loops into the ranking and recommendation engine.
  • Optimized LLM inference by leveraging quantization, GPU acceleration, and batching, reducing compute cost while maintaining high semantic accuracy.
  • Delivered detailed technical documentation covering system architecture, vector indexing strategies, model training pipelines, governance alignment, and performance benchmarks.

Lead Data Scientist

Citicorp Services India Private Limited
12.2022 - Current

Project: Metadata Analyzer.

Project Description: The Metadata Analyzer Tool is an intelligent, automated system designed to evaluate, validate, and extract insights from metadata across large, complex datasets. It streamlines data governance by profiling datasets, detecting structural inconsistencies, identifying data quality gaps, and ensuring compliance with enterprise data standards. The tool scans multiple data sources—databases, files, APIs, and data lakes—and generates detailed metadata summaries, such as data types, patterns, null distributions, relationships, schema anomalies, and lineage paths. It enhances visibility into data assets, allowing data engineers, analysts, and governance teams to make faster, more informed decisions about data readiness and quality. Featuring rule-based validation, automated profiling, and configurable quality checks, the Metadata Analyzer Tool reduces manual effort, improves accuracy, and strengthens the overall Data Quality Framework. Its insights help organizations maintain clean, trustworthy, and analytics-ready data.

Responsibilities:

  • Led the end-to-end design and development of the Metadata Analyzer Tool to automate metadata extraction, profiling, and validation across enterprise data sources.
  • Analyzed database schemas, data dictionaries, lineage information, and statistical metadata to identify structural inconsistencies and data quality gaps.
  • Built automated pipelines for metadata ingestion from SQL/NoSQL databases, APIs, and file-based sources, enabling seamless integration with the enterprise data ecosystem.
  • Designed and implemented rule-based and ML-based metadata validation frameworks to detect anomalies in data types, formats, cardinality, null patterns, and schema deviations.
  • Performed detailed metadata profiling to generate insights on data completeness, uniqueness, conformity, and integrity.
  • Created dashboards and reporting modules to visualize metadata quality scores, trends, and compliance with data governance policies.
  • Collaborated with business SMEs, data owners, and governance teams to define metadata standards, quality thresholds, and validation rules.
  • Integrated lineage tracking capabilities to map upstream/downstream data flow and provide visibility into transformation logic and dependencies.
  • Optimized metadata processing workflows for high performance and scalability across large datasets.
  • Conducted root cause analysis for recurring metadata issues and recommended corrective actions to improve data quality and system reliability.
  • Ensured alignment with enterprise data governance frameworks, regulatory requirements, and audit standards.
  • Documented metadata extraction logic, profiling rules, validation flows, and design specifications for cross-functional teams.

Lead Data Scientist

Citicorp Services India Private Limited
12.2022 - Current

Project: LC Advising Tool - HSN Code Classifier.

Project Description: The HSN Code Classifier is an NLP-driven machine learning system designed to automatically identify the correct Harmonized System of Nomenclature (HSN) code for goods described in TAG 45 of the SWIFT MT 700 Letter of Credit message. TAG 45 often contains unstructured item descriptions that vary significantly in format, terminology, and detail. To solve this, the tool uses a Long Short-Term Memory (LSTM)–based deep learning architecture, trained on domain-specific trade text and historical classification data. The LSTM model processes the sequential patterns and linguistic context within the TAG 45 description, capturing key product attributes, product type indicators, and distinguishing terms to classify the item into one of the 28 active HSN code classes used within the system. By analyzing free-text descriptions, normalizing inconsistent terminology, and learning semantic patterns, the classifier delivers highly accurate HSN predictions, even for complex or ambiguous goods descriptions. The solution reduces manual classification effort in trade finance operations, improves compliance and audit readiness, standardizes item coding across transactions, and accelerates LC processing workflows. The classifier integrates seamlessly with existing trade finance systems, enabling automated HSN validation, duty estimation, risk scoring, and downstream reporting, with minimal human intervention.

Responsibilities:

  • Designed and developed an LSTM-based text classification model to automatically map unstructured goods descriptions from MT700 TAG 45 messages to one of the 28 operational HSN code classes.
  • Built an end-to-end NLP pipeline, including text extraction, cleaning, tokenization, lemmatization, and sequence vectorization, tailored for trade finance terminology.
  • Engineered domain-specific feature representations using word embeddings (Word2Vec/fastText/GloVe), optimized for capturing product semantics and industry vocabulary.
  • Created a labeled dataset by consolidating historical LC data, HSN mappings, and trade documentation, and performed extensive data balancing to address class imbalance across 28 categories.
  • Implemented model training, hyperparameter tuning, and regularization techniques (dropout, early stopping) to improve generalization on noisy, inconsistent TAG 45 descriptions.
  • Designed and executed a comprehensive model evaluation framework using F1-score, confusion matrix, class-level recall, and validation against SME-labeled testing samples.
  • Built a scalable inference engine to integrate the classifier into trade finance workflows, enabling automated HSN assignment during LC processing.
  • Collaborated with compliance, trade operations, and domain SMEs to validate model output, refine label definitions, and ensure regulatory alignment.
  • Implemented explainability features such as token importance scoring and attention-like mechanisms to help users understand model reasoning for each predicted HSN code.
  • Containerized the model and deployed it as a microservice/API for seamless integration with existing systems and downstream duty estimation or risk assessment modules.
  • Developed monitoring and retraining pipelines to continuously track data drift, classification accuracy, and model performance in production.

Lead Data Scientist

Wipro Technologies
11.2020 - 12.2022

Project: Capacity Planning Tool.

Project Description: The Capacity Planning Tool is an analytics-driven workforce optimization system designed to accurately estimate employee capacity requirements for end-to-end banking operations. The tool analyzes historical transaction volumes, operational workloads, processing times, seasonal demand patterns, and SLA commitments to determine the optimal number of employees needed across departments, functions, and process queues. By leveraging statistical forecasting, workload modeling, and task-level productivity metrics, the system predicts future staffing needs and highlights potential capacity gaps. It provides granular insights at daily, weekly, and monthly levels, supporting proactive resource planning and efficient allocation of teams. The tool enables operations managers to simulate different business scenarios, evaluate the impact of volume spikes, process changes, or new product introductions, and make informed decisions on hiring, cross-training, and load balancing. It improves service delivery, reduces bottlenecks, minimizes overtime, and ensures compliance with regulatory turnaround times, ultimately enhancing operational efficiency across banking functions.

Responsibilities:

  • Designed and developed a workforce capacity planning model to estimate employee requirements across multiple banking operations functions, based on historical transaction volumes and process complexity.
  • Built data pipelines to ingest and consolidate operational datasets, including task volumes, handling times, productivity metrics, SLAs, shift schedules, and exception rates.
  • Created workload forecasting functions using statistical or time-series models to predict future transaction volumes and identify seasonal or cyclical patterns.
  • Defined capacity calculation logic incorporating productivity benchmarks, process-level effort multipliers, SLAs, and buffer factors to derive optimal staffing levels.
  • Collaborated with operations managers and SMEs to understand process flows, map activity types, and calibrate effort estimates for accurate resource modeling.
  • Developed scenario simulation modules that allow business teams to model 'what-if' situations, such as volume surges, new product launches, process re-engineering, or staff reallocation.
  • Built visual dashboards and reports to present capacity trends, workload distribution, utilization metrics, and staffing gaps at the function, process, and team levels.
  • Optimized the tool to support real-time decision-making on staffing, hiring, cross-skilling, load balancing, and shift adjustments.
  • Implemented automated alerts for projected capacity shortfalls, SLA risks, or unexpected deviations in workload patterns.
  • Documented data definitions, calculation logic, forecasting methodology, and operational workflows for stakeholder alignment and audit readiness.

Data Scientist

Infosys Technologies
07.2019 - 11.2020

Project: Telecom Customer Churn Analysis.

Project Description: Customer Churn Analysis determines whether a customer will stop using or continue using the services provided by BT. It analyzes the parameters that drive customer retention, helping the client plan around those parameters, design offers for new customers, and manage location-based inventory.

Responsibilities:

  • Actively participated in the data collection and data cleaning process; used data imputation strategies to fill missing values and the IQR technique to eliminate outliers.
  • As the data was imbalanced, resampled it using the SMOTE technique.
  • Performed exploratory data analysis to identify hidden patterns in data. Used different graphs to show the impact of features and their correlation.
  • Performed feature transformation, as a few columns did not follow a normal distribution.
  • Provided valuable business insights to the client through data analysis, which helped them improve their offers and retain customers.
  • Applied encoding techniques to convert categorical variables into numeric form that can later be fed to the ML model.
  • Removed low-contribution and highly correlated features using correlation analysis, which improved the accuracy of the model.
  • Used different machine learning models to identify the best-suited model for the use case.
  • Tested and validated the accuracy of each model with the help of a confusion matrix, precision, recall, F1-score, ROC, and AUC.
  • Helped the client improve inventory based on location.

Data Scientist

Infosys Technologies
10.2017 - 06.2019

Project: Air Quality Index.

Project Description: This project involved predicting the Air Quality Index from a large volume of data fetched from multiple sources via web scraping. I was responsible for pre-processing and analyzing the data and for producing predictions by applying, testing, and validating several machine learning algorithms. Conducted exploratory data analysis on a quick-turnaround project involving data manipulation and analysis, and generated valuable insights.

Responsibilities:

  • Used web scraping to collect data from different sources; during data cleaning, used data imputation strategies to fill missing values and the IQR technique to eliminate outliers.
  • Performed exploratory data analysis to identify impactful features from the data.
  • Performed feature transformation, as a few columns did not follow a normal distribution.
  • Applied encoding techniques to convert categorical variables into numeric form that can later be fed to the ML model.
  • Removed low-contribution and highly correlated features using correlation analysis, which improved the accuracy of the model.
  • Used different machine learning models to identify the best-suited model for the use case.
  • Tested and validated the accuracy of each model with the help of a confusion matrix, precision, recall, F1-score, ROC, and AUC.

Senior Systems Engineer

Infosys Ltd.
02.2015 - 10.2017

Project: Agile Integration Broker.

Project Description: AIB (Agile Integration Broker) is BT's strategic order orchestration system. It is a workflow-based pass-through system for orders, in which the entire order journey happens through the AIB tool. It is product-model-driven for rapid concept-to-market and is developed following a fully test-driven, agile approach using Java and open-source technologies.

Responsibilities:

  • Extensively used the core Java language for development.
  • Responsible for understanding requirements/story documents and developing solutions for them.
  • Worked on different product developments based on dark fiber technology and SOGEA.
  • Followed test-driven development and Agile methodology for development.
  • Developed an automation tool for testing the developed code.
  • Produced design solutions in both delta and baseline formats, covering the specific requirements and the entire design so far.
  • Created use cases, class diagrams, and sequence diagrams in RSA wherever needed.
  • Used a BT-specific tool, STORM, for creating and tracking requirements.
  • The estimation technique used was FP (function point).

Education

Bachelor of Engineering - Information Technology

University of Pune
Pune, India
04.2001 -

High School Diploma

Maharashtra Board of Technical Education, Pune
Pune, India
04.2001 -

No Degree

Maharashtra Board of SSC and HSC
Latur, India
04.2001 -

Skills

Natural language processing

Accomplishments

Gratitude Award for excellent performance.

Most Valuable Player: for outstanding performance and valuable contribution to the team.

Best Team Award: for smooth deployment and handover of deliverables to the client.

Appreciated for learning a new project and new technologies (Python and data science) in a short period of time.

Received appreciation for quick adoption of the Continuous Improvement program for the client.

Received five Insta Awards from peers for valuable contributions to the team.

Certification

Certified Scrum Master (CSM) by Scrum Alliance, 06-2018

Certified Data Scientist by VSkills, 08-2019

Unsupervised Learning, Recommenders, Reinforcement Learning by Stanford and Deeplearning.AI, 09-2019

Supervised Machine Learning: Regression and Classification by Stanford and Deeplearning.AI, 11-2019

Advanced Learning Algorithms by Stanford and Deeplearning.AI, 11-2019

Machine Learning Specialization by Stanford and Deeplearning.AI from Coursera, 11-2019

Timeline

Lead Data Scientist

Citicorp Services India Private Limited
12.2022 - Current

Lead Data Scientist

Wipro Technologies
11.2020 - 12.2022

Advanced Learning Algorithms by Stanford and Deeplearning.AI

11-2019

Supervised Machine Learning: Regression and Classification by Stanford and Deeplearning.AI

11-2019

Machine Learning Specialization by Stanford and Deeplearning.AI from Coursera

11-2019

Unsupervised Learning, Recommenders, Reinforcement Learning by Stanford and Deeplearning.AI

09-2019

Certified Data Scientist by VSkills

08-2019

Data Scientist

Infosys Technologies
07.2019 - 11.2020

Certified Scrum Master (CSM) by Scrum Alliance

06-2018

Data Scientist

Infosys Technologies
10.2017 - 06.2019

Senior Systems Engineer

Infosys Ltd.
02.2015 - 10.2017

Bachelor of Engineering - Information Technology

University of Pune
04.2001 -

High School Diploma

Maharashtra Board of Technical Education, Pune
04.2001 -

No Degree

Maharashtra Board of SSC and HSC
04.2001 -