Jishnu Sarkar

AI/ML Computational Science Specialist
Kolkata

Summary

Experienced Senior Data Scientist with 7 years of proven expertise in developing advanced AI solutions, now specializing in generative AI. Passionate about leveraging cutting-edge technologies to drive innovation and create impactful applications. Seeking to contribute deep technical knowledge and creative problem-solving skills to a forward-thinking organization aiming to push the boundaries of AI and machine learning. Looking to collaborate with diverse teams to design and deploy transformative AI models that deliver tangible value to users and stakeholders.

Overview

7 years of professional experience
1 Certification

Work History

AI/ML Computational Science Specialist

Accenture
08.2018 - Current

Knowledge Brain – Sensitive Information Extraction Using RAG from Contract Documents.

This project focuses on extracting sensitive information from contract documents using Retrieval-Augmented Generation (RAG). It is structured into two main components: ingestion and retrieval.

  • Ingestion Phase

Contract documents are ingested from a Google Cloud Storage (GCS) bucket. The ingested data is stored in two databases:

AlloyDB: Stores structured document information for fast querying.

Spanner Graph Database: Used to extract entities and relationships to build an ontology that represents the document's semantic structure.

  • Retrieval Phase

In this phase, RAG-based pipelines are used to extract answers to specific queries. The system leverages:

(i) LLM prompting to generate accurate responses.

(ii) Data from both AlloyDB and the Spanner graph to provide context-aware, semantically rich answers based on the ingested contract data.




Answer Extraction from Insurance Proposal Documents Using RAG (Retrieval-Augmented Generation)


  • Given a questionnaire containing a list of questions, the task is to find the answers in the proposal document.
  • Generated embedding vectors for the proposal document using the Ada embeddings model and stored them in the ChromaDB vector database.
  • Retrieved the top 10 document chunks from the vector database for each input query, then passed the query, the retrieved chunks, and a prompt to the LLM module to obtain the answer.


Anonymization engine for data masking


  • Confidential and personal information must be removed from documents so that the data can be shared securely with third parties, for example for cloud processing, open dataset releases, and third-party collaboration.
  • Used Google's NER module to extract entities such as Organization, Location, and Address. Developed regular expressions to extract emails, phone numbers, RFP numbers, etc.
  • Built various masking techniques and a custom blacklist, driven by user-provided configuration, for document masking.
  • Used generative AI to reduce false positives in the extracted output.


Feedback Label Extraction from Feedback Text Using Generative AI


  • Refined prompts for a generative AI model (Llama) to obtain appropriate feedback labels from the input feedback text.
  • Applied post-processing methods to improve the accuracy of the output labels.




Analytics Advisory Senior Analyst

Accenture
12.2020 - 12.2022

Agent assist for marketing domain


  • The objective is to equip sales agents with the information they need during the sales process, helping them close more sales, and to free up sales capacity by relieving agents of non-critical tasks.
  • Created widgets (industry prediction, trending, call summary, notes summary) that surface customer information and give agents a clearer view of each customer.


MIA (Market Intelligence Assist)


  • MIA dynamically tailors messages to the individual buyer and pinpoints industry trends.
  • Developed taxonomy-based extraction of insights (budget, authority, timing, etc.) from transcripts. The business can use these insights to understand broader industry trends.


Concierge for life science domain


Life Science Concierge aims to automate the categorization and extraction of specified data from health authority correspondence using machine learning, deep learning, and CRF approaches. It also enables automated updating of a specific framework based on the correspondence.

Analytics Advisory Analyst

Accenture
08.2018 - 12.2020

Autosuggestion tool development for HR domain leave policy data


  • Developed a generic auto-suggestion tool that uses a transformer model (Sentence-BERT encoder) to identify the top five most relevant utterances related to human resource leave policy data.
  • Created a UI to showcase the auto-suggested utterances.


QA Virtual Assistant to troubleshoot HR domain leave policy data & label prediction of HR data


  • Developed a question-answering chatbot that provides leave-balance information for a service-based company, using both Google Dialogflow and RASA NLU. The Dialogflow chatbot supports multiple languages (English, French, German, Spanish).
  • Predicted class labels for HR data using transformer models (BERT, RoBERTa, DistilBERT) across multiple languages (English, Spanish, German, French).
  • Generated paraphrases for various intents using a masked language model, intermediate language generation (MarianMT), T5, and GPT-2.


Virtual Assistant for telecommunication domain to handle customer queries


  • Managed the end-to-end implementation of a virtual assistant for the telecommunications domain using the RASA NLU framework, providing a variety of business-to-business solutions. In the POC phase, the VA was developed using Google Dialogflow.
  • Developed intents, entities, custom logic, and various paraphrases to address a wide range of customer inquiries.
  • Managed complex business logic using RASA, creating distinct story flows through interactive learning to maintain contextual understanding across situations.


Education

Master of Science - Big Data Analytics

Ramakrishna Mission Vivekananda University
Howrah, India
04.2001 -

B.Tech - IT

Murshidabad College of Engineering & Technology
Murshidabad, India
04.2001 -

Higher Secondary - Science

Lalbagh Singhi High School
Murshidabad, India
04.2001 -

Secondary

Lalbagh Singhi High School
Murshidabad
04.2001 -

Skills

    Python, R, SQL

    Statistical Analysis

    Machine Learning, Deep Learning, NLP

    Generative AI, Prompt Engineering, Spanner Graph

    Google Dialogflow, RASA

    Docker, Kubernetes, GCP

    Problem Solving, Project Management

Certification

Introduction to Machine Learning in Production, Coursera
06.2022