Hi, I’m

Anupam Poddar

Data Science Engineer
Bengaluru
Summary

Portfolio link:

https://anupampoddar1.wixsite.com/anupam-poddar-data-1/portfolio

Meticulous Data Scientist accomplished in compiling, transforming and analyzing complex information through software. Expert in machine learning and large dataset management. Demonstrated success in identifying relationships and building solutions to business problems.

WhatsApp contact: +91 9873835369

Overview

4 years of professional experience
4 years of post-secondary education
5 Certifications
2 Languages

Work History

Docquity (Healthcare & Medical Domain)

Senior Machine Learning Engineer
11.2023 - Current

Job overview

  • Working on LLMs: enhancing and scaling an embedding-based document search method (Retrieval-Augmented Generation) and fine-tuning it with a knowledge graph and keyword search.
  • Parsing PDFs into chunks using open-source PDF parsers.
  • POCs around BioBERT, scispaCy, and other open-source Hugging Face models for biomedical entity extraction (disease, gene, drug, etc.).
  • Entity ranking with the help of PubMedBERT and BioBERT embeddings for biomedical context.
  • Creation of datasets using techniques such as paraphrasing and summarization.
  • Used prompt engineering, LangChain, and the OpenAI API for answer generation.
  • Skills: LLMs, NLP, Fine-tuning, Knowledge Graph, RAG, Chat PDF, Gen AI, Hugging Face, BERT, GPT, OpenAI, Neo-JS, Vector DB, Pinecone, LlamaIndex, Llama 2, Prompt Engineering, LangChain, PubMed, etc.

Vahak (Logistics Domain)

Data Scientist
11.2022 - 11.2023

Job overview

  • Built a recommender system for loads, lorries, and the network from scratch.

Automating Cash-Back System for Applications (OCR Techniques)

  • Utilized OCR techniques and AWS Textract to automate cash-back system, reducing manual EWAY bill verifications by over 60%.
  • Successfully extracted RC verified lorry numbers and GST numbers, improving data quality by identifying duplicates and verifying QR codes.
  • Designed and implemented a data pipeline to create a dump of EWAY bills between EC2 machines and S3 buckets.

Fraud Detection and Analysis (OCR Techniques)

  • Implemented OCR techniques and AWS Rekognition to identify fraudulent users by analyzing QR code duplicates and detecting multiple identical EWAY bills.
  • Conducted in-depth analysis of more than 100 call recordings to identify patterns of fraudsters and developed a hypothesis for fraud detection.

Call Recording Transcription Analysis (Data Analysis)

  • Collected and analyzed around 50,000 call recordings separately for potential fraud users and power users to derive actionable insights.
  • Utilized data analysis techniques to identify patterns and behaviors of fraudsters, leading to improved fraud detection and prevention measures.

Real-time Inflow Issue Detection (Data Automation)

  • Developed a script to automatically identify inflow issues in important production tables and notify DevOps teams for real-time rectification.
  • Successfully scheduled and deployed the script within a day, ensuring smooth data flow and data integrity.

Data Pipelines (AWS, BigQuery, Redshift)

  • Created various data pipelines for moving data between BigQuery, Redshift, and S3, enabling seamless data transfer and analysis.
  • Designed and implemented a pipeline for collecting call data, improving data accessibility and analysis capabilities.

Aadhaar Card Number Masking (Object Detection)

  • Leveraged OpenCV and the YOLOv8 model architecture to develop an object detection model for masking Aadhaar card numbers in images.
  • Achieved 100% mean average precision (mAP) on the test set, ensuring data privacy and compliance.

Predictive Modeling and Customer Support Automation

  • Utilized BigQuery ML to predict user behavior based on the first 6 and 48 hours of activity, enabling targeted campaigns and better customer engagement.
  • Integrated app scripts for customer support, replacing CRM systems for efficient calling and campaign management.
  • Extracted and assessed data from databases to drive improvement of product development and business strategies and processes.

Orange Business Services Pvt Ltd (Telecom Domain)

Machine Learning/Data Science Engineer
08.2021 - 11.2022

Job overview

  • Used OCR techniques to extract text from warning/error screenshots of running applications, then applied NLP techniques to remove unwanted symbols and terms so that the auto-heal application could identify the error and fix it automatically.
  • Built a REST API with Flask for predicting diseased olive crops using a YOLO model, covering labeling, model training, and prediction.
  • Built an application to automate Jira test cases using custom NER and meaning generation with POS tagging [POC]; developed and authenticated the complete application using a Flask API. [OBS]

Nagarro Inc (Consulting)

Software Engineer
01.2021 - 06.2021

Job overview

  • Performed sentiment analysis of website texts and built a web scraper using Python, which helped fine-tune market strategy.
  • Developed a loan default model using various ML algorithms on a loan book; achieved best precision scores.
  • Produced extensive documentation of Elasticsearch and implemented complex queries and aggregations to form summaries.

Pinna.ai (Voice & Transcription Domain)

AI/ML Developer (Intern)
05.2020 - 01.2021

Job overview

  • Created an end-to-end pipeline for the current model, including new functionalities for the product.
  • Increased the accuracy of the speech recognition model.
  • Built a one-word detection engine (Keras model). [PINNA.AI] (9 months)

Edgistify (Warehousing & Supply Chain)

Data Science Intern
03.2020 - 04.2020

Job overview

  • Performed web scraping of godown data across India, state-wise, and scraped retail shop addresses from Google Maps data; used Google Maps to find landmarks and distances from geographical information for various purposes. [EDGISTIFY]
  • Collected godown and warehouse information from newspaper and social media ads using open vision techniques; labeled and categorized companies and retail shops; cleaned data using pandas. [EDGISTIFY] (2 months)

College

Student Researcher
01.2016 - 01.2020

Job overview

Data Science Projects:

  • Face-Recognition Attendance System: Leveraged OpenCV and Python to build a system that uses facial parameters (e.g., distance between eyelids, cheekbones) to recognize faces and track attendance.
  • Skillset Ontology: Developed a machine learning model that categorizes projects based on students' technical and non-technical skills, promoting mutual learning. This project incorporated NLP, NLTK, formal concept analysis, and web scraping (Selenium, BeautifulSoup, Mechanize). A research paper was published based on this work.
  • Fine-tuned RoBERTa on the IMDB dataset for sentiment analysis using PyTorch and achieved a best accuracy score of 0.92.
  • Developed an LSTM sentiment analysis model on the IMDB dataset with part-of-speech tags and improved accuracy by 4%.
  • Fine-tuned a multi-label deep neural network on the top 5 snapshots for relation prediction; developed a baseline hits@1 score.
  • Garment Search Engine: using the feature likelihood method.
  • Hangman AI: trained an LSTM neural network on 250,000 words to predict the next word.

Education

Netaji Subhas Institute of Technology
New Delhi

B.Tech in Information Technology
08.2016 - 08.2020

University Overview

University: Delhi University

Score: 7.4 (1st Division); secured 176 credits (168 credits required)


Accomplishments


  • Certified in the Python, C, and C++ programming languages.
  • Competitive coding training at Coding Mafia; 20+ technical certifications on LinkedIn Learning; SoloLearn certifications (Python, CSS, Java, C++, C, JavaScript, HTML, jQuery); SQL (cert.).
  • Specialized in technical skills such as computer vision (Inception v3, VGG-16, VGG-19, transfer learning, GANs), NLP (LSTM, RNN, BERT, Transformers, GPT-2, GPT-3), and IoT.
  • Published a research paper on extracting technical and non-technical skills from project descriptions and matching students into teams.
  • Received the Best Employee in Data award for automating the cash-back system.

SKILLS:

Languages: SQL, C++, Python | Techniques & Tools: Machine Learning, NLP, Deep Learning, Keras, PyTorch, Artificial Intelligence, Computer Vision, OCR Techniques, Data Pipelines, AWS Textract, AWS Rekognition, Google Cloud BigQuery ML, OpenCV, Object Detection, Data Analysis, Fraud Detection, Data Automation, Customer Support Automation.

Operating Systems: Mac, Linux | Frameworks & Tools: Snowflake, Dataiku, Flask, Elasticsearch, Kibana, Excel, PowerPoint, Git, BigQuery, Redshift, EC2

Additional Information

  • GeeksforGeeks profile: https://auth.geeksforgeeks.org/user/champgamy/profile
  • CodeChef profile: https://www.codechef.com/users/anupamp11
  • LeetCode profile: https://leetcode.com/anupamking01/
  • HackerRank profile: https://www.hackerrank.com/anupampoddar1997
  • GitHub link: https://github.com/anupamking01/
  • Research paper link: https://drive.google.com/drive/folders/1TDZF0R6WqRZVSH469GOfmTlANqfp4A-X?usp=sharing
  • Certificates link: https://drive.google.com/drive/folders/1Q9E4g6cW3UR6QDN-dkId6tt_08j53bYU?usp=sharing


Timeline

Senior Machine Learning Engineer
Docquity (Healthcare & Medical Domain)
11.2023 - Current
Data Scientist
Vahak (Logistics Domain)
11.2022 - 11.2023
Machine Learning/Data Science Engineer
Orange Business Services Pvt Ltd (Telecom Domain)
08.2021 - 11.2022
Software Engineer
Nagarro Inc (Consulting)
01.2021 - 06.2021
AI/ML Developer (Intern)
Pinna.ai (Voice & Transcription Domain)
05.2020 - 01.2021
Data Science Intern
Edgistify (Warehousing & Supply Chain)
03.2020 - 04.2020
  • AI/ML
01-2020
  • Data Structures & Algo
08-2019
  • Python
03-2019
  • SQL
02-2019
  • DBMS
08-2017
Netaji Subhas Institute of Technology
B.Tech in Information Technology
08.2016 - 08.2020
Student Researcher
College
01.2016 - 01.2020

Research paper & Publications

Research paper published with Springer International Publishing; also listed on ResearchGate and Semantic Scholar:

https://link.springer.com/chapter/10.1007/978-981-15-1366-4_14

https://www.researchgate.net/publication/339489475_Extraction_of_Technical_and_Non-technical_Skills_for_Optimal_Project-Team_Allocation

https://www.semanticscholar.org/paper/Extraction-of-Technical-and-Non-technical-Skills-Bhatia-Chakraverty/411cbb7fc204f5028d09cbeaa70a95216b1fa5c0
