DEEPAK PANDEY

New Delhi

Summary

Meticulous Data Scientist accomplished in compiling, transforming and analysing complex information through software. Expert in machine learning and large dataset management. Demonstrated success in identifying relationships and building solutions to business problems. Around six years of AI/ML application development experience, with major highlights:

  • Experience working with managed and open-source LLMs such as OpenAI models and Llama 3
  • Proficient in predictive modelling, data processing, GenAI and ML algorithm implementation
  • Expertise in Python and PySpark, with working knowledge of SQL
  • AWS Certified Machine Learning - Specialty

Overview

5 years of professional experience
1 certification

Work History

DocDecipher Product

QBurst
07.2023 - Current

The project implements a document parsing service that extracts relevant information from documents such as CVs and job descriptions (JDs). It combines vision-based approaches with natural language processing to interpret each document.

Technology Used: Python, spaCy, NLTK, OpenCV, Tesseract, PaddleOCR, Machine Learning, Deep Learning, Natural Language Processing, Large Language Models, LangChain, AWS Services, GenAI

Role and Responsibilities:

  • Implemented PDF parsing using Tesseract and PaddleOCR.
  • Fine-tuned BERT and Flair models, improving extraction accuracy for the Education and Experience sections.
  • Improved accuracy by using OpenAI models and Llama 3 as the operational layer in place of legacy models.
  • Implemented a modular API integration architecture for seamless use of inference services such as Hugging Face.
  • Fine-tuned LLMs using PEFT methods such as LoRA.

Composite Modernization

Capgemini India Pvt Ltd
07.2022 - 03.2023

The goal of this project is to perform ETL tasks, including data compression, data formatting, and data versioning, at different stages in cloud and on-premise data stores. It also enforces a Right to Be Forgotten policy in accordance with government regulations.

Technology Used: Python, PySpark, AWS Services, PostgreSQL, Snowflake, UC4 Automic, Privacera

Role and Responsibilities:

● Performed ETL tasks using AWS Glue and batch jobs.

● Implemented data compression, data formatting, and data versioning at different stages in cloud and on-premise data stores.

● Implemented the Right to Be Forgotten policy.

● Automated the entire workflow using UC4 jobs, workflows, and the scheduler.

DNS Tunneling - Cyber Security

Capgemini India Pvt Ltd
11.2021 - 07.2022

The project analyzes DNS queries to determine whether each query is malicious, using several machine learning classification models.

Technology Used: Python, ML, NLP, AWS Services

Role and Responsibilities:

● Collected data from Splunk logs.

● Extracted and analysed features from the queries.

● Implemented classification models to classify the queries as normal or malicious.

Anomaly Detection (PowerShell Scripts)

Capgemini India Pvt Ltd
10.2020 - 09.2021

This project focuses on detecting anomalies in PowerShell scripts by leveraging clustering models.

Technology Used: Python, ML, Clustering, AWS Services

Role and Responsibilities:

● Leveraged clustering models to identify rare PowerShell scripts as potential threats.

● Employed TF-IDF for feature extraction from scripts.

● Applied K-means clustering to group similar scripts, differentiating anomalies effectively.

● Streamlined the monitoring process, reducing the number of analysts required for suspicious script review from 50 to 1.

Reputation Intelligence Solution

Capgemini India Pvt Ltd
12.2019 - 09.2020

A solution that enables financial institutions to assess an organization's reputation through an integrated, multidimensional strategy that consolidates news, reviews, and financial factors, supporting accurate score analysis and reporting as well as direct competitor analysis between organizations within the same sector.

Technology Used: Python, NLP, Deep Learning, AWS S3, AWS SageMaker

Role and Responsibilities:

● Extracted data from Twitter and Reuters news headlines.

● Cleaned and preprocessed the extracted datasets.

● Implemented reputation score calculation for organizations.

● Implemented graphs comparing an organisation's performance with that of its competitors within the same sector.

Education

Bachelor of Technology - Electronics and Communications Engineering

National Institute of Technology
Jamshedpur, India
05.2019

Skills

  • Machine learning and deep learning
  • Data analysis and visualization
  • Natural language processing
  • Feature extraction and document parsing
  • Predictive modeling and statistical analysis
  • Neural networks and frameworks
  • API development and integration
  • Version control systems
  • Cloud computing and AWS services
  • Containerization with Docker and Kubernetes
  • Web frameworks: Django and Flask
  • Data libraries: Pandas, NumPy, scikit-learn, seaborn, matplotlib
  • Frameworks: PyTorch, TensorFlow, LangChain

Accomplishments

Global Data Science Challenge - 2023

Python, Faster R-CNN, Cascade R-CNN, AWS (SageMaker, S3)

  • Winner among 750 teams from 33 countries
  • Developed an AI-based solution to automate worm detection in river blindness clinical trials
  • Targeted detection of worm sections in microscopic nodule images
  • Used CNNs with ResNet-101 backbones
  • Fine-tuned models pre-trained on the COCO dataset for object detection

Certification

  • AWS Certified Machine Learning - Specialty
  • Deep Learning Specialization, DeepLearning.AI, Coursera
  • Machine Learning with Python, IBM, Coursera
  • Data Analysis with Python, IBM, Coursera
