Dileep Kumar Sahu

Senior Data Scientist
Delhi

Summary

I am a highly skilled Data Scientist with 4 years of experience building and deploying advanced machine learning solutions. My expertise covers classical ML models (regression, classification, clustering) and ensemble methods such as bagging and boosting. I specialize in deep learning for Natural Language Processing (NLP), using architectures such as RNN, LSTM, GRU, BERT, and other Transformers, along with libraries such as spaCy. I am proficient in deploying scalable solutions using Python, PySpark, and various databases, and have practical experience with engineering tools such as Kafka, EMR, EC2, Airflow, Jenkins, S3, and Amazon SageMaker. I am a strong communicator, passionate about problem-solving and delivering actionable insights.

Overview

5 years of professional experience

Work History

Senior Data Scientist

Paytm
11.2021 - Current

InfraRAG Insight:

  • Built a Retrieval-Augmented Generation (RAG) system using sentence-transformer (mpnet-base-v2) embeddings to extract and summarize infrastructure-related insights from wiki pages, supporting decision-making with precise, contextual information in real time.

SMS Text Classifier and Parser:

  • Built a financial profile for each customer and made it available to teams such as personal loans, and used it to facilitate bill reminders.
  • Optimized data models by implementing machine learning algorithms and advanced statistical techniques.

SMS Parser OLAP:

  • Developed and implemented SMS-parser OLAP solutions, facilitating real-time aggregation and analysis of SMS data.
  • Designed and deployed data models for efficient extraction and transformation of SMS data into actionable insights.
  • Collaborated with cross-functional teams to integrate SMS-parser OLAP solutions, improving data accessibility and decision-making processes.

SMS Parser Dashboard:

  • Created customized Looker dashboards to visualize key performance indicators (KPIs) and trends, facilitating data-driven decision-making across departments.
  • Implemented advanced data modeling techniques in Looker to optimize dashboard performance and ensure data accuracy.
  • Collaborated with stakeholders to gather requirements and translate business needs into intuitive Looker dashboard designs, improving user adoption and satisfaction.
  • Provided training and support to users on navigating and interpreting Looker dashboards, empowering teams to leverage insights for strategic planning and operations.

Device Failure Prediction Model:

  • Built an automated model that forecasts device failure from the device's historical performance, enabling quick issue detection and resolution.

Data Room Activity:

  • Extracted data (e.g., credit score, utility bills, quarterly transaction count, quarterly transaction volume) for Bajaj Finance on the common user base.
  • Optimized Spark configuration to process large volumes of historical data on EMR (Elastic MapReduce).
  • Wrote optimized PySpark scripts that reduced both execution time and Elastic MapReduce cost on AWS.

PII-Eye:

  • The PII model detects Personally Identifiable Information (PII) in structured data, i.e., data in tabular format.
  • Identifies personal information in the data; if a column's PII probability is greater than the threshold value T, the column is declared a PII column.
  • Optimized character-level convolutional networks (ConvNets) for text classification.
  • Character-level convolutional networks can achieve state-of-the-art or competitive results.

Machine Learning Engineer

Emilence Pvt. Ltd.
08.2020 - 02.2021

User Recommendation System:

  • Developed and deployed a recommendation engine for the client on Amazon Web Services (AWS) to suggest users with similar interests.
  • Developed hybrid user-based collaborative filtering to identify the best-suited users.
  • Recommended users based on mutual connections, implemented using network theory.
  • Delivered accurate recommendations that performed well in A/B testing and increased the user engagement rate by 60%.
  • Designed, implemented and evaluated new models and rapid software prototypes to solve problems in machine learning and systems engineering.

Education

PG Diploma - Data Science and Artificial Intelligence

Indraprastha Institute of Information Technology
New Delhi, India
10.2021

Master of Computer Applications - Computer Applications

Panjab University
Chandigarh, India
05.2020

Bachelor of Science - Mathematics

Aryabhatta College, University of Delhi
New Delhi, India
05.2016

Skills

Machine Learning: Classical ML, Regression, Classification, Bagging, Boosting, Time Series, Statistical modeling, Predictive modeling

Deep Learning, NLP and GenAI: RNN, LSTM, GRU, BERT, Transformers, Generative AI, LLM, RAG systems, Neural networks

Big data analytics: PySpark, Hadoop, Hive, BigQuery

NLP: fastText, Language Modelling, spaCy, BERT

Predictive modeling

Statistical modeling

Natural language processing

Neural networks

Sentiment analysis

Feature engineering

Machine learning

Transfer learning

Anomaly detection

Ensemble methods

Multivariate analysis

Deep learning

Statistical analysis

Python programming

Big data analytics

Optimization techniques

Data mining

Accomplishments

  • Silver Medal in Programming in Java from the National Programme on Technology Enhanced Learning (NPTEL), 2019.
  • Demonstrated the model "Mathematics in Everyday Life" at the state-level Jawaharlal Nehru Children’s Science Exhibition, 2009.
  • First Prize in Mental Maths at Zonal level.

Timeline

Senior Data Scientist

Paytm
11.2021 - Current

Machine Learning Engineer

Emilence Pvt. Ltd.
08.2020 - 02.2021

PG Diploma - Data Science and Artificial Intelligence

Indraprastha Institute of Information Technology

Master of Computer Applications - Computer Applications

Panjab University

Bachelor of Science - Mathematics

Aryabhatta College, University of Delhi