
Dileep Kumar Sahu

Senior Data Scientist
Delhi

Summary

I am a highly skilled Data Scientist with 4 years of experience building and deploying advanced machine learning solutions. My expertise includes classical ML models such as regression, classification, and clustering, as well as ensemble methods such as bagging and boosting. I specialize in deep learning for Natural Language Processing (NLP), using architectures such as RNN, LSTM, GRU, BERT, and Transformers, along with libraries such as spaCy. I am proficient in deploying scalable solutions using Python, PySpark, and various databases, and have practical experience with engineering tools such as Kafka, Amazon EMR, EC2, Airflow, Jenkins, S3, and Amazon SageMaker. I am a strong communicator, passionate about problem-solving and delivering actionable insights.

Overview

5 years of professional experience

Work History

Senior Data Scientist

Paytm
11.2021 - Current

InfraRAG Insight:

  • A Retrieval-Augmented Generation (RAG) system that uses Mistral embeddings to extract and summarize infrastructure-related insights from wiki pages, supporting decision-making with precise, contextual information in real time.

SMS Text Classifier and Parser:

  • Objective: build customers' financial profiles and make them available to other teams, such as personal loans, and to facilitate bill reminders.
  • Optimized data models by implementing machine learning algorithms and advanced statistical techniques.

SMS Parser OLAP:

  • Developed and implemented SMS-parser OLAP solutions, enabling real-time aggregation and analysis of parsed SMS data.
  • Designed and deployed data models for efficient extraction and transformation of SMS data into actionable insights.
  • Collaborated with cross-functional teams to integrate SMS-parser OLAP solutions, improving data accessibility and decision-making processes.

SMS Parser Dashboard:

  • Created customized Looker dashboards to visualize key performance indicators (KPIs) and trends, facilitating data-driven decision-making across departments.
  • Implemented advanced data modeling techniques in Looker to optimize dashboard performance and ensure data accuracy.
  • Collaborated with stakeholders to gather requirements and translate business needs into intuitive Looker dashboard designs, improving user adoption and satisfaction.
  • Provided training and support to users on navigating and interpreting Looker dashboards, empowering teams to leverage insights for strategic planning and operations.

Device Failure Prediction Model:

  • Built an automated model that uses a device's historical performance to forecast device failures, enabling quick issue detection and resolution.

Data Room Activity:

  • The aim of this project was to extract data (e.g., credit score, utility bills, quarterly transaction count, quarterly transaction volume) for Bajaj Finance on the common user base.
  • Optimized Spark configuration to process large volumes of historical data on Amazon EMR (Elastic MapReduce).
  • Wrote optimized PySpark scripts to reduce both execution time and AWS EMR cost.

PII-Eye:

  • Built a PII model that detects Personally Identifiable Information (PII) in structured data, i.e., data in tabular format.
  • The model identifies personal information in each column; if a column's PII probability exceeds a threshold T, that column is flagged as a PII column.
  • Optimized character-level convolutional networks (ConvNets) for text classification.
  • Character-level ConvNets have been shown to achieve state-of-the-art or competitive results on text classification.

Machine Learning Engineer

Emilence Pvt. Ltd.
08.2020 - 02.2021

User Recommendation System:

  • Developed and deployed a recommendation engine for a client on Amazon Web Services (AWS) that suggests users with similar interests.
  • Developed hybrid user-based collaborative filtering, which helped identify the best-suited users.
  • Recommended users based on mutual connections, implemented using network theory.
  • The recommendations performed well in A/B testing and increased the user engagement rate by 60%.
  • Designed, implemented and evaluated new models and rapid software prototypes to solve problems in machine learning and systems engineering.


Education

PG Diploma - Data Science and Artificial Intelligence

Indraprastha Institute of Information Technology
New Delhi, India
10.2021

Master of Computer Applications - Computer Applications

Panjab University
Chandigarh, India
05.2020

Bachelor of Science - Mathematics

Aryabhatta College, University of Delhi
New Delhi, India
05.2016

Skills

  • Machine Learning: Classical ML, Regression, Classification, Bagging, Boosting, Time Series, Statistical modeling, Predictive modeling

  • Deep Learning, NLP and GenAI: RNN, LSTM, GRU, BERT, Transformers, Generative AI, LLM, RAG Systems, Neural networks

  • Big Data Analytics: PySpark, Hadoop, Hive, BigQuery

  • NLP: fastText, Language Modelling, spaCy, BERT

Accomplishments

  • Silver Medal in Programming in Java, National Programme on Technology Enhanced Learning (NPTEL), 2019.
  • Presented a model on Mathematics in Everyday Life at the state-level Jawaharlal Nehru Children’s Science Exhibition, 2009.
  • First Prize in Mental Maths at the zonal level.

Timeline

Senior Data Scientist

Paytm
11.2021 - Current

Machine Learning Engineer

Emilence Pvt. Ltd.
08.2020 - 02.2021

PG Diploma - Data Science and Artificial Intelligence

Indraprastha Institute of Information Technology

Master of Computer Applications - Computer Applications

Panjab University

Bachelor of Science - Mathematics

Aryabhatta College, University of Delhi