
Sandeep Kumar

Bangalore

Summary

● Data Scientist and Team Leader with 12 years of comprehensive
experience in product- and service-based industries, showcasing
expertise in both individual-contributor and leadership roles across
diverse industry domains such as Banking, Pharma, Cybersecurity,
and Postal sectors.
● Proven track record of applying statistical methods, conducting
exploratory data analysis, and implementing predictive analytics
strategies. Skilled in team management, leading cross-functional
teams, and driving collaborative efforts to achieve project goals.
● Demonstrated success in delivering actionable solutions through
exploratory data analysis, linear regression, classification, clustering,
and Natural Language Processing.
● Delivered meaningful business insights and analytical solutions
through text summarization, similarity analysis, and sentiment
analysis for the cybersecurity domain.
● Familiar with deep learning concepts, including CNNs and RNNs;
contributed to the development of deep learning models for
image recognition.
● Proven expertise in leveraging AWS cloud ecosystems, including S3,
Athena, EC2, and EMR, to drive data management and analytics
initiatives.
● Extensive experience with GCP cloud environments, particularly in
leveraging BigQuery, GCS, and Cloud Composer for data analysis and
management.
● Proven track record in managing large, complex datasets and data
warehouses, with a strong ability to lead data extraction and analysis
efforts using advanced programming techniques.
● Expertise in developing and optimizing SQL queries across diverse
environments and databases, including AWS Redshift, AWS Athena,
GCP BigQuery, and Teradata, to facilitate strategic data-driven
decision-making.
● Strong expertise in utilizing Python and its data analysis libraries,
including Pandas and NumPy, to drive impactful data-driven solutions
and enhance analytical capabilities within teams.
● Proven experience in designing and implementing data pipelines
using Airflow to efficiently extract data from multiple sources,
combined with a strong understanding of continuous integration
practices and tools such as GitHub and Jenkins.

Overview

13 years of professional experience

1 Certification

Work History

Tech Specialist

Tech Mahindra
09.2024 - Current

● Participated in IBM Watsonx Technical Enablement Bootcamp,
gaining hands-on experience with Watsonx tools for building and
deploying AI/ML solutions.
● Attended advanced training on Generative AI, focusing on
applications of large language models (LLMs), ethical AI
implementation, and real-world deployment strategies.
● Explored cutting-edge AI frameworks and solutions to enhance
expertise in designing scalable and impactful AI-driven products.

Data Scientist

Lookout
08.2019 - 09.2024
● Developed sophisticated models leveraging statistical analysis and machine learning techniques.
● Led projects focused on logistic regression, text similarity, summarization, and sentiment analysis.
● Reviewed business domains to bolster analytics, enabling informed decision-making backed by robust data.
● Gathered analytics data from diverse sources, ensuring comprehensive insights.
● Presented insights through compelling data visualizations, utilizing tools such as Tableau and Jupyter Notebook.


Project Title : Generative AI Research Initiative

Role: Lead and Contributor

Technologies: GCP Vertex AI, Python

The project was a generative AI initiative involving multiple teams
focused on developing various components, including a prompt
engineering engine and advanced chat capabilities. My primary
responsibility was to lead the implementation of the document
processing and retrieval system to enhance user interaction with the
documentation.

As the lead, I oversaw the development and was involved in key tasks
such as:
- Extracting text from the PDF documents.
- Generating embeddings for the extracted text using the GCP Vertex AI model.
- Generating responses based on the stored PDF content using GCP Vertex AI.


Project Title: App Review Analysis

Role: Lead and Contributor

Technologies: AWS, BERT, Python, Pandas

The project aimed to classify user sentiments from app reviews on the
App Store and Play Store, addressing the need for automated sentiment
analysis to inform product enhancements and user engagement
strategies.
- Led the development of a sentiment analysis solution to classify user
sentiments from App Store and Play Store reviews.
- Worked in the AWS environment, preparing datasets on a notebook
instance.
- Developed a sentiment analysis pipeline utilizing BERT and used
Pandas for data preprocessing and analysis.
- Generated insights from user feedback through targeted product
enhancements.

The project resulted in a structured dataset with sentiment labels and
scores, enabling data-driven insights into user feedback. This facilitated
targeted app improvements, enhanced user satisfaction, and improved
overall app ratings.


Project Title: Text Similarity Analysis
Role: Lead and Contributor
Technologies: AWS, Python, Pandas
- Spearheaded Natural Language Processing initiatives, particularly in
Text Similarity Analysis.
- Employed advanced tokenization methods to discern similarities
between textual descriptions.
- Leveraged libraries including NLTK and scikit-learn to refine the analysis.


Project Title: GCP Airflow Pipeline Monitoring System
Role: Lead and Contributor
Technologies: GCP Cloud Composer, GCP BigQuery, Airflow, Python,
Jenkins
The project aimed to develop a system for monitoring various Airflow
pipeline jobs using the Airflow database. This initiative was essential for
ensuring the reliability and efficiency of ETL processes within the
organization's GCP environment.
- Extracted data from the GCP Airflow database to monitor job
performance and status.
- Processed the data through an ETL workflow using Airflow.
- Saved the processed data back to Google Cloud and analyzed it
using BigQuery for reporting.
- Set up and managed Jenkins for continuous integration and
deployment of monitoring scripts.
The project provided real-time insights into Airflow pipeline jobs,
enabling better oversight and quicker resolution of issues. This
monitoring capability enhanced the reliability of data workflows,
contributing to improved operational efficiency across data processing
tasks.


Project Title: App Risk Score Exploration and classification
Role: Lead and Contributor
Technologies: Python, Pandas, AWS, AWS Boto3
The project aimed to classify apps based on their risk level.
- Conducted thorough exploratory data analysis (EDA) to extract
actionable insights.
- Applied advanced statistical analysis and logistic regression models to
classify apps based on risk levels.
- Conducted correlation analysis, unveiling statistical relationships
between specific operating system versions and associated risks.
- Utilized Tableau for dynamic visualization, enhancing comprehension
of complex relationships.


Data Analyst and Consultant

Capgemini Engineering
12.2018 - 08.2019

● Extracting data from AWS S3 and AWS Athena

● Writing code in Python, using GitHub as the code repository and Gerrit as the code review tool.

● Performing exploratory data analysis on data available in the AWS cloud using Python Pandas, JupyterLab, etc.

● Creating Tableau reports for several of the statistical analyses.

Production Triage Report Analysis

● Led and worked along with the team on a data analysis and visualization project, creating a dashboard for all recent failures of jobs running in Airflow. This involved the Airflow database, creating Airflow DAGs, AWS S3, and Tableau.

● Extracting and processing data from AWS Athena and AWS S3.

● Performing statistical data analysis on data available in the AWS cloud using Python Pandas, JupyterLab, etc.

Team lead- Data Analyst & Test Engineer

Accenture
07.2012 - 12.2018

● Coordinating with stakeholders/business owners for test management, requirement clarification, and UAT sign-off.

● ETL test automation through SQL and Python Pandas.

● Creation of master test plans, test reports, and traceability matrices.

● Test case creation and execution in JIRA.

● Data validation for data sourced from different systems, such as Oracle and Wall Street, and in different formats, such as XML, CSV, and MQ.

● Writing SQL queries and working in Unix and DataStage environments.

● Testing banking products such as Calypso, Covered Bond, and Wall Street.

Education

M. Tech - Data Science and Engineering

BITS Pilani WLIP

Master of Computer Applications -

SRM University

Bachelor of Computer Application -

Birla Institute of Technology

Skills

● Machine Learning: Pandas, sklearn, linear regression, logistic regression, classification, clustering
● NLP and Deep Learning: NLTK, text similarity, sentiment analysis, TensorFlow
● Generative AI: IBM Watsonx, Vertex AI
● MLOps & Cloud: AWS, GCP, Git, Jenkins, Docker
● Tools: AWS S3, Athena, GCP BigQuery, Teradata, SQL Developer, JupyterLab

Certification

● "Business through AI-Powered Supply Chains" from IIM Mumbai.
● IBM Data Science Professional Certificate from IBM.
