VASU JEYAKUMAR

Chennai

Summary

Experienced Data Retrieval Analyst specializing in sports event data, proficient in utilizing AI software for efficient data gathering. Seeking to leverage data analysis, machine learning, and statistical modeling skills to transition into a Data Scientist role. Possesses strong expertise in data manipulation, cleaning, and visualization. Eager to apply analytical insights and predictive modeling techniques to drive impactful decisions in a data-driven environment.

Overview

2 years of professional experience
1 Certification

Work History

Data Collection Analyst

Stats Perform
Chennai
01.2023 - Current
  • Retrieved and processed Opta data in real time using a combination of human annotation, computer vision, and AI modeling techniques
  • Utilized Excel extensively for data management, including data entry, analysis, and visualization
  • Supported the transformation of Opta data into sports content, analysis, and insights used by leading teams, broadcasters, media, apps, bookmakers, and brands

Education

Master - Data Science

GUVI Geek Network Private Limited
01.2024

Bachelor of Engineering - Computer Science

Kit And Kim Engineering College, Anna University
Chennai
01.2023

Skills

  • Python
  • Pandas
  • NumPy
  • Scikit-learn
  • Matplotlib
  • Seaborn
  • MySQL
  • Excel

Certification

  • National Skill Development Certification
  • ChatGPT course, certified by GUVI
  • GUVI Certified Data Scientist

Projects



Redbus Data Scraping with Selenium & Dynamic Filtering Using Streamlit

  • Project Overview: Automated data extraction from the Redbus website using Selenium, collecting comprehensive details on bus routes, schedules, prices, and seat availability. Built an interactive Streamlit app for users to filter data based on key criteria such as price range, bus type, and ratings.
  • Key Achievements: Scraped data for both government and private buses, achieving over 95% data accuracy.
    Created an intuitive and responsive Streamlit application for real-time data filtering, improving operational efficiency in bus travel decisions.
  • Tools: Selenium, Python, SQL, Streamlit, Pandas, NumPy
  • GitHub: Redbus Data Scraping Project

DataSpark: Illuminating Insights for Global Electronics

  • Project Overview: Conducted data cleaning and analysis for Global Electronics to optimize operations, improve customer satisfaction, and drive growth. Created interactive Power BI dashboards to visualize key insights.
  • Key Achievements: Cleaned and prepared large datasets, improving data quality by handling missing values and integrating multiple data sources.
    Built SQL queries and Power BI dashboards to uncover customer segmentation insights, product performance, and sales trends.
  • Tools: Python, SQL, Power BI, Pandas
  • Key Insights: Enhanced marketing strategies based on customer segmentation, resulting in a 10% increase in customer retention.
    Optimized inventory planning using data-driven sales patterns.

Cybersecurity Incident Triage Prediction

  • Project Overview: Developed a machine learning model to automate cybersecurity incident triage, classifying incidents as True Positive (TP), Benign Positive (BP), or False Positive (FP).
  • Key Achievements: Used Random Forest and XGBoost models, achieving 92% accuracy and improving SOC efficiency by reducing the manual triage workload.
    Applied feature engineering techniques such as Chi-Square tests and MCA to improve model performance.
  • Tools: Python, Pandas, Scikit-learn, XGBoost, MLflow
  • Model Performance: Precision = 0.92, Recall = 0.92, F1-Score = 0.92

Car Price Prediction using Gradient Boosting Algorithm

  • Project Overview: Built a Gradient Boosting model to predict car prices based on features like mileage, make, model, and engine size. Deployed the model as a Streamlit web application.
  • Key Achievements: Achieved an R² score of 0.91, providing accurate price predictions and supporting decision-making in the automotive market.
    Deployed the model on Streamlit, enabling real-time price estimation for users.
  • Tools: Python, GradientBoostingRegressor, Streamlit, Pandas

Sentiment Analysis on Women’s Clothing E-Commerce Reviews

  • Project Overview: Developed an LSTM-based sentiment analysis model to classify customer reviews as positive or negative, helping businesses understand customer sentiment better.
  • Key Achievements: Fine-tuned a pre-trained RoBERTa model, achieving high accuracy in classifying sentiment, and deployed the model using FastAPI and Docker on AWS.
    Extracted key topics from reviews using BERTopic, providing actionable insights into customer preferences.
  • Tools: Python, LSTM, Keras, RoBERTa, FastAPI, Docker, AWS
