Sunil Kumar Bhagat

Kolkata

Summary

Experienced data scientist and data engineer with a strong background in analyzing large datasets to provide actionable insights for complex business challenges. Skilled in distribution, predictive, and hypothetical modeling, with a proven ability to enhance company operations through data-driven solutions. Specializes in creating efficient data pipelines and automations, using both big data frameworks and custom solutions to streamline processes across on-premises and cloud architectures. Committed to tackling real-time data issues with a diverse range of advanced data science techniques.

Overview

5 years of professional experience
1 certification

Work History

Data Scientist & Big Data Engineer

Rackspace Technology
REMOTE
08.2022 - Current
  • Developed optimized algorithms to maximize profit based on multiple constraints using Python, resulting in significant business value.
  • Led projects utilizing Large Language Models (LLMs) for text generation, summarization, classification, understanding/analysis, and grammar correction.
  • Classified news alerts into four categories through NLP and sentiment analysis techniques, improving real-time information processing.
  • Created custom functions to derive billing status and transaction amounts from business columns, enhancing financial accuracy.
  • Built regression models to predict future traffic volume and classify Air Quality Index, aiding in environmental and urban planning.
  • Applied advanced mathematical concepts in Python for solving complex problems, demonstrating strong analytical skills.
  • Visualized oil and gas industry data using Tableau and Python libraries, providing actionable insights for stakeholders.
  • Predicted loan subscriptions, classified sentiment in Amazon reviews, and detected real versus fake news using NLP techniques.
  • Generated probabilistic context-free grammars, parse trees, and POS tags for language processing tasks.
  • Designed an Excel-based VBA tool interfacing with Refinitiv API to extract, analyze, and summarize data; incorporated NLP for relevance scoring and automated summary snippets.
  • Identified high-earning customer movement patterns for targeted marketing strategies, boosting campaign effectiveness.
  • Detected deepfake videos using computer vision, CNN, and LSTM models on GPU architecture, enhancing digital security measures.
  • Generated generic supplier names through string manipulation, streamlining supplier database management.
  • Utilized Levenshtein distance and fuzzy matching for accurate string comparisons, improving data quality.
  • Preprocessed large volumes of structured and unstructured data from various sources, conducting EDA, data cleansing, munging, and feature engineering.
  • Created interactive visualizations and dashboards using Tableau, Power BI, and AWS QuickSight for data-driven decision-making.
  • Assisted in developing and refining machine learning models for predictive analytics tasks, enhancing model accuracy and performance.
  • Engineered features based on domain knowledge to improve model performance, employing techniques like Linear & Logistic Regression, Clustering, Decision Trees, Random Forests, Gradient Boosting, and Ensemble methods.
  • Applied deep learning techniques to text, image, and video processing using TensorFlow, Keras, and PyTorch, supported by libraries such as scikit-learn, SciPy, spaCy, NLTK, Matplotlib, pandas, NumPy, and Seaborn.
  • Performed hyperparameter tuning to optimize machine learning and deep learning models for improved performance, scalability, interpretability, and accuracy.
  • Worked on a US-based hotels and resorts project and a Singapore-based import-export trade project, managing diverse data needs and requirements.
  • Created both on-premises and cloud-based ETL (Extract, Transform, Load) and Data Quality (DQ) jobs, ensuring high data integrity and reliability.
  • Developed and maintained data pipelines to support ETL processes, enhancing data quality and operational efficiency.
  • Migrated data from Enterprise Data Warehouse and Enterprise Data Mart to AWS S3, leveraging AWS Cloud services for improved data storage and accessibility.
  • Developed automation scripts using Python and Selenium to traverse trade data websites, download buyer and supplier data, and merge them into a single dataset.
  • Implemented failure handling mechanisms for production jobs, with real-time failure notifications and standardization of production workflows.
  • Conducted web scraping projects using Python (Beautiful Soup) to fetch data from open-source websites, including retrieving SWIFT codes for banks worldwide, and setting up automated alert mechanisms.
  • Designed and built scalable and robust infrastructure for data storage, processing, and analysis using technologies like Apache Hadoop, Spark, and AWS cloud solutions.
  • Actively preparing for AWS, GCP, and Azure certifications to enhance cloud expertise and capabilities.

Data Engineer

TATA CONSULTANCY SERVICES
KOLKATA
07.2019 - 08.2022
  • Led a telecommunication project for a major US-based telecommunication and media company, managing data engineering and technical aspects.
  • Developed Change Data Capture and Slowly Changing Dimension handling frameworks, along with custom data ingestion frameworks using Spark, optimizing performance for big data file formats.
  • Created Data Quality check frameworks using Spark and cloud technologies, ensuring high data integrity.
  • Built an end-to-end real-time centralized monitoring system for daily, weekly, and monthly job tracking using UNIX, Python, AWS, and AWS QuickSight. Developed dashboards to monitor job status (running, failed, completed, pending) and long-running jobs, with real-time failure notifications via email.
  • Demonstrated technical proficiency in Big Data Hadoop tools and Spark framework using Scala and Python, with over 3 years of experience on UNIX and Windows OS.
  • Served as Technical Lead, managing microservices-based technical platforms using Kubernetes-Docker with MinIO/HDFS storage. Addressed software requirements, resolved code/technical blockers, and architected new team platforms and cluster configurations.
  • Created automation frameworks using AI/ML packages, integrating multiple technologies for use cases such as automatic bug fixes, root cause analysis, failure notifications, data quality checks, and Scrum/Kanban board automation.
  • Developed custom ETL processes and data ingestion tools, integrating them with enterprise or pre-built tools for seamless data operations.
  • Conducted daily meetings with client, QC, and business teams to ensure requirements were met and delivery was smooth, following Agile, Test-Driven Development, and Daily Scrum practices.
  • Provided knowledge training to team members on big data tools and technologies, fostering a culture of continuous learning and development.

Education

Master of Technology - Data Science & Engineering

Birla Institute of Technology and Science
PILANI
03.2024

Bachelor of Technology - Computer Science & Engineering

Dr. Sudhir Chandra Sur Degree Engineering College
KOLKATA, WEST BENGAL
07.2019

10+2 - Science

Silver Point School
KOLKATA, WEST BENGAL
06.2015

10th - General

Silver Point School
KOLKATA, WEST BENGAL
05.2013

Skills

  • Data Science and Data Engineering
  • Python, Scala, Unix Shell Scripting, Java, C
  • Exploratory data analysis, Feature engineering, Text understanding/analysis, Sentiment analysis, Custom functions and models
  • LLM, Generative AI, Machine Learning, Deep Learning, Ensemble Learning, Data Mining, Statistics, NLP, Data Analysis
  • TensorFlow, Keras, scikit-learn, SciPy, spaCy, NLTK, PyTorch, Matplotlib, pandas, NumPy, Seaborn
  • AWS (SageMaker, Glue, S3, QuickSight, Lambda, Kinesis, RDS, CloudWatch, etc.)
  • GCP (Vertex AI, Cloud Storage, databases, etc.)
  • Azure (AI Services)
  • Data Visualization using AWS QuickSight, Power BI and Tableau
  • SQL/HQL query language
  • Big Data Hadoop, HDFS, Hive, Sqoop, Kafka, ELK, data pipelines
  • Spark Framework, Data Warehousing and ETL
  • Kubernetes and Docker (Microservices)
  • Kibana, Elasticsearch, MinIO (S3 storage service)
  • GitHub, JIRA, Kanban
  • Data Structures and Algorithms
  • Database Management Systems and OOP
  • Visual Studio, UC4, PuTTY, DBeaver, IntelliJ, Anaconda, SQL Server Management Studio, Jupyter Notebook, PyCharm, Microsoft Office, Adobe Photoshop

Certification

  • AWS Certified Machine Learning - Specialty
  • GCP Professional Machine Learning Engineer
  • GCP Professional Data Engineer
  • Azure AI-900 AI Fundamentals
  • Azure AI-102 AI Engineer
  • Azure DP-100 Data Scientist
  • Artificial Intelligence AI-Ready, AI-Business and AI-Specialist certifications from Rackspace Technology
  • Java certification from Microsoft
  • Python using Data Structures and Algorithms from NPTEL
  • Training on C Programming for 3 months in 2013
  • Training on Computer Hardware Technology under P.M.K.V.Y. (N.S.D.C.) for 6 months in 2016

Accomplishments

  • Built advanced ML and DL projects covering regression, classification, clustering, and NLP during the M.Tech in Data Science & Engineering at BITS
  • Secured second position in the intra-college project competition (IMPULSE) for an IoT-based Automatic Medicine Dispenser, which was also displayed at ITC Sonar
  • Secured first rank in a technical coding quiz competition in college
  • Completed a 1-month summer training internship on Big Data Hadoop at Ardent Computech Pvt. Ltd. in 2018
  • Completed a 2-day training on Basic Android App Development under MTA Education in 2018
  • Built the “Grievance Redressal Management System” project using Java and DBMS under Infosys Training in 2017
  • Built the “Tours and Travels (Tour-o-Tica)” project using Visual Studio and MS Access in college in 2016

Personal Information

Date of Birth: 05/30/1997

Languages

  • Hindi
  • English
  • Bengali
  • Bhojpuri
