Sunny Venkatesh

Data Scientist | Machine Learning Engineer

Bangalore

Summary

Motivated data scientist with experience in both data science and data engineering roles. Passionate about applying machine learning to build models that address real-world problems.

Work History

Data Scientist

Cognimuse

Bangalore

03.2023 - Current

Data Science and Machine Learning:

Conducted comprehensive Exploratory Data Analysis (EDA), employing advanced visualization techniques to unveil patterns crucial for informed decision-making.
Spearheaded initiatives in descriptive analysis and data cleaning, ensuring data quality to achieve optimal model performance.
Developed and executed A/B testing strategies, contributing to evidence-based decision-making and streamlining business processes.
Applied a range of machine learning (ML) algorithms, including XGBoost, linear regression, KNN, and SVMs, to real-world problems.
Applied advanced ML techniques, including feature selection (filter and wrapper methods), hyperparameter tuning, L1 and L2 regularization, and ensemble methods, to improve model robustness and performance.

Natural Language Processing (NLP) and Deep Learning:

Applied NLP techniques, including Named Entity Recognition and TF-IDF, and implemented word and sentence embeddings using Word2Vec, GloVe, and Universal Sentence Encoder (USE) for similarity analysis.
Applied deep learning architectures (RNNs, LSTMs, and GRUs) with both sentence and word embeddings for sentiment analysis, document clustering, and similarity analysis across diverse text datasets.
Used transformer-based language models such as DistilBERT for text classification. Implemented Whisper AI for audio-to-text transcription.

Product work:

Assisted in building an automated ML pipeline for the AI diet assistant product, spanning ETL, data cleaning, train/validation/test splitting, EDA, feature engineering, model selection and implementation, model evaluation, hyperparameter tuning, and model validation.

Technologies used: Python, scikit-learn, XGBoost, TensorFlow, Keras, NLTK, spaCy, Gensim, Word2Vec, GloVe, Universal Sentence Encoder (USE), Whisper AI, Hugging Face Transformers, Pandas, NumPy, Matplotlib, Seaborn

Data Engineer

Saturam Infosystems

Bangalore

01.2022 - 11.2022

Developed custom Apache Airflow DAGs (directed acyclic graphs) to automate workflow sequences, including calling a source REST API and loading the data returned by GET requests into a PostgreSQL table.
Conceptualized and created a CSV file validator that checks files against the RFC 4180 specification. The feature was deployed to a Databricks compute cluster as a compute job.
Designed and developed a scalable ETL pipeline using Apache Airflow to ingest, transform, and load a high volume of daily transactional data from various sources into a centralized data warehouse. The pipeline utilized technologies such as Apache Spark, Pandas, and SQL for data transformation and cleansing, and incorporated data quality checks and error handling mechanisms to ensure data accuracy and completeness.
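To illustrate the kind of check a CSV validator performs, here is a minimal sketch, not the deployed feature. It assumes only two of the RFC 4180 rules (well-formed double-quoting and a constant field count per record) and uses Python's standard `csv` module in place of clevercsv; the function name `check_csv` is hypothetical.

```python
import csv
import io


def check_csv(text):
    """Return a list of problems found; an empty list means both checks passed.

    Illustrative only: covers quoting errors and inconsistent field counts,
    not the full RFC 4180 specification.
    """
    problems = []
    # newline="" keeps the original line endings so the csv module sees them as-is.
    reader = csv.reader(io.StringIO(text, newline=""), strict=True)
    expected = None
    try:
        for row in reader:
            if expected is None:
                # The first record fixes the expected number of fields.
                expected = len(row)
            elif len(row) != expected:
                problems.append(
                    f"line {reader.line_num}: expected {expected} fields, found {len(row)}"
                )
    except csv.Error as exc:
        # strict=True makes the reader raise on malformed quoting.
        problems.append(f"line {reader.line_num}: {exc}")
    return problems
```

Calling `check_csv` on a file's contents returns an empty list for a well-formed file and one message per detected problem otherwise.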

Technologies used: Python, Apache Airflow, Flask, PostgreSQL, PySpark, clevercsv, Databricks

NLP Intern

Concerto AI

Bangalore

06.2019 - 09.2019

College Internship:

Developed a Flask application in Python exposing a REST API backed by an SQL database. Used the Flask framework to create HTTP endpoints that handle GET and POST requests from clients, and integrated an SQLite database with the application to store and retrieve data.
Developed Python programs to measure the similarity of sentences in Hindi and English using word embeddings and cosine similarity. Used pre-trained word embeddings such as Word2Vec and GloVe to convert words to numerical vectors, then computed the cosine similarity between sentence pairs. Implemented this with the NumPy library and packaged it as a Python module.
Contributed the above feature to the company's codebase via a GitHub pull request, following the company's software development process. Wrote documentation and unit tests to support code quality and maintainability.
Received an S+ grade (90-100%) for the internship.
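The sentence-similarity approach described above can be sketched as follows. This is a minimal illustration, not the internship code: the toy vectors stand in for pre-trained Word2Vec or GloVe embeddings, averaging word vectors into a sentence vector is an assumed design choice, and all names here are hypothetical.

```python
import numpy as np

# Hypothetical toy embeddings; a real setup would load pre-trained vectors.
TOY_VECTORS = {
    "good": np.array([0.9, 0.1, 0.0]),
    "great": np.array([0.8, 0.2, 0.1]),
    "movie": np.array([0.1, 0.9, 0.2]),
    "film": np.array([0.2, 0.8, 0.3]),
    "rain": np.array([0.0, 0.1, 0.9]),
}


def sentence_vector(sentence, vectors):
    """Average the vectors of known words; return None if no word is known."""
    known = [vectors[w] for w in sentence.lower().split() if w in vectors]
    if not known:
        return None
    return np.mean(known, axis=0)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0
    return float(np.dot(a, b) / denom)


def sentence_similarity(s1, s2, vectors=TOY_VECTORS):
    """Similarity of two sentences; 0.0 when a sentence has no known words."""
    v1 = sentence_vector(s1, vectors)
    v2 = sentence_vector(s2, vectors)
    if v1 is None or v2 is None:
        return 0.0
    return cosine_similarity(v1, v2)
```

With these toy vectors, `sentence_similarity("good movie", "great film")` scores higher than `sentence_similarity("good movie", "rain")`, which is the behavior expected from embeddings that place related words close together.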

Technologies used: Python, Word2Vec, Gensim, NLTK, Flask, TensorFlow, Keras, Pandas, NumPy

Education

Bachelor of Engineering - Computer Science

BMS Institute of Technology

Bangalore

2016 - 2020

Skills

Data Science: Data Visualization, Statistical Analysis, Descriptive and Inferential Analysis, A/B Testing

Machine Learning: Supervised and Unsupervised Learning, Hyperparameter Tuning and Regularization, Ensemble Methods

Deep Learning: Deep Neural Networks, LSTM, GRU, Model Optimization

Natural Language Processing: Text Preprocessing, Language Parsing, Language Quantification (TF-IDF, embeddings), Text Generation

Feature Engineering: Data Preprocessing, Feature Scaling, Dimensionality Reduction

Software Engineering: Python, Git, Linux, Shell Scripting, SQL

Certification

Codecademy career path: Machine Learning/AI Engineer

Timeline

Data Scientist

Cognimuse

03.2023 - Current

Data Engineer

Saturam Infosystems

01.2022 - 11.2022

NLP Intern

Concerto AI

06.2019 - 09.2019

Bachelor of Engineering - Computer Science

BMS Institute of Technology

2016 - 2020