Kaushik Burra

Hyderabad

Summary

Experienced Data Scientist with nearly 4 years in AI, specializing in efficient data pipelines and machine learning solutions. Proficient in Python, PySpark, SQL, and Airflow, with expertise in integrating REST APIs and designing ML pipelines for robust deployment. Skilled in experiment tracking using Weights & Biases (W&B).

Focused on advanced Generative AI (GenAI) applications, including Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs) for financial document QA retrieval and table parsing.

Experienced in training deep learning models with PyTorch and developing ensemble models with hyperparameter tuning using Optuna and Hyperopt.

Strong background in MLOps and FEOps, committed to delivering innovative solutions in both structured and unstructured data domains.

Work History

Data Scientist

Quadratic Insights Pvt. Ltd.
07.2021 - Current

Financial Prediction Service (Sep 2023 – Nov 2024)

  • Objective: Developed an AI solution to detect "bad" SSIs using historical trade data as ground truth.
  • Data Engineering: Engineered pipelines for labeled SSI data, preprocessing and transforming raw data into scalable, high-value features for ML models.
  • Model Development: Implemented PyTorch deep learning models, Multinomial Naive Bayes, and XGBoost; created ensemble models combining these approaches.
  • Experiment Tracking: Utilized Weights & Biases for comprehensive model tracking and analysis.
  • Deployment: Built REST prediction services, regression-tested them, and deployed to QA/Production using CI/CD.
  • Impact: Improved proactive SSI detection, streamlining trade processes.
  • Tools: PyTorch Lightning, Python, Airflow, SQL, Git

QIPP Product (June 2022 – Aug 2023)

  • Goal: Developed an end-to-end SaaS tool for cost optimization and order scheduling for manufacturing.
  • Modeling: Applied statistical and time series analysis for demand and price forecasting; designed linear programming models for order optimization.
  • Techniques: Implemented Naive Bayes and clustering for supplier ranking and inventory management.
  • Tools: Spark, Hive, Hadoop, SQL, Airflow, Flask APIs, Azure.

Generative AI Use Cases

Financial Document Parsing Solution

  • Developed a comprehensive pipeline using multiple parsing tools (LlamaParse, Unstructured, PyMuPDF) to extract tables from financial PDFs.
  • Integrated LLMs, such as GPT-4 Turbo, to enhance data parsing and interpretation.
  • Implemented Retrieval-Augmented Generation (RAG) with ChromaDB for efficient data retrieval and processing.
  • Used retrievers and chains for structured question-answer workflows, leveraging LangSmith to evaluate and refine LLM-generated responses.
  • Enhanced accuracy and efficiency in parsing complex financial data, supporting advanced data extraction and analysis.
  • Tools: LangChain, LlamaIndex, Hugging Face, LangSmith

Computer Vision Use Case

Overview: Developed end-to-end solutions for defect detection in manufacturing and enhanced data visualization for flight simulations.

Steel Defect Detection:

  • Built a pipeline for identifying defects on assembly lines using feature extraction and image processing techniques (e.g., RLE, image masking).
  • Implemented binary classification of images using deep learning CNN models (ResNet50) and applied image segmentation to classify defects.
  • Developed a Streamlit application to showcase the entire defect detection pipeline.

3D Graphical Overlay Tool:

  • Created a tool for plotting data points on 3D graphical images in a flight simulation domain.
  • Employed computer vision preprocessing techniques (line detection, axis detection) and utilized a plot digitizer to extract data points.
  • Developed an automated overlay tool to visually represent data points for better user understanding.
  • Technologies Used: Keras, TensorFlow, PyTorch, Python, Streamlit, OpenCV, Matplotlib, Seaborn.

ROC - Transportation Ops Specialist

Amazon Development India Pvt Ltd.
2019 - 03.202

  • Scheduled ad hoc trucks between fulfilment centres, cancelled those not required based on predictions, and ensured on-time delivery to customers.
  • Used Tableau and SQL to query middle-mile logistics data.
  • Oversaw client communications and managed record tracking and data communication activities.
  • Led shipping and delivery quality control by managing downtime, resulting in a large increase in revenue.
  • Oversaw every phase of the supply chain, from purchase order to delivery to invoicing, targeting 100% end-user satisfaction.

Data Analyst Intern

Pratham India, ORG
07.2016 - 02.2017

  • Assisted with research and gained extensive knowledge of Pratham Education, which focuses on improving the quality of education for underprivileged children across India.
  • Built a project roadmap and analysed the previous year's ASER (Annual Status of Education Report) data to identify factors contributing to low quality in the education system.
  • Found that private school enrolment is catching up quickly: although most children (ages 5-12) still attend government schools, private school enrolment is increasing despite high costs.

Tools used: Excel and Tableau

Education

Post-Degree Certificate - Data Science

International School of Engineering
Hyderabad
01.2021

Master of Professional Studies: Data Science

Northeastern University - Boston, USA
Boston, MA
08.2019

Master of Technology Integrated: Power Systems

Sastra University, Tamil Nadu
Tamil Nadu
06.2016

Skills

Technical Skills:

  • Programming & Data Processing: Proficient in Python, SQL, PySpark, Pandas, NumPy for scalable data analysis and processing
  • Web Development & APIs: Experienced in building and consuming RESTful APIs with Django, Flask, and FastAPI
  • Machine Learning & AI: Skilled in scikit-learn, PyTorch, TensorFlow, Generative AI (LLMs), and hyperparameter tuning (Optuna, Hyperopt); experiment tracking using Weights & Biases (W&B) and LangSmith
  • Data Engineering & Pipelines: Expertise in Apache Airflow, ETL processes, and distributed computing with Spark and Dask
  • Databases & Cloud Services: Strong with MySQL, PostgreSQL, MongoDB, and cloud tools (AWS S3, Azure)
  • Version Control & CI/CD: Proficient in Git, Bitbucket, GitHub/GitLab for source control and deployment workflows
  • Document Parsing & NLP: Utilized LlamaParse, PyMuPDF, Unstructured, Tabula for extracting structured data from complex documents
  • Software Development: Experience with APIs, version control, and DevOps practices for efficient deployment

Certification

  • Applied Data Science with Python Specialization (Coursera - University of Michigan)
  • Introduction to Machine Learning in Production - Coursera
  • The Complete SQL Bootcamp - Udemy
  • AWS Fundamentals: Going Cloud Native - Amazon Web Services
  • Introduction to AWS - Coursera
  • Apache Airflow Fundamentals - Astronomer
