Results-driven professional experienced in designing and optimizing scalable data pipelines, building robust machine learning models, and delivering data-driven solutions. Proficient in Python, SQL, and big data technologies such as Spark, Hive, and Hadoop for data processing, integration, and analysis. Expertise in exploratory data analysis (EDA), predictive modeling, and end-to-end machine learning pipeline development, transforming large datasets into actionable insights. Skilled in deploying machine learning models, solving complex business problems with data, and collaborating with cross-functional teams to drive data-driven decision-making.
Work History
Data Scientist Intern
ANWEB TECHNOLOGIES PVT LTD
09.2024 - 01.2025
The objective of the project was to calculate Expected Credit Losses (ECL) in compliance with IFRS 9 standards, perform vintage analysis, and drill down to loan-level data to analyze risk factors and trends.
Data Integration: Worked with three main data stores (Model_collateral, Model_config, and Model_Authrep) to consolidate loan-level data, credit risk parameters, and loan opening information.
Risk Metric Calculations: Developed and implemented calculations for PD, LGD, and EAD, and ensured they adhered to IFRS 9 guidelines.
Stage Classification: Categorized loans into the three IFRS 9 stages: Stage 1, loans with low credit risk or no significant increase in risk; Stage 2, loans with a significant increase in credit risk but not yet impaired; Stage 3, credit-impaired loans.
Vintage Analysis: Conducted vintage analysis to track the performance of loan cohorts over time and identify trends and risk factors.
Detailed Loan-Level Analysis: Performed in-depth analysis to identify specific risk drivers impacting ECL calculations.
Risk Reporting: Generated regular reports on key risk metrics for internal and external stakeholders.
Implemented and optimized data pipelines for efficient data processing and transformation.
Utilized Pandas for querying and managing large datasets, ensuring high performance and reliability.
Collaborated with business teams to understand requirements and provide data-driven solutions, including the addition of new attributes to datasets.
Applied machine learning techniques to calculate key risk metrics, including Probability of Default (PD), Loss Given Default (LGD), and Exposure at Default (EAD), within a three-stage classification framework.
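The staging and risk-metric logic above can be sketched in Pandas. This is a minimal illustration only, with hypothetical column names and made-up figures (PD, LGD, EAD values, and risk flags are assumptions, not the actual engagement data); the expected-loss line uses the standard ECL = PD x LGD x EAD relationship referenced by IFRS 9:

```python
import pandas as pd

def classify_stage(row):
    """Assign an IFRS 9 stage from illustrative risk flags."""
    if row["is_credit_impaired"]:
        return 3  # Stage 3: credit-impaired
    if row["significant_risk_increase"]:
        return 2  # Stage 2: significant increase in credit risk
    return 1      # Stage 1: low credit risk

# Hypothetical loan-level data (column names are assumptions)
loans = pd.DataFrame({
    "loan_id": [101, 102, 103],
    "pd_12m": [0.02, 0.10, 0.60],        # probability of default
    "lgd": [0.45, 0.50, 0.80],           # loss given default
    "ead": [100_000, 250_000, 50_000],   # exposure at default
    "significant_risk_increase": [False, True, True],
    "is_credit_impaired": [False, False, True],
})

loans["stage"] = loans.apply(classify_stage, axis=1)
# Expected credit loss per loan: ECL = PD x LGD x EAD
loans["ecl"] = loans["pd_12m"] * loans["lgd"] * loans["ead"]
print(loans[["loan_id", "stage", "ecl"]])
```

In practice the PD used would differ by stage (12-month PD for Stage 1, lifetime PD for Stages 2 and 3); this sketch applies a single PD column for brevity.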
Utilized Python and machine learning to predict shipment pricing, performing data exploration, cleaning, and feature engineering.
Applied statistical analysis and models such as Linear Regression and Random Forest, achieving an R-squared score of 0.85.
Built and optimized predictive models, creating visualizations and reports for stakeholders, enabling data-driven decisions that led to cost reductions in supply chain operations.
Developed a Flask-based web application to deploy a shipment pricing prediction model.
End-to-End ML Pipeline: Designed and implemented the complete machine learning pipeline, including data preprocessing, feature engineering, model training, evaluation, and optimization for accuracy and efficiency.
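The shipment-pricing pipeline described above can be sketched with scikit-learn. This is an illustrative sketch on synthetic data (the feature names, coefficients, and dataset are invented stand-ins, not the actual shipment data), showing a preprocessing-plus-Random-Forest pipeline evaluated with R-squared:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Synthetic stand-in for shipment data: price driven by weight and distance
rng = np.random.default_rng(42)
X = rng.uniform(1, 100, size=(500, 2))                 # [weight_kg, distance_km]
y = 5 * X[:, 0] + 0.8 * X[:, 1] + rng.normal(0, 5, 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Preprocessing and model combined in one end-to-end pipeline
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", RandomForestRegressor(n_estimators=100, random_state=42)),
])
pipeline.fit(X_train, y_train)
score = r2_score(y_test, pipeline.predict(X_test))
print(f"R-squared: {score:.2f}")
```

Wrapping preprocessing and the estimator in a single `Pipeline` keeps training and inference consistent, which also simplifies serving the fitted object behind a Flask endpoint.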