- Languages - Python, SQL
Project 1:
Child Development Plan (Sep 2020 to Aug 2021)
1. It is an investment product in which a customer (father or mother) invests in the account and receives benefits toward their children's education expenses.
2. The business wanted more customers to take up this plan, because the existing targeting approach, based on business understanding, was not yielding good conversions.
3. The target variable flags customers who opened the account within 45 days of being targeted. The realized event rate is 4%.
4. Logistic regression was used to develop the solution.
5. Data sources used in the model include customer demographics, giro transactions, CASA balances, investments, and insurance.
6. Data was split 70:30 into training and testing sets, respectively.
7. Various data treatments (missing values, outliers) were carried out to clean the data, and the variables were transformed so that the data follows a trend.
8. Variables were selected based on information value, variable clustering, and business importance. Multicollinearity was checked using VIF. Variable significance and the sign of each variable in the regression equation were reviewed, and performance statistics and model stability were computed.
9. The final model captured 70% of the total events in the modeling dataset within the top 20% of the data.
10. The business users approved the model after a test implementation on actual data.
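Two of the measures mentioned above, information value for variable selection and the share of events captured in the top 20% of the data, can be sketched as follows. This is a minimal illustration using only the standard library, not the project's actual code. The function names, the bin counts, and the assumption that records are ranked by model score are all hypothetical.

```python
import math


def information_value(bins):
    """Information value of one binned variable.

    `bins` is a list of (non_event_count, event_count) pairs, one per bin.
    IV = sum over bins of (pct_non_event - pct_event) * ln(pct_non_event / pct_event).
    """
    total_non_events = sum(n for n, _ in bins)
    total_events = sum(e for _, e in bins)
    iv = 0.0
    for non_events, events in bins:
        # Skip empty cells to avoid log(0); in practice such bins are usually merged.
        if non_events == 0 or events == 0:
            continue
        pct_non = non_events / total_non_events
        pct_evt = events / total_events
        iv += (pct_non - pct_evt) * math.log(pct_non / pct_evt)
    return iv


def capture_rate(scores, labels, top_fraction=0.2):
    """Share of all events found in the highest-scored `top_fraction` of records."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, int(round(len(ranked) * top_fraction)))
    events_in_top = sum(label for _, label in ranked[:cutoff])
    total_events = sum(labels)
    return events_in_top / total_events if total_events else 0.0


if __name__ == "__main__":
    # Hypothetical bin counts and scores, for illustration only.
    print(information_value([(80, 20), (20, 80)]))
    print(capture_rate([0.9, 0.8, 0.1, 0.2, 0.3], [1, 0, 0, 1, 0], top_fraction=0.2))
```

A capture rate of 0.70 at `top_fraction=0.2` would correspond to the result reported for the final model, assuming the top 20% refers to records ranked by predicted probability.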
Project 2:
Retail Analysis
Objective – A leading retail chain would like to predict sales and demand accurately. Certain events and holidays influence sales on each day. Sales data is available for 45 stores. The business faces unforeseen demand and sometimes runs out of stock because of an inappropriate machine-learning algorithm. An ideal ML algorithm would predict demand at different points in time, cover seasonality, and ingest factors such as economic conditions, including CPI and the unemployment index.
Responsibilities -
1. Perform EDA
2. Treat missing values and outliers.
3. Identify which store has maximum sales.
4. Analysis -
a. Identify which store has the maximum standard deviation, i.e., where sales vary the most. Also compute the coefficient of variation (the ratio of standard deviation to mean).
b. Identify which stores have a good quarterly growth rate.
c. Some holidays have a negative impact on sales. Find the holidays with higher sales than the mean sales in the non-holiday season for all stores together.
5. Linear regression – Use variables such as date, restructuring dates into a sequence starting from the earliest date. Test the hypothesis that CPI, unemployment, and fuel price have an impact on sales.
6. Time series forecasting model –
a. Check whether the data is fit for time series analysis by running a white noise probability test.
b. Adjust historical data for events such as holidays, if applicable.
7. Predict the next 6 months and evaluate the forecast with MAPE.
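Two of the calculations listed above, the per-store variability check and the MAPE evaluation, can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions, not the project's implementation. The function names and the sample figures are hypothetical, and the ratio is computed as sample standard deviation over mean.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def most_variable_store(sales_by_store):
    """Return the store whose sales have the largest sample standard deviation.

    `sales_by_store` maps a store identifier to a list of sales figures.
    """
    return max(sales_by_store, key=lambda store: statistics.stdev(sales_by_store[store]))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Periods with zero actual sales are skipped, since the percentage
    error is undefined there.
    """
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast) if a != 0]
    if not errors:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical sales figures, for illustration only.
    sales = {"store_1": [100, 100, 100], "store_2": [50, 150, 100]}
    print(most_variable_store(sales))
    print(coefficient_of_variation(sales["store_2"]))
    print(mape([100, 200], [110, 180]))
```

A lower MAPE on the 6-month horizon indicates a closer forecast. Skipping zero-sales periods is one common convention, and other treatments are possible.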