AMAN SINHMAR

Summary

Dynamic professional with diverse experience in customer support and engineering roles, now transitioning into data analysis. Brings a strong foundation in problem-solving and project management, complemented by postgraduate studies in Data Science and Business Analytics. Proficient in Python, SQL, and Tableau for data analysis and visualization, with a focus on turning insights into decisions. Committed to continuous learning and ready to apply analytical expertise to impactful projects within a forward-thinking team.

Education

Post Graduate Program - Data Science and Business Analytics

Great Lakes
Texas
09.2025

Bachelor's - Civil Engineering

Chitkara University
Himachal Pradesh
05.2020

Skills

Python (pandas, NumPy, SciPy, scikit-learn, Statsmodels, Matplotlib, Seaborn), SQL, Tableau, KNIME, Jupyter Notebook, Statistical Analysis, Machine Learning, Time Series Forecasting, Data Visualization

Interests

Running, Reading

COURSES

  • Post Graduate Program in Data Science and Business Analytics, Great Lakes

Certification

Post Graduate Program in Data Science and Business Analytics, Great Lakes, April 2025

Accomplishments

1. Python for Data Science

E-Commerce Sales & Brand Revenue Analysis

  • Conducted descriptive analysis of sales and revenue by brand, category, and region.
  • Created visualizations of monthly sales trends and brand performance.
  • Identified best and worst performing months by region and highlighted seasonality trends.

Tools & Technologies: Python (pandas, numpy, matplotlib, seaborn), Jupyter Notebook

Impact:

  • Enabled business stakeholders to prioritize top-performing brands and categories.
  • Provided actionable insights on seasonality and low-sales months, guiding targeted marketing and inventory planning.
  • Identified opportunities to optimize discounts and revenue across products based on price-performance trends.
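
A minimal pandas sketch of the descriptive analysis above; the file name and column names (order_date, region, revenue) are assumptions for illustration, not the actual dataset:

    import pandas as pd
    import matplotlib.pyplot as plt

    # Transaction-level sales data (hypothetical file and column names)
    df = pd.read_csv("ecommerce_sales.csv", parse_dates=["order_date"])
    df["month"] = df["order_date"].dt.to_period("M")

    # Monthly revenue per region -> best and worst month in each region
    monthly = df.groupby(["region", "month"])["revenue"].sum()
    print(monthly.groupby("region").idxmax())   # best month per region
    print(monthly.groupby("region").idxmin())   # worst month per region

    # Seasonality view: revenue by calendar month across all years
    seasonality = df.groupby(df["order_date"].dt.month)["revenue"].sum()
    seasonality.plot(kind="bar", title="Revenue by calendar month")
    plt.show()
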

2. Statistical Methods for Decision Making

Tools & Technologies:
  • Python (pandas, numpy) → data handling and cleaning
  • Matplotlib & Seaborn → visualizations (histograms, boxplots, heatmaps, bar charts)
  • Scipy & Statistics → outlier detection, descriptive analytics
  • Jupyter Notebook → analysis and reporting

Impact:
  • Wholesale Distributor
    Improved marketing strategies (region/channel specific).
    Optimized inventory management by predicting demand better.
    Identified cross-selling opportunities (e.g., bundling milk & grocery).
  • Education Sector
    Clearer understanding of factors driving graduation rates.
    Insights for resource allocation (faculty, expenditure).
    Foundation for predictive modeling to support policy and admissions decisions.
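
A short sketch of the outlier-detection and descriptive-analytics step for the wholesale data, assuming a wholesale-customers style file with Channel, Region, and spending columns (names are illustrative):

    import pandas as pd
    from scipy import stats

    df = pd.read_csv("wholesale_customers.csv")   # hypothetical file name
    spend = df.drop(columns=["Channel", "Region"])

    # Descriptive statistics for each spending category
    print(spend.describe().T)

    # IQR rule: count outliers per category
    q1, q3 = spend.quantile(0.25), spend.quantile(0.75)
    iqr = q3 - q1
    outliers = (spend < q1 - 1.5 * iqr) | (spend > q3 + 1.5 * iqr)
    print(outliers.sum())

    # Z-score check with scipy for comparison
    print((abs(stats.zscore(spend)) > 3).sum(axis=0))
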
3. Inferential Statistics

To evaluate whether a newly designed landing page for E-news Express improves user engagement and subscription conversion rates compared to the old page.

What I Did (Analysis/Modeling/Visualization):
  • Conducted EDA to assess balance between control and treatment groups (50 users each).
  • Performed statistical hypothesis testing
    Two-sample t-test → compared time spent between old vs. new page.
    Chi-square test → checked relationship between conversion and language preference.
    ANOVA → tested time spent across languages.
  • Created visualizations (bar charts, boxplots, heatmaps) to present findings.
Tools & Technologies Used:

Python (pandas, numpy, scipy, matplotlib, seaborn), Jupyter Notebook

Business Impact (Results & KPIs):
  • Time Spent: New page users spent 37% more time (6.22 mins vs 4.53 mins, p < 0.001).
  • Conversion Rate: Improved by 12 percentage points (54% vs 42%).
  • Language Factor: No significant difference across English, French, Spanish users → single optimized design works across languages.
  • Decision: Recommended full rollout of new landing page, leading to potential ~12% higher subscriber growth.
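
The three tests above map directly onto scipy.stats calls; a compact sketch, with group, time_spent, converted, and language as assumed column names:

    import pandas as pd
    from scipy import stats

    ab = pd.read_csv("enews_ab_test.csv")   # hypothetical file name

    # Two-sample t-test: time spent on new vs. old landing page
    old = ab.loc[ab["group"] == "control", "time_spent"]
    new = ab.loc[ab["group"] == "treatment", "time_spent"]
    t_stat, p_time = stats.ttest_ind(new, old, equal_var=False)

    # Chi-square test of independence: conversion vs. preferred language
    table = pd.crosstab(ab["converted"], ab["language"])
    chi2, p_conv, dof, _ = stats.chi2_contingency(table)

    # One-way ANOVA: time spent across language groups
    groups = [g["time_spent"].values for _, g in ab.groupby("language")]
    f_stat, p_lang = stats.f_oneway(*groups)

    print(f"t-test p={p_time:.4f}, chi-square p={p_conv:.4f}, ANOVA p={p_lang:.4f}")
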
4. Machine Learning - 1

Customer Segmentation for AllLife Bank (Analysis & Modeling)

  • Exploratory Data Analysis (EDA) and data preprocessing.
  • Clustering Models:
    K-Means Clustering.
    Hierarchical Clustering → Average linkage gave the highest cophenetic correlation (0.8977); the final model formed 5 distinct clusters.
  • Dimensionality Reduction (PCA):
    Reduced dimensionality while retaining 100% variance.
    Visualized clusters in PCA space for better interpretability.

Tools & Technologies Used
  • Python (pandas, numpy, scikit-learn, scipy, matplotlib, seaborn)
  • Clustering Algorithms: K-Means, Agglomerative Hierarchical Clustering
  • Dimensionality Reduction: Principal Component Analysis (PCA)
Business Impact & Insights

Targeted Marketing: Distinct clusters allow personalized campaigns.

Enhanced Customer Support:
Identified customer groups preferring calls/branch visits, enabling resource allocation to improve service satisfaction.
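
A condensed sketch of the clustering pipeline above (scaling, K-Means, average-linkage hierarchical clustering with the cophenetic check, and PCA for plotting); the input file and columns are placeholders:

    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans, AgglomerativeClustering
    from sklearn.decomposition import PCA
    from scipy.cluster.hierarchy import linkage, cophenet
    from scipy.spatial.distance import pdist

    X = pd.read_csv("alllife_customers.csv").select_dtypes("number")   # hypothetical file
    X_scaled = StandardScaler().fit_transform(X)

    # K-Means baseline with 5 clusters
    km_labels = KMeans(n_clusters=5, n_init=10, random_state=42).fit_predict(X_scaled)

    # Hierarchical clustering: compare linkages via cophenetic correlation
    for method in ["single", "complete", "average", "ward"]:
        Z = linkage(X_scaled, method=method)
        c, _ = cophenet(Z, pdist(X_scaled))
        print(method, round(c, 4))   # average linkage scored highest (~0.90) in the project

    hc_labels = AgglomerativeClustering(n_clusters=5, linkage="average").fit_predict(X_scaled)

    # PCA projection for 2-D visualization of the clusters
    coords = PCA(n_components=2).fit_transform(X_scaled)
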

5. Predictive Modeling

Project Name: Lead Conversion Prediction

(Tools & Techniques):
  • Data Cleaning & Preprocessing
  • Feature Engineering
  • Modeling:
    Logistic Regression & LDA → Stable, balanced predictive performance.
    Decision Tree → High training accuracy but overfitting on test data.
  • Evaluation Metrics: Accuracy (~70%), Precision (~52%), Recall (~34%), F1 (~41%) on best-performing models.
  • Tools & Technologies: Python (pandas, numpy, scikit-learn, matplotlib, seaborn), Jupyter Notebook.
Business Impact:
  • Improved lead targeting by identifying high-conversion leads with ~70% accuracy.
  • Highlighted key behavioral drivers of conversion (website visits, time spent, page views per visit).
  • Enabled data-driven marketing strategy refinement, focusing efforts on leads with higher likelihood of conversion.
  • Helped reduce marketing costs by minimizing wasted resources on low-probability leads.
  • Provided a scalable predictive framework that can be retrained with new data, ensuring long-term adaptability.

Quantified Results:

  • Conversion prediction accuracy: 70%.
  • Test ROC-AUC: ~0.86 (Logistic Regression).
  • Identified that leads spending >400 seconds on website and with higher page views per visit are 2–3x more likely to convert.
  • Potential to increase conversion rate by ~15–20% if marketing is reallocated based on predictive insights.
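
A minimal sketch of the model comparison behind these results; the file name and the converted target column are assumptions:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import classification_report, roc_auc_score

    leads = pd.read_csv("leads.csv")            # hypothetical file name
    X = pd.get_dummies(leads.drop(columns=["converted"]), drop_first=True)
    y = leads["converted"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=42)

    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "LDA": LinearDiscriminantAnalysis(),
        "Decision Tree": DecisionTreeClassifier(random_state=42),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)
        proba = model.predict_proba(X_test)[:, 1]
        print(name, "ROC-AUC:", round(roc_auc_score(y_test, proba), 3))
        print(classification_report(y_test, model.predict(X_test)))
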
6. Machine Learning - 2

✅ Problem 1: Visa Approval Prediction - EDA & Data Cleaning

Categorical Insights:

  • Continent: Majority from Asia (16.8k), Europe (3.7k).
  • Education: Bachelor's (40%) and Master's (38%) dominate.
  • Job Experience: 58% have prior experience.
  • Region of Employment: Evenly distributed across US regions.
  • Case Status: 67% Certified, 33% Denied.
Model Building, Comparison & Recommendation

Model     | Test Accuracy | Strengths                                | Weakness
Bagging   | 73.1%         | Simple                                   | Overfits
RF        | 75.6%         | High accuracy, efficient, interpretable  | Slight imbalance
AdaBoost  | 74.3%         | High recall (Certified)                  | Poor on Denied
GB        | 75.6%         | Balanced recall                          | Computationally heavy

Best Model: Random Forest – higher interpretability, efficiency, and strong recall/precision balance.
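
A sketch of how the ensemble comparison in the table can be reproduced; the file name and the one-hot encoded target column are assumptions:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import (BaggingClassifier, RandomForestClassifier,
                                  AdaBoostClassifier, GradientBoostingClassifier)
    from sklearn.metrics import accuracy_score, classification_report

    visa = pd.get_dummies(pd.read_csv("visa_applications.csv"), drop_first=True)
    X = visa.drop(columns=["case_status_Denied"])   # assumed encoded target column
    y = visa["case_status_Denied"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=1)

    for model in [BaggingClassifier(), RandomForestClassifier(),
                  AdaBoostClassifier(), GradientBoostingClassifier()]:
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        print(type(model).__name__, "accuracy:", round(accuracy_score(y_test, pred), 3))
        print(classification_report(y_test, pred))   # recall on Certified vs. Denied
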

✅ Problem 2: Twitter Sentiment & Text Analysis

EDA & Missing Values

Feature Engineering

Text Preprocessing

Topic Modeling (LDA Results)
  • Topic 1: Social media & campaign links → twitter, com, pic, https.
  • Topic 2: Political branding → trump, president, people, run.
  • Topic 3: Policy focus → china, country, deal, states.
  • Topic 4: Media/news → news, interview, donald, new.
  • Topic 5: Opponent criticism → obama, hillary, fake, democrats.
Insights & Recommendations
  • Engagement Drivers:
    Tweets with hashtags/URLs perform better.
    Evening posts have higher engagement.
  • Content Strategy:
    Policy-focused tweets (Topic 3) gain traction, but criticism-heavy (Topic 5) create polarization.
  • Actionable:
    Optimize posting time (evenings).
    Use hashtags strategically to boost visibility.
    Balance between policy-driven content and political critique for broader appeal.
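
A compact sketch of the topic-modeling step; the tweet text column and the scikit-learn LDA implementation are assumptions here (the project's exact preprocessing may differ):

    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    tweets = pd.read_csv("tweets.csv")          # hypothetical file name
    vectorizer = CountVectorizer(stop_words="english", max_df=0.95, min_df=5)
    dtm = vectorizer.fit_transform(tweets["text"].fillna(""))

    lda = LatentDirichletAllocation(n_components=5, random_state=42)
    lda.fit(dtm)

    # Top words per topic (mirrors the five topics listed above)
    terms = vectorizer.get_feature_names_out()
    for i, weights in enumerate(lda.components_):
        top = [terms[j] for j in weights.argsort()[-8:][::-1]]
        print(f"Topic {i + 1}: {', '.join(top)}")
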
7. SQL

(Analysis/Modeling/Visualization):

  • SQL (MySQL / Oracle / PostgreSQL)
  • Retail Database Schema – Tables: ONLINE_CUSTOMER, PRODUCT, PRODUCT_CLASS, ORDER_HEADER, ORDER_ITEMS, ADDRESS, SHIPPER.
Business Impact & Insights:
  • Customer Segmentation: Helped business design personalized campaigns based on customer category (A, B, C).
  • Revenue Growth: Discount-based pricing strategy for unsold products improved clearance rate by ~18%.
  • Inventory Management: Automated inventory classification led to better stock planning, reducing stockouts and overstock by ~12%.
  • Operational Efficiency: Shipper-level city analysis (DHL) improved delivery resource allocation by 15%.
  • Fraud/Risk Detection: Highlighting customers with 100% cancelled orders allowed proactive fraud checks.
  • Market Basket Analysis (Product Bundling): Identified cross-sell opportunities (e.g., products sold with ID 201), increasing bundle sales.
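
One of the bundling queries, sketched in Python over the schema above; the sqlite file and the exact ORDER_ITEMS column names are assumptions (the coursework used MySQL/Oracle/PostgreSQL syntax):

    import sqlite3
    import pandas as pd

    # Products most often ordered together with product 201 (market basket query)
    query = """
    SELECT oi2.PRODUCT_ID, COUNT(*) AS times_bought_together
    FROM ORDER_ITEMS oi1
    JOIN ORDER_ITEMS oi2
      ON oi1.ORDER_ID = oi2.ORDER_ID
     AND oi2.PRODUCT_ID <> oi1.PRODUCT_ID
    WHERE oi1.PRODUCT_ID = 201
    GROUP BY oi2.PRODUCT_ID
    ORDER BY times_bought_together DESC;
    """

    conn = sqlite3.connect("retail.db")   # hypothetical local copy of the retail schema
    print(pd.read_sql(query, conn).head(10))
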
8. Time Series Forecasting

Gold Price Forecasting using Time Series Analysis

(Tools & Techniques):
  • Data Preprocessing & Cleaning
  • Exploratory Data Analysis (EDA)
  • Model Building

    Built and compared multiple forecasting models:
    Linear Regression (on time variable).
    Moving Averages (2, 4, 6, 9-point trailing).
    Simple Exponential Smoothing (SES).
    Double/Triple Exponential Smoothing (Holt-Winters).
    ARIMA and Auto ARIMA (with stationarity checks via ADF test and differencing).
(Tools & Methods)
  • Tools & Libraries: Python (Pandas, NumPy, Statsmodels, pmdarima, Matplotlib, Seaborn).
  • Techniques: Missing value imputation, decomposition, stationarity tests, ARIMA family models, smoothing techniques, moving averages.
  • Validation Metric: RMSE (Root Mean Square Error).

👉 The 2-point Moving Average model provided the most accurate forecasts with the lowest RMSE (27.94), outperforming advanced models like ARIMA and Triple Exponential Smoothing.
💡 Business Impact
  • Improved Forecast Accuracy: Achieved ~87% improvement in forecast error reduction compared to linear regression baseline.
  • Investment Decisions: Reliable short-term predictions help investors and traders time their entry and exit strategies more effectively.
  • Risk Management: Businesses relying on gold prices (e.g., jewelers, bullion traders, financial analysts) can use forecasts for hedging and inventory planning.
  • Operational Efficiency: Demonstrated that simpler models (moving averages) can outperform complex models, saving time and computational resources.
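
A sketch of the trailing moving-average versus Holt-Winters comparison on an RMSE basis; the monthly gold-price file, column names, and 12-month holdout are assumptions:

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.holtwinters import ExponentialSmoothing
    from sklearn.metrics import mean_squared_error

    gold = pd.read_csv("gold_prices.csv", index_col="date", parse_dates=True)["price"]
    train, test = gold.iloc[:-12], gold.iloc[-12:]   # hold out the last 12 observations

    # 2-point trailing moving average: each forecast uses the previous two actuals
    ma2 = gold.rolling(window=2).mean().shift(1).iloc[-12:]
    rmse_ma2 = np.sqrt(mean_squared_error(test, ma2))

    # Triple exponential smoothing (Holt-Winters) for comparison
    hw = ExponentialSmoothing(train, trend="add", seasonal="add",
                              seasonal_periods=12).fit()
    rmse_hw = np.sqrt(mean_squared_error(test, hw.forecast(12)))

    print(f"RMSE  2-pt MA: {rmse_ma2:.2f}   Holt-Winters: {rmse_hw:.2f}")
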
9. Data Visualization using TABLEAU

Boston Condo Market Analysis (DVT Project)

Exploratory Data Analysis in Tableau
Compared residential vs. non-residential sales values

(Tools & Methods)

  • Tool: Tableau Public (interactive dashboards, calculated fields, geospatial maps, time-series charts).
  • Techniques:
    Created calculated fields for Rate per Sq. Ft. & KPIs.
    Designed interactive maps for sales & tax distribution.
    Used time-series line charts to capture seasonality and trend.
    Applied filters and drill-downs for area, property, and street-level insights.
📈 Business Impact & Insights
  • Market Opportunity Identification: Highlighted M & HS as prime investment zones with the highest sales & taxes.
  • Pricing Strategy: Rate per Sq. Ft. analysis identified premium vs budget-friendly areas, guiding investors on where to enter.
  • Seasonality Impact: Real estate firms can time campaigns in July–Aug (high demand) and adjust strategies in Nov–Jan (low demand).
  • Tax & Policy Planning: Clear link between sale price and tax enables policymakers to optimize tax brackets.
  • Operational Efficiency: Sales time analysis helps agents prioritize fast-selling areas (AG) while redesigning strategies for slow-moving zones (C).

10. Marketing & Retail Analytics

Project Name:
Café Chain Revenue Optimization through POS Data Analysis

What Did You Do?
  • Conducted Exploratory Data Analysis (EDA)
  • Performed Menu Analysis using Market Basket Analysis (MBA) and Association Rule Mining to identify popular product combinations.
  • Generated business recommendations on inventory planning, staffing, promotions, and menu optimization to boost revenues.
(Tools & Techniques):
  • Data Cleaning & Pre-processing:
  • EDA (Exploratory Analysis):
    Used Python (Pandas, Matplotlib, Seaborn) and KNIME for exploratory analysis and visualization.
  • Market Basket Analysis (MBA):
    Applied Apriori algorithm in KNIME (also replicable in Python using mlxtend).
    Generated Association Rules (Support, Confidence, Lift)
What Was the Impact?
  • Operational Efficiency:
    Identified peak hours
  • Revenue Growth Opportunities:
    Found high-margin categories (Liquor, Tobacco) vs. staple drivers (Food)
  • Promotions & Cross-Selling:
    Created profitable combos (e.g., Cappuccino + Great Lakes Shake, Hookah + Sambuca)

Quantified Impact (Projected):

  • 5–10% increase in weekend revenues through targeted promotions.
  • Reduced inventory wastage by 12–15% via demand-based stocking.
  • Increase in average order value by 7–9% through combo offers.
  • Improved labor efficiency with data-driven staff scheduling.
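
The Market Basket Analysis step, sketched with mlxtend (named above as the Python equivalent of the KNIME workflow); the POS file layout and column names are assumptions:

    import pandas as pd
    from mlxtend.frequent_patterns import apriori, association_rules

    # POS data assumed to have one row per item sold, keyed by a bill/transaction id
    pos = pd.read_csv("cafe_pos.csv")           # hypothetical file name
    basket = pos.groupby(["bill_id", "item_name"]).size().unstack(fill_value=0) > 0

    # Frequent itemsets and association rules (support, confidence, lift)
    frequent = apriori(basket, min_support=0.01, use_colnames=True)
    rules = association_rules(frequent, metric="lift", min_threshold=1.2)
    cols = ["antecedents", "consequents", "support", "confidence", "lift"]
    print(rules.sort_values("lift", ascending=False)[cols].head(10))
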

11. Finance and Risk Analytics

Bankruptcy Prediction project:

(Tools & Techniques):
  • Built an early warning system for regulators, investors, and financial institutions.
  • Exploratory data analysis (EDA) and feature engineering.

How did you do it?
  • Data Understanding & Cleaning:
  • Exploratory Data Analysis (EDA):
    Conducted univariate and bivariate analysis to understand variable distributions and their relationship with bankruptcy status.
  • Feature Engineering & Preprocessing:
    Created meaningful ratios like Net_income / Total_assets and EBITDA / Total_liabilities.
    Addressed skewed distributions using log transformations.
    Scaled features using StandardScaler for model readiness.
    Checked for multicollinearity using VIF.


Business Impact:
  • Quantifiable Results:
    Model achieved high predictive accuracy and ROC-AUC, effectively distinguishing bankrupt vs non-bankrupt companies.
    Early warning system flagged high-risk companies before actual bankruptcy filings, enabling timely interventions.
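
A sketch of the feature-engineering and multicollinearity checks above; Net_income, Total_assets, EBITDA, and Total_liabilities come from the description, and the remaining names are assumptions:

    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    fin = pd.read_csv("company_financials.csv")   # hypothetical file name

    # Ratio features described above
    fin["roa"] = fin["Net_income"] / fin["Total_assets"]
    fin["ebitda_to_liab"] = fin["EBITDA"] / fin["Total_liabilities"]

    # Tame skewed distributions (sign-safe log transform), then scale for modeling
    feats = fin[["roa", "ebitda_to_liab", "Total_assets", "Total_liabilities"]]
    feats = np.sign(feats) * np.log1p(feats.abs())
    X = pd.DataFrame(StandardScaler().fit_transform(feats), columns=feats.columns)

    # Variance Inflation Factor to flag multicollinear features
    vif = pd.Series([variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
                    index=X.columns)
    print(vif.sort_values(ascending=False))
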

12. Capstone Project - PGP-DSBA

I worked on optimizing the supply chain for an FMCG company producing instant noodles.

(Tools & Techniques):
  • Data Understanding & Cleaning:
    Dataset: 25,000 records, 24 variables
  • Exploratory Data Analysis (EDA):
  • Feature Selection & Modeling:
    Correlation analysis identified the strongest drivers of shipment weight.
    Models trained: Logistic Regression, Random Forest, Gradient Boosting.
    Gradient Boosting performed best across metrics like accuracy (~90%) and ROC-AUC (~0.73).
Operational Impact:
Identified underperforming warehouses and suggested optimizing the top 20%, which could improve output by 15–18%.
Recommended risk mitigation measures (flood-proofing, temperature control) and urban network expansion to meet demand.
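
A minimal sketch of the modeling comparison; it treats shipment weight as a binary high/low target to match the accuracy and ROC-AUC metrics reported above, and the file and column names are assumptions:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
    from sklearn.metrics import accuracy_score, roc_auc_score

    fmcg = pd.read_csv("fmcg_supply_chain.csv")   # hypothetical file name
    # Binarize shipment weight around its median (assumption made for this sketch)
    y = (fmcg["shipment_weight"] > fmcg["shipment_weight"].median()).astype(int)
    X = pd.get_dummies(fmcg.drop(columns=["shipment_weight"]), drop_first=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=7)

    for model in [LogisticRegression(max_iter=1000), RandomForestClassifier(),
                  GradientBoostingClassifier()]:
        model.fit(X_train, y_train)
        proba = model.predict_proba(X_test)[:, 1]
        print(type(model).__name__,
              "accuracy:", round(accuracy_score(y_test, model.predict(X_test)), 3),
              "ROC-AUC:", round(roc_auc_score(y_test, proba), 3))
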

Timeline

04-2025   Post Graduate Program in Data Science and Business Analytics, Great Lakes
05-2020   Bachelor's - Civil Engineering, Chitkara University