Experienced Senior Data Scientist skilled in advanced analytics, machine learning, and predictive modeling. Proficient in transforming data into strategic insights that drive business growth and innovation.
Data Quality Solution:
· Led end-to-end design and delivery of a data quality framework to detect anomalies and assess data integrity using ML models such as Isolation Forest alongside rule-based validations.
· Built an AI agent that translates user queries into SQL, retrieves insights from Databricks, and presents them in a user-friendly format.
· Enabled proactive alerts to data teams, improving decision-making with clean, reliable data.
· Mentored a junior consultant, helping them build their skills and contribute more effectively to the project.
Accounting policy violation detection agent:
· Designed an end-to-end agentic workflow to identify accounting policy rule violations in journal entries.
· Enabled dynamic interpretation of user-uploaded policy documents and natural language queries for specific time periods (month/quarter).
· Built an intelligent agent that converts user questions into SQL queries and executes them in parallel for multiple rules across Databricks tables.
· Delivered concise summaries and examples of violations found, enhancing audit readiness and compliance visibility.
· Successfully developed and deployed the solution as a proof of concept, demonstrating scalable automation in policy enforcement.
· Led a team of two consultants to execute the project, ensuring timely delivery and alignment with business expectations.
Travel anomaly detection:
· Conducted in-depth analysis of travel data to uncover patterns in high-fare air travel routes.
· Designed and implemented a K-means clustering model combined with rule-based logic to detect anomalous airfare values.
· Incorporated multiple features such as trip type, ticket class, and origin-destination pairs to improve anomaly detection accuracy.
· Integrated the solution into AWS EMR for scalable processing and automated output generation.
· Delivered insights through a real-time dashboard, enabling business teams to proactively monitor and flag suspicious travel activity.
Transaction tagging:
· Automated bank transaction categorization in the Trax system, classifying transactions into Tier 1, 2, and 3 categories.
· Developed a Random Forest classifier using attributes such as description, payment type, and account type.
· Achieved ~92% prediction accuracy with the model.
· Enhanced model performance by identifying description patterns and building specialized models for high-risk geographies.
Account cash payable forecasting:
· Architected a comprehensive cash forecasting model, tailored for both aggregated and instrumental levels, for GE Corporate.
· Developed the model in Python using Extreme Gradient Boosting (XGBoost), trained it on historical data, and evaluated it against actual cash flows.
· The model delivers consistently accurate forecasts for vendor cash payables, aiding strategic decision-making.
Supplier Connect matching:
· Independently led the development of a Python-based machine learning solution to automate supplier data matching across systems.
· Used cosine similarity and advanced data mining techniques to replace a fully manual comparison process with algorithmic matching.
· Automated the process by integrating the solution with the client’s tool and establishing a direct database connection.
· Enabled identification of overlaps between customer and supplier records, enhancing data transparency.
· Transformed the matching workflow into a one-click operation, significantly improving operational efficiency.
Data Governance for GE Capital:
· Designed and implemented a fully automated reconciliation system using SAS and Excel to harmonize GE Capital home lending data with historical reports.
· Conducted rigorous cross-validation against predictive models to ensure data accuracy and consistency.
· Strengthened data integrity and security, supporting GE Capital’s data securitization efforts.
· Delivered a scalable solution that enhanced trust in financial reporting and compliance readiness.
Python, R
Tools (AWS, Git, Databricks, Spotfire, Power BI)
Classification and Prediction - Logistic Regression, Random Forest, XGBoost
Isolation Forest
Clustering techniques
Text analysis/NLP
Gen AI, Agentic AI