Seasoned Data Scientist with over 9 years of experience in Generative AI and Machine Learning, specializing in LLMs, Deep Learning, and MLOps on Google Cloud Platform (GCP). Google Cloud certified Machine Learning Engineer. End-to-end model deployment and data pipeline experience on GCP. Experienced in team management and stakeholder management.
1. Used pretrained open-source LLMs from Hugging Face to convert natural language text into SQL queries, working with models such as Llama 2, CodeLlama 34B, CodeLlama 7B, and Mistral 7B. Applied several prompting techniques to improve SQL query accuracy. Integrated the Chroma vector database with RAG to store domain and database information, enabling dynamic schema generation. Executed the generated SQL queries on a Vertica database and delivered the answers to network operators through an interactive user interface.
Models and frameworks used: CodeLlama 7B, 13B, 34B; T5 Spider; Mistral 7B; prompt engineering (zero-shot, few-shot, and CoT); RAG; LangChain
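The retrieval-then-prompt flow in project 1 can be illustrated with a minimal sketch. Everything here is hypothetical: the table definitions, the few-shot example, and the helper names are invented for illustration, and simple keyword overlap stands in for the Chroma vector search so the sketch runs with the standard library alone. It only shows how retrieved schema text and a few-shot example could be assembled into a prompt; it does not call an LLM.

```python
import re

# Hypothetical schema snippets; in a real setup these would be stored in
# and retrieved from a vector database rather than a Python list.
SCHEMA_DOCS = [
    "TABLE cell_kpi(cell_id INT, kpi_name VARCHAR, kpi_value FLOAT, ts TIMESTAMP)",
    "TABLE cell_site(cell_id INT, site_name VARCHAR, region VARCHAR)",
    "TABLE tickets(ticket_id INT, cell_id INT, opened_at TIMESTAMP, status VARCHAR)",
]

# Hypothetical few-shot example (question, SQL) pair.
FEW_SHOT = [
    ("How many cells are in region North?",
     "SELECT COUNT(*) FROM cell_site WHERE region = 'North';"),
]


def tokens(text):
    """Lowercase alphanumeric tokens; underscores act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, docs, k=2):
    """Return the k docs sharing the most tokens with the question.

    Assumption: keyword overlap is a stand-in for embedding similarity.
    """
    q = tokens(question)
    return sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)[:k]


def build_prompt(question, docs=SCHEMA_DOCS, examples=FEW_SHOT, k=2):
    """Assemble a few-shot text-to-SQL prompt from retrieved schema text."""
    schema = "\n".join(retrieve(question, docs, k))
    shots = "\n\n".join(f"Question: {q}\nSQL: {s}" for q, s in examples)
    return (
        "Translate the question into a SQL query using only this schema.\n\n"
        f"Schema:\n{schema}\n\n{shots}\n\nQuestion: {question}\nSQL:"
    )


prompt = build_prompt("What is the average kpi_value per region?")
```

Retrieving only the relevant tables keeps the prompt short, which matters for smaller models with limited context windows.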
2. Fine-tuned small LLMs, including TinyLLM, Phi-1.5, and facebook/opt-1.3b, on domain-specific data to classify customer complaints. When a user submits a complaint, the fine-tuned LLM assigns it to a specific category so the resolution team can address the issue promptly. Used both adapter and LoRA methods for fine-tuning and applied quantization to optimize model loading and memory usage. The fine-tuned models achieved a 25% increase in accuracy over the base model on this multi-class classification task.
3. Developed a predictive model for dynamic threshold values of Radio Access Network (RAN) Key Performance Indicators (KPIs), using clustering techniques to group network cells. The model predicts thresholds across 2G, 3G, 4G, and 5G KPIs, combining boosting-based multi-output regression with statistical anomaly detection algorithms. Deployed on Google Cloud Platform (GCP) with Vertex AI, the solution includes a live data comparison system that generates support tickets. Network operators rely on the Real-Time Performance Monitoring (RTPM) tool for day-to-day operations, and it is deployed across various clients. Dashboards were built with Tableau and Google Data Studio.
Models and frameworks used: multi-output regression, anomaly detection, GCP Vertex AI, BigQuery, BigQuery ML, MLOps, CatBoost
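The live comparison step in project 3 can be sketched as follows. This is an illustration under stated assumptions, not the deployed method: a simple robust rule (median plus a multiple of the scaled median absolute deviation) stands in for the boosting-based threshold model, and the cluster names, KPI values, and function names are all hypothetical.

```python
from statistics import median


def robust_threshold(history, k=3.0):
    """Upper threshold = median + k * scaled MAD of historical KPI values.

    Assumption: this statistical rule replaces the learned threshold model.
    The factor 1.4826 makes the MAD comparable to a standard deviation
    for normally distributed data.
    """
    med = median(history)
    mad = median(abs(x - med) for x in history)
    return med + k * 1.4826 * mad


def flag_breaches(live, thresholds):
    """Return (cluster, value, threshold) for each live reading above its threshold."""
    return [(c, v, thresholds[c]) for c, v in live if v > thresholds[c]]


# Hypothetical KPI history per cell cluster (for example, a drop rate in percent).
history_by_cluster = {
    "urban": [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8],
    "rural": [2.0, 2.5, 1.8, 2.2, 2.1, 2.4, 1.9],
}
thresholds = {c: robust_threshold(h) for c, h in history_by_cluster.items()}

# Hypothetical live readings; each breach would trigger a support ticket.
live = [("urban", 1.2), ("urban", 3.0), ("rural", 2.6), ("rural", 5.0)]
breaches = flag_breaches(live, thresholds)
```

Computing thresholds per cluster rather than globally is what makes them dynamic: the same reading can be normal for one group of cells and anomalous for another.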
4. Implemented a quantile regression model for a prominent US office stationery supplier, predicting B2B customers' realistic total wallet and Share of Wallet (SOW) from previous-year sales and demographics. The model clusters customers by their share of wallet, providing valuable insights for strategic decision-making.
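For context on project 4, quantile regression fits a chosen quantile by minimizing the pinball loss, and share of wallet is current spend divided by the estimated total wallet. The sketch below shows both calculations; the quantile level 0.9, the numbers, and the function names are assumptions for illustration only.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss for quantile level q in (0, 1).

    Under-predictions are weighted by q and over-predictions by (1 - q),
    so a high q pushes the fit toward the upper range of observed spend.
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


def share_of_wallet(spend_with_supplier, estimated_wallet):
    """Fraction of a customer's estimated total wallet captured by the supplier."""
    if estimated_wallet <= 0:
        return 0.0
    return spend_with_supplier / estimated_wallet


# With q = 0.9, under-predicting by 2 costs far more than over-predicting by 2.
under = pinball_loss([10.0], [8.0], q=0.9)
over = pinball_loss([10.0], [12.0], q=0.9)

# Hypothetical customer: 250 spent with the supplier, 1000 estimated wallet.
sow = share_of_wallet(250.0, 1000.0)
```

The asymmetry of the loss is what makes a high quantile a plausible estimate of what a customer could spend, as opposed to what an average similar customer does spend.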
5. Designed a predictive model for three clients to forecast invoice payment delay probabilities, identifying critical invoices with higher outstanding amounts and a greater likelihood of late payment in the future. Leveraged various invoice and customer-level variables in the model creation process.
Models and frameworks used: Logistic Regression, Random Forest. Employed R for modeling, Tableau for visualization, and PostgreSQL for data storage.