Skilled IT professional with 9+ years of hands-on experience in AI/ML, Data Science, and Big Data technologies, including 4 years of strong experience and comprehensive knowledge in Machine Learning, Artificial Intelligence, and Deep Learning techniques. Currently working as a Data Scientist at Commonwealth Bank (CBA).
1. Designed and implemented machine learning models on transaction data: BB Transaction Categorization Model.
The model categorizes BB transactions into different types of income and expenses; it is a multi-class classification problem. Cleaned the raw transaction data and handled missing values in Python, and developed insights from EDA with univariate and bivariate analysis. Built the classification models while engaging clients regularly on model updates and changes, and documented the entire process for them; the model helped SMEs manage their finances better. A CatBoost classifier (gradient boosting on decision trees) achieved 94% accuracy; a sketch of the setup follows. A proposal is now underway to apply deep learning techniques, using Llama and Mistral, to the transaction description attribute to make the model more advanced.
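A minimal sketch of the multi-class CatBoost setup described above; the file path, feature columns (amount, merchant, description), and label column (category) are hypothetical placeholders, not the production schema:

# Sketch of a multi-class transaction categorizer with CatBoost.
# All column names and the CSV path below are hypothetical.
import pandas as pd
from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("bb_transactions.csv")           # hypothetical extract
df["description"] = df["description"].fillna("")  # basic missing-value handling

X = df[["amount", "merchant", "description"]]
y = df["category"]                                # multi-class target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# CatBoost handles categorical and text columns natively.
model = CatBoostClassifier(
    loss_function="MultiClass",
    iterations=500,
    cat_features=["merchant"],
    text_features=["description"],
    verbose=100,
)
model.fit(X_train, y_train, eval_set=(X_test, y_test))
print("accuracy:", model.score(X_test, y_test))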
2. Omnia – The Bank's Big Data ecosystem, which runs on a group of Hadoop clusters. It is the bank's strategic solution for ingesting, transforming, and supplying production data to both front-end decision-making and back-end reporting solutions (see the sketch after the tools list below).
Tools & Technologies – Machine Learning Models, Gradient Boosting, NLP, Transformers (GenAI), Hadoop, Hive, Spark.
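A minimal sketch of an Omnia-style ingest/transform/supply step on Spark with Hive support; the database and table names (landing.raw_txns, curated.txns) and columns are hypothetical, not the platform's actual pipeline:

# Sketch of an ingest/transform step on Spark + Hive; names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("omnia-ingest")
         .enableHiveSupport()
         .getOrCreate())

raw = spark.read.table("landing.raw_txns")  # ingested source data

# Transform: drop incomplete rows, derive a business date.
curated = (raw
           .filter(F.col("txn_amount").isNotNull())
           .withColumn("txn_date", F.to_date("txn_timestamp")))

# Supply curated data for downstream decisioning and reporting.
curated.write.mode("overwrite").saveAsTable("curated.txns")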
ODP (One Data Platform) – The aim of this project is to make data from GE's different sources available to users in the AWS Cloud. We created common dimensions for the ODP warehouse to form a star-schema data mart in Redshift. I was part of source analysis, data ingestion, data processing, landing data to S3, and creating PySpark jobs in Glue. Tools and technologies used: AWS Glue, S3, Redshift, and PySpark.
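A minimal sketch of a Glue PySpark job of the kind described, landing S3 data into Redshift; the bucket paths, Glue connection name, and table names are hypothetical:

# Sketch of a Glue job: read landed files from S3, write to Redshift.
# Bucket, connection, and table names are hypothetical placeholders.
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read source files landed in S3.
src = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://odp-landing/source/"]},
    format="parquet",
)

# Write into the star-schema data mart in Redshift.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=src,
    catalog_connection="odp-redshift",  # hypothetical Glue connection
    connection_options={"dbtable": "dim_customer", "database": "odp"},
    redshift_tmp_dir="s3://odp-temp/redshift/",
)
job.commit()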
Proactive Sensing Project – A big data ecosystem. I analyzed the source data and landed it in the staging area (HDFS), processed data with Hive and Spark SQL, and built Hive tables and queries, implementing partitioning (static and dynamic) and bucketing.
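A minimal sketch of the partitioned-plus-bucketed table design; the production tables were built in Hive, while this shows the equivalent Spark DataFrame API, and the database, table, and column names are hypothetical:

# Sketch of a partitioned + bucketed table built from HDFS staging data.
# Names are hypothetical; production used Hive DDL with static/dynamic
# partitions, shown here via Spark's writer API.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("proactive-sensing")
         .enableHiveSupport()
         .getOrCreate())

staged = spark.read.table("staging.raw_events")  # data landed in HDFS

(staged.write
    .partitionBy("event_date")   # one directory per date, derived per row
    .bucketBy(32, "event_id")    # hash-bucket rows to speed up joins
    .sortBy("event_id")
    .mode("overwrite")
    .saveAsTable("sensing.events"))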
RA+ (Revenue Assurance BT project) and GSDW (Data Warehouse project) – A reporting and analytics platform built to support business-operations analysis and strategic decision-making. Sqoop loads the incremental data from external sources into the Hadoop EDW; I worked on PySpark processes to prepare the data for reporting.
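A minimal sketch of the PySpark reporting step; it assumes the upstream Sqoop incremental import has already landed data in the EDW, and all table and column names are hypothetical:

# Sketch of a reporting aggregation over the Hadoop EDW. The upstream
# Sqoop incremental import (append on a check column) is assumed to have
# already populated edw.revenue; names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("ra-plus-reporting")
         .enableHiveSupport()
         .getOrCreate())

revenue = spark.read.table("edw.revenue")

# Aggregate billed revenue for the reporting layer.
report = (revenue
          .groupBy("region", "billing_month")
          .agg(F.sum("billed_amount").alias("total_billed"),
               F.count("*").alias("invoice_count")))

report.write.mode("overwrite").saveAsTable("reporting.revenue_summary")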
Languages: Python and SQL
Machine Learning: Linear and Logistic Regression, KNN, SVM, K-Means Clustering, PCA, Decision Trees, Ensemble Techniques, Bagging (Bootstrap Aggregation), Random Forest, and Boosting (AdaBoost, Gradient Boosting)
Deep Learning: Artificial Neural Networks (ANN), FNN, NLP - Text Analytics, TF-IDF, Word Embeddings (Word2Vec, GloVe), RNN, LSTM, Transformers, T5, FLAN-T5, BERT, Llama, and Mistral
Visualization: Matplotlib, Seaborn
IDE: PyCharm, Jupyter Notebook
Big Data Ecosystems: Hadoop, MapReduce, HDFS, Hive, Sqoop, PySpark, Spark
Cloud: AWS Glue, Redshift, S3