Seasoned data scientist experienced in working with large datasets, breaking down information, and applying interpretations to complex business problems. Proficient in distribution, predictive, and hypothetical modeling. Brings several years of related experience strengthening company operations.
Projects worked on:
1. Sentiment analysis: Implemented advanced analytics techniques, including natural language processing, to perform sentiment analysis.
2. Data analytics: Used data analytics to inform decision-making and identify areas for improvement.
3. Predictive analysis: Managed analytics teams in developing advanced analytics solutions, such as machine learning models and predictive analysis algorithms.
4. Text extraction with OCR: Extracted text from images, photos, and PDFs using Optical Character Recognition (OCR) with PaddleOCR and Tesseract.
5. Regex-based text extraction: Used regex patterns to extract specific portions of text from long sentences, such as phone numbers, email addresses, and ID values, accurately and consistently.
6. Multi-language detection: Used 'langdetect' and 'LanguageDetectorBuilder' to automatically identify the language of a given text input by analyzing character patterns with statistical models, enabling accurate multilingual text classification and processing.
7. Language translation: Translated text using several pre-trained models.
8. Keyword-based text search: Scanned documents, files, and databases to identify and retrieve specific information by matching user-defined keywords, enabling quick filtering and efficient access to relevant content.
9. Data generation using the keyword-Transformers model: Leveraged transformer-based architectures to automatically generate coherent, context-aware text or datasets conditioned on specific keywords, ensuring relevance, diversity, and semantic alignment with the given input terms.
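
Illustrative sample for the regex-based text extraction project: a minimal Python sketch of pulling email addresses and phone numbers out of a sentence. The patterns, the function name extract_contacts, and the sample sentence are assumptions made for illustration and are not the patterns used in the actual project; real-world phone and email formats usually need stricter, locale-specific rules.

```python
import re

# Illustrative patterns only (assumed, not the project's actual patterns).
# Email: a local part, "@", a domain, and a top-level domain of 2+ letters.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Phone: optional "+", then digits separated by spaces or hyphens,
# starting and ending on a digit (at least 9 characters after the "+").
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def extract_contacts(text):
    """Return all email addresses and phone-like strings found in text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


# Hypothetical sample sentence.
sample = "Contact Jane at jane.doe@example.com or +1 555-010-9999 for details."
result = extract_contacts(sample)
print(result)
```

Compiling the patterns once and reusing them keeps the extraction consistent across many sentences, which is the main point of a regex-based approach.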