Data Scientist & Engineer with 5+ years of experience in data analytics, Python development, and server-based machine learning and data processing solutions.
Proficient in designing and deploying scalable ML pipelines, ETL processes, and automated workflows handling 200K–500K records daily.
Experienced in NLP, OCR, and document automation systems, reducing manual effort by up to 60% and enhancing operational efficiency.
Skilled in distributed computing, optimizing server-based data processing to achieve up to 40% faster performance.
Expert in web automation and data collection using Selenium, Playwright, and PyAutoGUI, including bypassing anti-bot mechanisms.
Adept at building interactive dashboards and visualizations with Django, Pandas, and Matplotlib to deliver actionable insights.
Proven track record of mentoring and leading teams, delivering projects on time while shortening timelines by 25%.
Strong ability to manage end-to-end project delivery, from data ingestion, cleaning, and transformation to ML modeling, visualization, and deployment.
Quick learner and problem solver, capable of adapting to evolving technologies and dynamic project requirements in fast-paced environments.
Server-Based ML Pipelines (2025 – Present)
Designed and deployed server-based ML pipelines processing 200K–500K records daily with 99.9% uptime, significantly improving data processing efficiency and reliability.
NLP Document Automation (2025 – Present)
Developed NLP-driven document processing systems to automate unstructured data extraction, reducing manual effort by 60%.
Distributed Computing Optimization (2025 – Present)
Implemented distributed computing solutions on office servers, achieving 40% faster data processing across multiple internal projects.
ETL Pipeline Development (2024 – 2025)
Built fault-tolerant ETL pipelines using Apache Spark, processing 100+ GB of data daily while ensuring high accuracy and reliability.
Data Validation Framework (2024 – 2025)
Developed automated data validation systems, improving data accuracy by 45% and strengthening data governance across in-house systems.
Legacy System Migration (2024 – 2025)
Migrated legacy systems to modern server-based big-data frameworks, reducing infrastructure costs by 30% without disrupting operations.
CAPTCHA Solving Automation (2022 – 2024)
Developed machine learning and deep learning models that solve CAPTCHA challenges efficiently, enhancing user experience and workflow automation.
Website Automation & Data Extraction (2022 – 2024)
Automated complex data collection workflows across domains such as airfare, retail, vacation, car, social media, and property, while mitigating anti-tracking mechanisms.
Data Analytics & Visualization Dashboards (2019 – 2022)
Built interactive dashboards and automated reporting systems, enabling real-time insights and informed decision-making.
Sales Forecasting & Inventory Prediction (2019 – 2022)
Applied machine learning and data mining to forecast sales trends and manage seasonal inventory, improving operational efficiency.
Customer Sentiment Analysis (2019 – 2022)
Scraped and analyzed customer reviews from Amazon and Flipkart to generate actionable insights for marketing and product strategy.