
Abhinav Shukla

Lead Data Scientist
Chandigarh

Summary

Results-Driven Machine Learning and AI Expert with 7+ Years of Experience

Highly skilled and accomplished Machine Learning and AI professional with over 7 years of hands-on experience in the field. Adept at leveraging statistical analysis, predictive modeling, and artificial intelligence to drive data-driven insights and solutions. Proficient in both R and Python scripting, with expertise in Pandas, SciPy, NumPy, TensorFlow, and Keras.

  • Machine Learning Expertise: Proven track record of creating descriptive, predictive, and forecasting models using both R and Python. Expertise in designing and implementing solutions for a wide range of analytical use cases.
  • Deep Learning: Specialized in deep learning techniques, including RNNs (Recurrent Neural Networks), LSTMs, and CNNs. Proficient in using neural networks for text generation and computer vision projects.
  • Big Data and Cloud: Skilled in harnessing the power of Big Data and Cloud technologies, including Hadoop, AWS (Redshift), and Snowflake. Experience in handling and analyzing large datasets.
  • Text Analytics and NLP: Proficient in Text Analytics and Natural Language Processing (NLP). Capable of building models for sentiment analysis, text classification, and more.
  • AI Solutions: Adept at creating and implementing AI solutions for various industries and problem domains. Passionate about exploring new analytical use cases and challenges.
  • Programming Languages: Proficient in R, Python, and Java; experienced in deploying machine learning solutions on cloud platforms such as AWS and GCP.

A dedicated problem solver with a strong commitment to staying at the forefront of AI and ML advancements. Excited to contribute expertise and innovation to new projects and challenges.

Overview

9 years of professional experience
6 years of post-secondary education
3 languages

Work History

Lead Data Scientist

Tatras Data
Chandigarh
09.2022 - Current
  • Lead the data engineering team for the Govshop platform.
  • Worked with customer teams to gather requirements and design the roadmap, milestones, and corresponding implementation plans.
  • Compiled, cleaned and manipulated data for proper handling
  • Developed data pipelines for automated data ingestion into the Snowflake data warehouse.
  • Automated data flows using Apache Airflow for orchestration, S3 as the data lake/data storage, and AWS Glue to load data from S3 into the Snowflake raw database.
  • Processed and transformed raw data in Snowflake using PySpark and Snowflake procedures and functions.
  • Implemented an NLP-based system that predicts inspection note classes to help the customer create bill estimates for their consumers.
  • Trained multiple models for this purpose, including MLP, logistic regression, SVM, and random forests.

Senior Technology Manager

Formidium Technologies
Jaipur
03.2022 - 09.2022
  • Managed network and system performance, conducting troubleshooting, security patching, and maintenance
  • Updated customers and senior leaders on progress and roadblocks
  • Optimized ingestion of crypto WebSocket data into MongoDB collections.
  • Reduced data loss due to WebSocket connection issues and increased data accuracy up to 70%.
  • Guided organizational technology strategy and roadmaps

Senior Machine Learning Engineer

Trantor Software Limited
Chandigarh
10.2019 - 03.2022
  • Developed the DDP tool to help authors select topics for book writing, streamlining a book publishing process that traditionally takes up to 2 years.
  • Turned data into actionable insights
  • Designed and implemented the complete end-to-end solution for the DDP tool.
  • Implemented a Reddit scraper to collect data from subreddits, including posts and comments.
  • Built an NLP pipeline on top of the Reddit scraper to process text data and extract relevant keywords.
  • Extracted Google Trends analysis data for keywords.
  • Scraped Amazon books to identify relevant books for authors.
  • Developed a backend REST API to serve DDP book results to the frontend application.
  • Optimized the PDF data ingestion pipeline job, reducing its running time to almost 60%.
  • Studied new technologies to support machine learning applications
  • Composed production-grade code to convert machine learning models into services and pipelines to be consumed at web-scale
  • Authored code fixes and enhancements for inclusion in future code releases and patches
  • Coordinated deployments of new software, feature updates and fixes
  • Built databases and table structures for web applications
  • Tuned systems to boost performance
  • Created proofs of concept for innovative new solutions

Data Scientist

Xlpat Labs
Chandigarh
02.2018 - 10.2019
  • Developed a text summarizer system that provides concise summaries of disclosures entered in the system. The summarizer acted as a bot for analysts and customers, extracting essential information from patents.
  • Used extractive summarization techniques focused on identifying crucial sentences in the text. Sentences were scored based on the keywords and key phrases found in them, which were identified using the TF-IDF, TextRank, and RAKE algorithms.
  • Selected top-scoring sentences to form the summaries, streamlining information for easy consumption.
  • Constructed a technical corpus and dictionary to help patent analysts extract synonyms of technical keywords. The dictionary was compiled from diverse data sources, including raw patents' brief summary text, Wikipedia text, and historical database queries.
  • Employed Word2Vec word embedding models to build the corpus, which contained approximately 1 billion words.
  • Developed a module to automatically generate a first patent draft from an inventor's initial idea. Inventors provide a brief description of their idea, and the system produces a patent draft accordingly.
  • Achieved this by training an LSTM neural network on extensive patent claims and descriptions, enabling the system to generate text from the inventor's input.
  • Compiled, cleaned and manipulated data for proper handling
  • Created and implemented new NLP models to increase company productivity
  • Utilized advanced querying, visualization and analytics tools to analyze and process complex data sets

Associate Data Scientist

Wipro
Bangalore
07.2014 - 02.2018
  • Utilized advanced querying, visualization and analytics tools to analyze and process complex data sets
  • Assessed accuracy and effectiveness of new and existing data sources and data analysis techniques
  • Set up SQL database on cloud servers to store client data for query analysis
  • Compiled, cleaned and manipulated data for proper handling
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability
  • Worked as a Data Engineer for Capital One on the migration from legacy systems to cloud-based infrastructure in AWS.
  • Built a data ingestion framework to ingest, store and process data from Ab Initio LR files into an S3 data lake and then into the AWS Redshift data warehouse.
  • Developed database architectural strategies at modeling, design and implementation stages to address business or industry requirements

Education

Master of Science - Data Science and Engineering

VIT Vellore
Vellore
04.2015 - 08.2017

Bachelor of Science - Computer Science

Chitkara University
Baddi
08.2010 - 06.2014

Skills

Hadoop, Hive, Redshift, MongoDB


Accomplishments

  • Build works on Linode (cloud)
  • Distributed queue system for web crawling
  • Install worker documents supervisor Flower parallel-ssh
  • Abhinav Shukla
  • Data Scientist, AI/ML Engineer
  • 07837153247
  • Abhinavshukla92@yahoo.com
  • Experience with data science, machine learning, statistical modeling, web crawling, and data mining
  • Hopes to focus more on data science and artificial intelligence in future career
  • Over 7 years of experience in machine learning (AI/ML), statistical analysis, statistical modeling, artificial intelligence, deep learning (RNN, LSTM, CNN), R, Python scripting (Pandas, SciPy, NumPy, TensorFlow, Keras), Java, Hadoop, AWS, and GCP
  • Created descriptive, predictive, and forecasting models using R and Python over multiple years; also implemented solutions using Big Data/Cloud technologies such as Hadoop, AWS (Redshift), and Snowflake
  • Looking for exposure to new analytical use cases and problems
  • Experienced in text analytics, natural language processing, and artificial neural networks
  • Solved problems using neural networks such as RNNs and LSTMs for text generation, and worked on a few personal computer vision projects
  • Technical Corpus (Data Dictionary)
  • Built a technical corpus/dictionary to help patent analysts extract synonyms of technical keywords
  • This dictionary was built using multiple data sources, including raw patents' brief summary text, Wikipedia text, and historical database queries
  • Word embedding models such as Word2Vec were used to build this corpus, which contains around 1 billion words
  • Text Summarizer
  • Built a text summarizer system to provide a brief summary of the disclosure entered in the system
  • In other words, the summarizer acts as a bot that reads patents for the analyst or customer and provides a summary
  • Extractive summarization was used, which summarizes the text by identifying its important sentences
  • This was done by sentence scoring, i.e., giving sentences scores based on the keywords and key phrases present in them, which were identified using the TF-IDF, TextRank, and RAKE algorithms
  • Top-scoring sentences were selected to form the summary.
  • Patent Drafting
  • The aim of this module was to generate a first patent draft from the idea given by an inventor
  • In this module, the inventor provides the idea in a few sentences and the system automatically generates a patent draft for the given text
  • To accomplish this task, an LSTM neural network was trained on thousands of patent claims and descriptions to generate text from the given input
  • Capital One: Data Transformation
  • Replaced old legacy data storage systems with cloud storage (after attempting to move into the Hadoop ecosystem)
  • The main objective of the project is to move data from Teradata to the AWS cloud platform
  • Data is migrated to an S3 data lake, the object storage service provided by AWS
  • Data is also moved to Redshift, the data warehouse service from AWS
  • Data preprocessing is done before moving data to the cloud
  • Statistical modeling is done on the data after it is moved to the cloud (Redshift)
  • Expression Detection: Computer Vision
  • Built a human facial expression detection web app using deep learning
  • A convolutional neural network model was trained on about 35 thousand images covering 7 expression classes (Angry, Disgust, Fear, Happy, Sad, Surprise, Neutral)
  • PIL and OpenCV were used to process images and capture real-time video frames, which served as input for the deep learning model
  • The app was deployed using Flask, Gunicorn, and Nginx.

Timeline

Lead Data Scientist

Tatras Data
09.2022 - Current

Senior Technology Manager

Formidium Technologies
03.2022 - 09.2022

Senior Machine Learning Engineer

Trantor Software Limited
10.2019 - 03.2022

Data Scientist

Xlpat Labs
02.2018 - 10.2019

Master of Science - Data Science and Engineering

VIT Vellore
04.2015 - 08.2017

Associate Data Scientist

Wipro
07.2014 - 02.2018

Bachelor of Science - Computer Science

Chitkara University
08.2010 - 06.2014