Vishakha Singh

Bangalore

Summary

Highly driven data science professional with 6+ years of relevant work experience as a Data Engineer and Decision Scientist. Experienced in building AI agents, big data frameworks, and ETL pipelines and in delivering analytical and data science solutions. Looking to acquire a challenging position in an environment where I can best utilize my technical, logical and administrative skills while making a significant contribution to the success of the organization.

Overview

6 years of professional experience
1 Certification

Work History

Data Engineer - Senior Associate

Atlassian
06.2024 - Current

Part of the Product Data Engineering Team

  • Led system reliability efforts by establishing best-practice processes and developing a Tableau dashboard for incident reporting on pipeline health
  • Automated Jira ticket creation from Slack using Slack workflows
  • Designed and implemented a Slack bot that connects to an AI agent to answer user queries on JSM data
  • Led ETL pipeline development for Key Results (KRs) using SQL, DBT, AWS, Databricks and Airflow, collaborating with 15+ Data Analysts and PMs to operationalise 10 KR tables
  • Designed KR tables, implemented 50+ data quality checks (Yoda), and set up alerting mechanisms to ensure data integrity and pipeline reliability
  • Designed and developed single source of truth (SSoT) data assets for Jira Service Desk entities, consolidating and replacing 50+ tables containing diverse information
  • Developed new data pipelines and data assets to support new metrics and business capabilities by integrating both new and existing data sources, including Splunk and S3
  • Redesigned 20+ existing tables originally created by Data Scientists to improve data structure, efficiency, and scalability, resulting in 90% storage savings

Senior Data Engineer

Mercedes-Benz Research and Development, India
05.2021 - 05.2024

Innovation tracks

  • Developed an end-to-end AI agent solution using Power Apps and Copilot and by fine-tuning a large language model (LLM) on engineering standards documents and engineering-related content; designed, implemented, and optimized the agent to integrate seamlessly into team communication workflows, improving productivity and efficiency within engineering teams and saving ~6,000 man-hours
  • Ideated employee sentiment analysis through a Teams bot, used NLP algorithms for text analytics, and created a Mood KPI dashboard in Power BI
  • Developed an MLOps-based product, building Databricks notebooks for model ops (implemented classification and regression models)


Part of the Vans Data Engineering and Analytics Team

  • Designed and implemented ETL pipelines for the Vans use case from inception, transitioning on-premises ETL to the cloud (Azure)
  • Implemented the CI/CD process for the project using Azure DevOps pipelines and version control using Git
  • Analysed the existing KNIME flow (20+ pipelines) and transformed the logic into Databricks notebooks using SQL and PySpark, following the Medallion architecture
  • Redesigned and migrated several Tableau dashboards to Power BI
  • Provided analytical support to the van endurance testing team, aiding data analysis and insight generation and developing Power BI dashboards consumed by the Vans leadership team
  • Developed new Power BI dashboards in accordance with stakeholder requirements, managing the process from data gathering to visualization


Part of the Certification (conzert) Data Engineering Team

  • Designed and developed an ADF pipeline that ingests data from multiple systems and stores it in ADLS
  • Implemented features such as email triggers, delta change detection in data, and logging, creating several Databricks notebooks for the same


Part of the Big Data development team, building a framework to assess data quality

  • Developed DQF, a PySpark-based data quality framework that provides user-configurable checks to test data quality and can generate statistical descriptions of the data
  • Added automated HTML report generation on each DQF run
  • Created several Azure Databricks Python notebooks for incremental data generation, combining check results into one global result, and running PySpark code after uploading Python wheel libraries
  • Integrated SonarQube for code quality checks in the CI/CD pipeline and used the SonarLint plugin to resolve coding issues
  • Designed an ADF pipeline for the end-to-end customer journey of using the framework
  • Improved the performance of the framework using Spark optimization techniques
  • Created a Power BI report for the quality framework that users can consult to monitor their data against the quality metrics
  • Developed Terraform scripts for deploying resources on Azure and designed a DevOps pipeline to automate the deployment



Decision Scientist

Mu Sigma
07.2019 - 04.2021

Provided data engineering and analytical solutions for one of the largest tech companies in its meeting-devices domain


Data Engineering:

  • Set up the ETL pipeline for device telemetry data; the pipeline consumed data from COSMOS, multidimensional cubes, flat files, SQL Server, Azure Data Lake, etc.
  • Automated the pipeline using PowerShell and SQL Server Agent to ensure the timely delivery and quality of data
  • Created multiple BI dashboards (Power BI, Scuba) consumed by the senior leadership team to visualize key metrics and usage trends of meeting devices


Product Intelligence Analyst

  • User Engagement Analysis: analysed user engagement with the devices across platforms and generated insights on the factors affecting it
  • Rhythm of Business: monitored and analysed product performance and generated insights across geographies, verticals and their clients (tenants) for the leadership team on a monthly basis
  • Analysed a leading retailer's digital campaign data and created a framework to measure its effectiveness
  • Exported data from HDFS to SQL Server using Sqoop
  • Performed EDA using Jupyter Python notebooks
  • Performed hypothesis testing on the data and generated insights and recommendations based on the results

Education

MS - Machine Learning and AI

Liverpool John Moores University
04.2024

Bachelor of Engineering - Computer Science and Engineering

Sir MVIT
06.2019

Skills

  • Data Science and AI: Machine learning, deep learning, Gen AI (LLM and GAN), Power Apps, Copilot
  • Programming Languages: Python, R, PySpark, SQL, C, YAML, Scope, Shell scripting
  • Big Data Frameworks: Hadoop, Spark, MapReduce, Hive
  • Dashboarding and Reporting Tools: Power BI, Tableau, Excel, Scuba, Python Plotly
  • Cloud Services and Data Engineering Tools: Azure, Databricks, SSIS, Airflow, DBT, AWS
  • Web Development: HTML, CSS, Bootstrap, JavaScript
  • Databases: SQL Server, MySQL
  • Version Control and Documentation: Git, Jira, Azure DevOps, Confluence

Accomplishments

Atlassian:

  • Received the KUDOS Award for quickly ramping up and supporting Data Scientists with data engineering requirements, including KR and data asset development

Mu Sigma:

  • Received the SPOT Award (Certificate of Appreciation) for handling and managing multiple threads in the project and exceeding client expectations

MBRDI:

  • Received the Great Service Award for successful delivery of the project and smoothly overcoming all challenges
  • Received the Bronze Award for stabilising the Vans use case from scratch

Certification

  • Microsoft Certified: Azure Data Engineer Associate
  • Microsoft Certified: Azure Data Analyst Associate
  • Modelling Data Warehouse with Data Vault - Udemy
  • Microsoft Certified: Azure Fundamentals
  • Azure Databricks Spark Core for Data Engineers - Udemy

Projects

  • House Price Prediction in Australia (Algorithm: Linear Regression) - Tools used: Python, NumPy, Pandas, Statsmodels, scikit-learn, Seaborn, Matplotlib, Statistics, Regularization (Ridge, Lasso)
  • Telecom Churn Prediction (Algorithms: Logistic Regression, SVM, Random Forest, XGBoost) - Tools used: Python, NumPy, Pandas, Statsmodels, scikit-learn, Seaborn, Matplotlib, Statistics, Boosting, Cross-Validation
  • Skin Cancer Image Classification (Algorithm: CNN) - Tools used: Python, NumPy, Matplotlib, Neural Networks, Keras, TensorFlow
  • Gesture Recognition for Smart TV (Algorithms: CNN, RNN, LSTM, GRU) - Tools used: Python, NumPy, Matplotlib, Keras, TensorFlow
