Vishakha Singh

Bangalore

Summary

Highly driven data science professional with 6+ years of relevant work experience as a Data Engineer and Decision Scientist. Experienced in building AI agents, big data frameworks, and ETL pipelines and in delivering analytical and data science solutions. Looking to acquire a challenging position in an environment where I can best utilize my technical, logical and administrative skills while making a significant contribution to the success of the organization.

Overview

6 years of professional experience
1 Certification

Work History

Data Engineer - Senior Associate

Atlassian
06.2024 - Current

Part of the Product Data Engineering Team

  • Led system reliability efforts by establishing best-practice processes and developing a Tableau dashboard for incident reporting on pipeline health
  • Automated Jira ticket creation from Slack using Slack workflows
  • Designed and implemented a Slack bot that connects to an AI agent to answer user queries on JSM data
  • Led ETL pipeline development for Key Results (KRs) using SQL, DBT, AWS, Databricks and Airflow, collaborating with 15+ Data Analysts and PMs to operationalise 10 KR tables
  • Designed KR tables, implemented 50+ data quality checks (Yoda), and set up alerting mechanisms to ensure data integrity and pipeline reliability
  • Designed and developed single source of truth (SSoT) data assets for Jira Service Desk entities, consolidating and replacing 50+ tables containing diverse information
  • Developed new data pipelines and data assets to support new metrics and business capabilities by integrating both new and existing data sources, including Splunk and S3
  • Redesigned 20+ existing tables originally created by Data Scientists to improve data structure, efficiency, and scalability, resulting in 90% storage savings

Senior Data Engineer

Mercedes-Benz Research and Development, India
05.2021 - 05.2024

Innovation tracks

  • Developed an end-to-end AI agent solution using Power Apps and Copilot and by fine-tuning a large language model (LLM) on engineering standards documents and engineering-related content; designed, implemented, and optimized the agent to integrate seamlessly into team communication workflows, improving productivity and efficiency within engineering teams and saving ~6,000 man-hours
  • Ideated employee sentiment analysis through a Teams bot, used NLP algorithms for text analytics, and created a Mood KPI dashboard in Power BI
  • Developed an MLOps-based product, building Databricks notebooks for model ops (implemented classification and regression models)


Part of the Vans Data Engineering and Analytics Team

  • Designed and implemented ETL pipelines for the Vans use case from inception, transitioning on-premises ETL to the cloud (Azure)
  • Implemented the CI/CD process for the project using Azure DevOps pipelines and version control using Git
  • Analysed the existing KNIME flow (20+ pipelines) and transformed the logic into Databricks notebooks using SQL and PySpark, following the Medallion architecture
  • Redesigned and migrated several Tableau dashboards to Power BI
  • Provided analytical support to the van endurance testing team, aiding data analysis and insight generation and developing Power BI dashboards consumed by the Vans leadership team
  • Developed new Power BI dashboards in accordance with stakeholder requirements, managing the process from data gathering to visualization


Part of the Certification (conzert) Data Engineering Team

  • Designed and developed an ADF pipeline that ingests data from multiple systems and stores it in ADLS
  • Implemented features such as email triggers, delta change detection in data, and logging, creating several Databricks notebooks for the same


Part of the Big Data development team, building a framework to assess data quality

  • Developed DQF, a PySpark-based data quality framework that provides user-configurable checks to test data quality and can generate statistical descriptions of the data
  • Added automated HTML report generation on each DQF run
  • Created several Azure Databricks Python notebooks for incremental data generation, combining check results into one global result, and running PySpark code after uploading Python wheel libraries
  • Integrated SonarQube for code quality checks in the CI/CD pipeline and used the SonarLint plugin to resolve coding issues
  • Designed an ADF pipeline for the end-to-end customer journey of using the framework
  • Improved the performance of the framework using Spark optimization techniques
  • Created a Power BI report for the quality framework that users can consult to monitor their data against the quality metrics
  • Developed Terraform scripts for deploying resources on Azure and designed a DevOps pipeline to automate the deployment



Decision Scientist

Mu Sigma
07.2019 - 04.2021

Provided data engineering and analytical solutions for one of the largest tech companies in its meeting-devices domain


Data Engineering:

  • Set up the ETL pipeline for device telemetry data; the pipeline consumed data from COSMOS, multidimensional cubes, flat files, SQL Server, Azure Data Lake, etc.
  • Automated the pipeline using PowerShell and SQL Server Agent to ensure the timely delivery and quality of data
  • Created multiple BI dashboards (Power BI, Scuba) consumed by the senior leadership team to visualize key metrics and usage trends of meeting devices


Product Intelligence Analyst

  • User Engagement Analysis: analysed user engagement with the devices across platforms and generated insights on the factors affecting it
  • Rhythm of Business: monitored and analysed product performance and generated insights across geographies, verticals and their clients (tenants) for the leadership team on a monthly basis
  • Analysed a leading retailer's digital campaign data and created a framework to measure its effectiveness
  • Exported data from HDFS to SQL Server using Sqoop
  • Performed EDA using Jupyter Python notebooks
  • Performed hypothesis testing on the data and generated insights and recommendations based on the results

Education

MS - Machine Learning and AI

Liverpool John Moores University
04.2024

Bachelor of Engineering - Computer Science and Engineering

Sir MVIT
06.2019

Skills

  • Data Science and AI: Machine learning, deep learning, Gen AI (LLM and GAN), Power Apps, Copilot
  • Programming Languages: Python, R, PySpark, SQL, C, YAML, Scope, Shell scripting
  • Big Data Frameworks: Hadoop, Spark, MapReduce, Hive
  • Dashboarding and Reporting Tools: Power BI, Tableau, Excel, Scuba, Python Plotly
  • Cloud Services and Data Engineering Tools: Azure, Databricks, SSIS, Airflow, DBT, AWS
  • Web Development: HTML, CSS, Bootstrap, JavaScript
  • Databases: SQL Server, MySQL
  • Version Control and Documentation: Git, Jira, Azure DevOps, Confluence

Accomplishments

Atlassian:

  • Received the KUDOS Award for quickly ramping up and supporting Data Scientists with data engineering requirements, including KR and data asset development

Mu Sigma:

  • Received the SPOT Award (Certificate of Appreciation) for handling and managing multiple threads in the project and exceeding client expectations

MBRDI:

  • Received the Great Service Award for successful delivery of the project and smoothly overcoming all challenges
  • Received the Bronze Award for stabilising the Vans use case from scratch

Certification

  • Microsoft Certified: Azure Data Engineer Associate
  • Microsoft Certified: Azure Data Analyst Associate
  • Modelling Data Warehouse with Data Vault - Udemy
  • Microsoft Certified: Azure Fundamentals
  • Azure Databricks Spark Core for Data Engineers - Udemy

Projects

  • House Price Prediction in Australia (Algorithm: Linear Regression) - Tools used: Python, NumPy, Pandas, Statsmodels, scikit-learn, Seaborn, Matplotlib, Statistics, Regularization (Ridge, Lasso)
  • Telecom Churn Prediction (Algorithms: Logistic Regression, SVM, Random Forest, XGBoost) - Tools used: Python, NumPy, Pandas, Statsmodels, scikit-learn, Seaborn, Matplotlib, Statistics, Boosting, Cross-Validation
  • Skin Cancer Image Classification (Algorithm: CNN) - Tools used: Python, NumPy, Matplotlib, Neural Networks, Keras, TensorFlow
  • Gesture Recognition for Smart TV (Algorithms: CNN, RNN, LSTM, GRU) - Tools used: Python, NumPy, Matplotlib, Keras, TensorFlow
