Highly driven data science professional with 6+ years of relevant work experience as a Data Engineer and Decision Scientist. Experienced in building AI agents, big data frameworks, and ETL pipelines, and in delivering analytical and data science solutions. Seeking a challenging position where I can best apply my technical, logical, and administrative skills while making a significant contribution to the success of the organization.
Overview
6 years of professional experience
1 Certification
Work History
Data Engineer - Senior Associate
Atlassian
06.2024 - Current
Part of Product Data Engineering Team
Led system reliability efforts by establishing best-practice processes and developing a Tableau dashboard for incident reporting on pipeline health
Automated Jira ticket creation from Slack using Slack Workflows
Developed a Slack bot that connects to an AI agent to answer user queries on JSM data
Led ETL pipeline development for Key Results (KRs) using SQL, DBT, AWS, Databricks and Airflow, collaborating with 15+ Data Analysts and PMs to operationalise 10 KR tables.
Designed KR tables, implemented 50+ data quality checks (Yoda), and set up alerting mechanisms to ensure data integrity and pipeline reliability.
Designed and developed single-source-of-truth (SSoT) data assets for Jira Service Desk entities, consolidating and replacing 50+ tables containing diverse information
Developed new data pipelines and data assets to support new metrics and business capabilities by integrating both new and existing data sources, including Splunk and S3.
Redesigned 20+ existing tables originally created by Data Scientists to improve data structure, efficiency, and scalability, resulting in 90% storage savings
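The check-and-alert pattern behind the KR pipeline work above can be sketched as follows. This is a minimal illustration only; the check name, threshold, and alert format are assumptions, not the actual Yoda configuration.

```python
# Illustrative sketch of a threshold-based data quality check with alerting.
# Column names and the 5% null-rate threshold are hypothetical.

def null_rate(rows, column):
    """Fraction of rows where `column` is None."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)

def run_check(rows, column, max_null_rate=0.05):
    """Return an alert message if the null rate exceeds the threshold, else None."""
    rate = null_rate(rows, column)
    if rate > max_null_rate:
        return f"ALERT: {column} null rate {rate:.1%} exceeds {max_null_rate:.1%}"
    return None
```

In practice a check like this would run after each pipeline load, with failures routed to an alerting channel.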
Senior Data Engineer
Mercedes-Benz Research and Development, India
05.2021 - 05.2024
Innovation tracks
Developed an end-to-end AI agent solution using Power Apps and Copilot, fine-tuning a large language model (LLM) on engineering standards documents and engineering-related content. Designed, implemented, and optimized the agent to integrate seamlessly into team communication workflows, improving productivity and efficiency within engineering teams and saving ~6,000 person-hours
Ideated a sentiment analysis concept for employees via a Teams bot, used NLP algorithms for text analytics, and created a Mood KPI dashboard in Power BI
Developed an MLOps-based product, creating Databricks notebooks for model ops (implemented classification and regression models)
Part of Vans Data engineering and Analytics Team
Designed and implemented ETL pipelines for Vans' use case from inception, transitioning on-premises ETL to the cloud (Azure).
Implemented the CI/CD process for the project using Azure DevOps pipelines and version controlling using Git.
Analysed the existing KNIME flow (20+ pipelines) and transformed the logic into Databricks notebooks using SQL and PySpark, following the Medallion architecture.
Redesigned and migrated several Tableau dashboards to Power BI.
Provided analytical support to the van endurance testing team, aiding data analysis and insight generation, and developed Power BI dashboards consumed by the Vans leadership team.
Developed new Power BI dashboards in accordance with stakeholder requirements, managing the process from data gathering to visualization.
Part of Certification (conzert) Data engineering team
Designed and developed an ADF pipeline ingesting data from multiple source systems into ADLS
Implemented features such as email triggers, delta change detection, and logging, creating several Databricks notebooks for the same
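The delta-change-detection feature mentioned above can be sketched with row hashing. This is a simplified, stdlib-only illustration of the idea (the actual implementation ran in Databricks notebooks); the key and row shapes are hypothetical.

```python
# Hypothetical sketch of delta change detection via per-row hashing.
import hashlib
import json

def row_hash(row):
    """Stable hash of a row's contents (key order normalized via sort_keys)."""
    return hashlib.sha256(json.dumps(row, sort_keys=True).encode()).hexdigest()

def detect_deltas(previous, current, key):
    """Return keys of rows that are new or changed since the previous load."""
    old = {r[key]: row_hash(r) for r in previous}
    return [r[key] for r in current if old.get(r[key]) != row_hash(r)]
```

Comparing hashes rather than full rows keeps the previous snapshot compact and makes the changed-row scan a simple dictionary lookup.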
Part of big data development team working on developing a framework to assess the quality of data
Developed DQF, a PySpark-based data quality framework providing user-configurable checks to test data quality, with the additional ability to generate statistical descriptions of the data
Added the functionality of automated HTML report generation on each DQF run
Created several Azure Databricks Python notebooks for incremental data generation, combining check results into one global result, and running PySpark code after uploading Python wheel libraries
Integrated SonarQube for code quality checks in the CI/CD pipeline and used the SonarLint plugin to resolve coding issues
Designed an ADF pipeline for the end-to-end customer journey through the framework
Worked on improving the performance of the framework using spark optimization techniques
Created a Power BI report for the quality framework so users can monitor their data's performance on the quality metrics
Developed Terraform scripts for deploying resources on Azure and designed a DevOps pipeline for its automation
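The user-configurable checks at the heart of DQF can be sketched as a small check registry driven by a config list. This is a pure-Python simplification for illustration; the real framework ran on PySpark DataFrames, and the check names and config shape here are assumptions.

```python
# Simplified, pure-Python sketch of the configurable-check idea behind DQF.
# Check names and the (column, check, argument) config format are illustrative.

CHECKS = {
    "not_null": lambda values, _: all(v is not None for v in values),
    "min_value": lambda values, arg: all(v is not None and v >= arg for v in values),
}

def run_dqf(rows, config):
    """Run the configured checks and return {"check(column)": passed}."""
    results = {}
    for column, check_name, arg in config:
        values = [r.get(column) for r in rows]
        results[f"{check_name}({column})"] = CHECKS[check_name](values, arg)
    return results
```

Keeping checks in a registry lets users enable and parameterize them purely through configuration, which is what makes per-run reporting (e.g. the automated HTML report) straightforward to generate from the results dictionary.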
Decision Scientist
Mu Sigma
07.2019 - 04.2021
Provided data engineering and analytical solutions for one of the largest tech companies, within its meeting-devices domain
Data Engineering:
Set up the ETL pipeline for device telemetry data; the pipeline consumed data from COSMOS, multidimensional cubes, flat files, SQL Server, Azure Data Lake, etc.
Automated the pipeline using PowerShell and SQL Server Agent to ensure timely delivery and data quality
Created multiple BI dashboards (Power BI, Scuba) consumed by the senior leadership team to visualize key metrics and usage trends of meeting devices
Product Intelligence Analyst
User Engagement Analysis: Analyzed user engagement with the devices across platforms and generated insights on the factors affecting it
Rhythm of Business: Monitored and analyzed product performance and generated monthly insights across geographies, verticals, and their clients (tenants) for the leadership team
For a leading retailer, analyzed its digital campaign data and created a framework to measure its effectiveness
Exported data from HDFS to SQL Server using Sqoop
Performed EDA using Jupyter Python notebooks
Conducted hypothesis testing on the data and generated insights and recommendations based on the results
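A typical two-sample comparison in the hypothesis-testing work above (e.g. comparing a metric between two campaign cohorts) can be sketched with Welch's t-statistic. This stdlib-only sketch is illustrative; the actual analysis and the cohort data are not from the source.

```python
# Illustrative sketch: Welch's two-sample t-statistic, stdlib only.
# Sample data and cohort names would come from the campaign dataset (hypothetical).
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t-statistic for two independent samples with unequal variances."""
    na, nb = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    return (mean_a - mean_b) / ((var_a / na + var_b / nb) ** 0.5)
```

In practice the statistic would be compared against a t-distribution (e.g. via `scipy.stats.ttest_ind` with `equal_var=False`) to obtain a p-value before drawing recommendations.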
Education
MS - Machine Learning and AI
Liverpool John Moores University
04.2024
Bachelor of Engineering - Computer Science and Engineering
Sir MVIT
06.2019
Skills
Data Science and AI: Machine learning, deep learning, Gen AI (LLMs and GANs), Power Apps, Copilot
Big Data Frameworks: Hadoop, Spark, MapReduce, Hive
Dashboarding and Reporting Tools: Power BI, Tableau, Excel, Scuba, Python Plotly
Cloud Services and Data Engineering Tools: Azure, Databricks, SSIS, Airflow, DBT, AWS
Web Development: HTML, CSS, Bootstrap, JavaScript
Databases: SQL Server, MySQL
Version Control and Documentation: Git, Jira, Azure DevOps, Confluence
Accomplishments
Atlassian:
Received KUDOS Award for ramping up quickly and supporting Data Scientists with data engineering needs, including KR and related data asset development.
Mu Sigma:
Received SPOT Award (Certificate of Appreciation) for handling and managing multiple workstreams in the project and exceeding client expectations
MBRDI:
Received Great Service Award for successful delivery of the project, overcoming all challenges smoothly
Received Bronze Award for stabilising the Vans use case from scratch
Certification
Microsoft Certified: Azure Data Engineer Associate
Microsoft Certified: Azure Data Analyst Associate
Modelling Data Warehouse with Data Vault - Udemy
Microsoft Certified: Azure Fundamentals
Azure Databricks Spark Core for Data Engineers - Udemy
Projects
House Price Prediction in Australia (Algorithm: Linear Regression) - Tools Used: Python, NumPy, Pandas, Statsmodels, Scikit-learn, Seaborn, Matplotlib; Concepts: ML, Statistics, Regularization (Ridge, Lasso)
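The regularization idea in the project above can be sketched in closed form for a single feature. This is a stdlib-only illustration, not the project's actual code (which used scikit-learn); the penalty form and data are assumptions.

```python
# Illustrative sketch: one-feature ridge regression (no intercept) in closed form.
# Minimizes sum((y - w*x)^2) + alpha * w^2; the data below is hypothetical.

def ridge_fit(xs, ys, alpha=1.0):
    """Return the slope w minimizing the ridge-penalized squared error."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)
```

With `alpha=0` this reduces to ordinary least squares; larger `alpha` shrinks the slope toward zero, which is the bias-variance trade-off Ridge (and, with an L1 penalty, Lasso) exploits.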