Results-driven Big Data Engineer with over three years of experience at SanDisk (Western Digital), specializing in the design and deployment of scalable big data solutions on the Azure cloud platform. Proficient in leveraging Apache Spark, the Hadoop ecosystem (HDFS, Hive), and Azure Data Factory for effective ETL orchestration and pipeline development. Demonstrated expertise in real-time telemetry analytics, distributed storage, data lake architectures, and optimizing Spark SQL/DataFrames. Committed to creating efficient, secure, and high-performance data platforms that drive actionable business insights.
Overview
3 years of professional experience
1 Certification
Work History
Associate Product Engineer
SanDisk (Western Digital)
09.2021 - 01.2025
Led the design, development, and deployment of a robust big data analytics platform for SanDisk's global line of SSDs, memory cards, and HDDs using Microsoft Azure cloud and open-source technologies. This solution ingested multi-terabyte daily product telemetry and customer data streams, transforming and enriching them via scalable ETL pipelines built with Azure Data Factory, Databricks (PySpark), and Apache Airflow for workflow orchestration.
Architected and deployed a distributed data analytics platform on Azure, integrating Apache Spark with Azure Databricks to deliver scalable, low-latency data pipelines for SSD telemetry and memory card customer analytics.
Redesigned legacy batch ETL using Azure Data Factory, Airflow, and Hive to support real-time insights and reduce processing time by 35%.
Migrated SQL-based product usage statistics from on-premises servers to Azure SQL Data Warehouse, ensuring a seamless lift-and-shift migration and automated failover.
Automated ingestion, transformation, and validation workflows using Apache Airflow and Azure Data Factory, achieving 99.9% uptime and enabling reliable, scalable data pipeline operations with minimal downtime.
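To illustrate the validation stage of such a workflow, a minimal sketch in plain Python; the column names, sensor range, and rules are hypothetical, and in the actual pipeline this kind of check ran inside Airflow/Data Factory tasks:

```python
# Hypothetical batch-validation step for telemetry records.
# Schema and thresholds are illustrative, not the production rules.

REQUIRED_COLUMNS = {"device_id", "event_ts", "temperature_c", "write_errors"}

def validate_batch(rows):
    """Split a batch of telemetry dicts into (valid_rows, errors).

    Each error is (row_index, reason); rows failing schema or range
    checks are quarantined rather than dropped silently.
    """
    valid, errors = [], []
    for i, row in enumerate(rows):
        missing = REQUIRED_COLUMNS - row.keys()
        if missing:
            errors.append((i, f"missing columns: {sorted(missing)}"))
        elif not -40 <= row["temperature_c"] <= 125:
            errors.append((i, "temperature out of sensor range"))
        else:
            valid.append(row)
    return valid, errors
```

A validation task like this would gate the downstream transform step, so bad records surface as quarantine metrics instead of pipeline failures.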
Implemented end-to-end DevOps practices: CI/CD via Azure DevOps, test automation for ETL jobs, and standardized code review using Python.
Developed Spark-based processors that analyze large volumes of device health logs (HDFS, Parquet), providing predictive maintenance alerts for enterprise SSD products.
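A sketch of the per-device alerting logic behind such a processor; the field names and thresholds are illustrative, and in production this kind of rule would run as a PySpark aggregation over Parquet files rather than the standalone Python shown here:

```python
# Hypothetical predictive-maintenance rule over device health logs.
# In the real pipeline this logic would be applied via PySpark over
# Parquet data in HDFS; this version shows only the decision rule.

def maintenance_alerts(health_logs, wear_threshold=0.9, error_spike=100):
    """Flag devices whose wear level or error count suggests imminent failure.

    health_logs: iterable of dicts with device_id, wear_level (0..1),
    and uncorrectable_errors (count). Returns the set of flagged IDs.
    """
    flagged = set()
    for log in health_logs:
        if (log["wear_level"] >= wear_threshold
                or log["uncorrectable_errors"] >= error_spike):
            flagged.add(log["device_id"])
    return flagged
```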
Authored and published reusable code libraries for internal data engineering teams, reducing onboarding times and championing open-source best practices.
Worked with cross-functional teams (Product, QA, Cloud Ops) to define requirements, optimize resource utilization, and enhance developer productivity.
Contributed to internal open-source initiatives, including custom Airflow operators and Hive UDFs for SD card telemetry workflows.
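One way such a Hive extension can be written in Python is as a streaming script invoked through Hive's `SELECT TRANSFORM ... USING` clause, which pipes tab-separated rows through stdin/stdout. The sketch below assumes a hypothetical record layout (card ID, event name, size in KB):

```python
# Sketch of a Hive streaming script, usable roughly as:
#   SELECT TRANSFORM (card_id, event, raw_kb)
#   USING 'python3 parse_telemetry.py' AS (card_id, event, value_mb)
# Field layout is hypothetical; Hive delivers rows tab-separated on stdin.
import sys

def parse_line(line):
    """Split one raw telemetry record into (card_id, event, value_mb)."""
    card_id, event, raw_kb = line.rstrip("\n").split("\t")
    return card_id, event, int(raw_kb) // 1024  # KB -> MB

if __name__ == "__main__" and not sys.stdin.isatty():
    for line in sys.stdin:
        print("\t".join(str(field) for field in parse_line(line)))
```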
Managed source code versioning, branching, and collaboration using Git and GitHub, ensuring smooth team workflows and code integrity.
SanDisk is a global leader in product innovation, offering SSDs, Memory Cards, HDDs, and custom storage solutions for enterprise and consumer clients.
Education
BIET
Davangere, India
11.2020
Skills
Azure Data Factory
Azure Data Lake
Azure Databricks
Azure SQL
Azure DevOps
Apache Spark
PySpark
Spark DataFrames
SQL
Hadoop HDFS
Hive
MapReduce
CI/CD
Python
Java
Git
GitHub
Continuous improvement
Documentation preparation
Injection molding experience
Schedule coordination
Certification
Oracle Certified Associate, Java SE 8 Programmer, 04/01/21, OC2167902, https://www.credly.com/badges/80ccb7d3-bf3d-45f5-aa5d-3bdf4f8eb3c4?source=linked_in_profile
Languages
English
Hindi
Kannada
Telugu
Disclaimer
I hereby declare that the above particulars are true to the best of my knowledge and belief.
10/07/25, Bangalore
Akshay Kumar A K
Training Specialist at Western Digital Malaysia (known as Sandisk Storage Malaysia)