MURALI BALASUBRAMANIAM

Bengaluru

Summary

Accomplished Senior Principal Software Engineer at Dell Technologies, specializing in multi-cloud big data solutions. Expert in Apache Spark and data governance, with a track record of managing data pipelines and improving data quality. A collaborative approach ensures high availability and scalability in large-scale data lake architectures and delivers measurable improvements in project outcomes.

Overview

17 years of professional experience

Work History

Senior Principal Software Engineer

Dell Technologies
Bangalore
02.2013 - Current
  • Senior multi-cloud big data engineer managing data pipelines for large-scale data lakes and lakehouses.
  • Senior technical and functional consultant for the Oracle PIM Data Hub project implementation.

Senior Engineer

Virtusa
Chennai
04.2010 - 02.2013
  • Technical and functional consultant for the Oracle PIM Data Hub project implementation.

Software Developer

AVA Solutions
Chennai
02.2010 - 04.2010
  • Deployed to Virtusa as a PL/SQL developer for a telecom project.

Database Developer

Sunny Soft Solutions
Chennai
07.2008 - 07.2009
  • Developed custom database procedure components for a postpaid reward system serving the client's valued customers.

Education

M.Sc. - Computer Science

PSG College of Arts And Science
Coimbatore, Tamil Nadu
06-2008

B.Sc. - Computer Science

SRKV College of Arts And Science
Coimbatore, Tamil Nadu
06-2006

Skills

  • Apache Spark (PySpark, Scala)
  • Apache Hadoop (HDFS, YARN)
  • Apache Hive
  • Apache Kafka
  • Apache Flink
  • Apache Cassandra
  • MongoDB
  • Elasticsearch
  • Delta Lake
  • Parquet
  • ORC
  • Amazon Web Services (AWS - S3, EMR, Glue, Redshift, Kinesis, Lambda, Athena, Lake Formation)
  • Microsoft Azure (Data Lake Storage, Data Factory, Databricks, Synapse Analytics, Event Hubs)
  • Google Cloud Platform (GCP - BigQuery, Dataflow, Dataproc, Pub/Sub)
  • Python
  • SQL (expert)
  • Java
  • Shell scripting
  • Data modeling (star schema, snowflake schema)
  • Data lakehouse architecture
  • Dimensional modeling
  • ELT/ETL processes
  • Data governance
  • Data quality
  • Metadata management
  • Data virtualization
  • DynamoDB
  • Redis
  • Apache Airflow
  • Jenkins
  • Docker
  • Kubernetes
  • Git
  • CI/CD
  • Terraform
  • CloudFormation
  • Data preparation for ML
  • Feature engineering
  • MLOps pipelines
  • MLflow
  • Linux
  • Unix
  • Distributed systems
  • Scalability
  • High availability
  • Real-time processing
  • Batch processing
  • Data security
  • Cost optimization
  • Performance tuning

Projects

Product Data Fabric: From DW/DL to a Centralized Lakehouse for Next-Gen Product Insights

Dell Technologies
This project aims to consolidate the company's diverse data assets from existing data warehouses and data lakes into a single, unified Lakehouse architecture. The goal is to enhance analytics, enable advanced AI/ML capabilities, and streamline data management for greater efficiency and insight.
  • Spearheaded the company's 'Product Data Fabric' initiative, building a cloud-agnostic Lakehouse to unify product-centric data (telemetry, attributes, design) from diverse sources (DW, DL, databases, CSV, JSON, ORC, Parquet). Managed 5TB+ daily ingestion and delivery to target systems for next-gen insights.
  • Engineered Lakehouse pipelines with Spark (PySpark) on AWS EMR/Azure Databricks, optimizing data flow across the product lifecycle from manufacturing to sales. Reduced processing time by 30% for critical reports and ensured efficient data delivery to target systems.
  • Managed unification of Dell's 5TB enterprise data lake into the Lakehouse (AWS S3/Azure Data Lake Storage), implementing Delta Lake/Lake Formation for governance across all product data attributes and preparing the data for target systems.
  • Developed ETL/ELT processes via AWS Glue/Azure Data Factory and Spark, orchestrated with Apache Airflow, to deliver timely, accurate product insights and attribute data from various sources to target analytics and ML platforms.
  • Implemented robust Lakehouse governance and security frameworks for sensitive product design and performance data, ensuring compliance and secure consumption for both internal Lakehouse use and external target system delivery.
  • Mentored teams on Lakehouse adoption, fostering excellence in data engineering practices for the company's global product data operations, including source ingestion and target delivery.
  • Drove 15% cloud cost optimization for the new Lakehouse, enhancing efficiency for product insight generation and data distribution to target systems.
  • Integrated ML model serving pipelines within the Lakehouse, enabling real-time inference from diverse product telemetry data for applications like predictive maintenance, with results delivered to operational target systems.

Unified Product Data Hub: Phase 1 Data Lake Development

Dell Technologies
  • Contributed to the design and implementation of a fault-tolerant data streaming platform using Apache Kafka and Apache Flink for real-time ingestion and analytics on product telemetry and usage data within the developing Data Lake.
  • Developed and optimized Hive and Spark SQL queries on the foundational Hadoop clusters of the Data Lake, improving query performance by up to 25% for business intelligence users analyzing product performance and attribute data.
  • Built automated data quality checks and monitoring systems to ensure the integrity and reliability of raw and curated product data ingested into the Data Lake.
  • Participated in data modeling efforts for new analytical requirements, translating business needs into efficient schemas for organizing product master data and related attributes within the Data Lake.
  • Migrated key on-premises Hadoop workloads containing product data to cloud-native services (e.g., AWS EMR, Google Cloud Dataproc), streamlining operations and reducing overhead as part of the initial Data Lake development.

Timeline

Senior Principal Software Engineer

Dell Technologies
02.2013 - Current

Senior Engineer

Virtusa
04.2010 - 02.2013

Software Developer

AVA Solutions
02.2010 - 04.2010

Database Developer

Sunny Soft Solutions
07.2008 - 07.2009

M.Sc. - Computer Science

PSG College of Arts And Science

B.Sc. - Computer Science

SRKV College of Arts And Science