Project 1: Consumer R&D Data Lake Migration from Azure to AWS
Client: Kenvue
Role: Senior Data Engineer
Technologies: AWS Glue, Amazon S3, Athena, Lake Formation, PySpark, Delta Lake, AWS Secrets Manager, CloudWatch, SNS
Description:
Migrated the RegPoint and HAQ (Health Authority Query) platforms from Azure (ADF + Databricks) to an AWS-native architecture built on a Medallion Architecture (Raw, Base, Core, Reporting layers). Delivered scalable, metadata-driven ingestion and transformation pipelines for regulatory product lifecycle data and health authority communications.
Key Contributions:
- Re-engineered Azure pipelines to AWS Glue, S3, Athena, and Lake Formation, ensuring scalability and security.
- Designed metadata-driven ingestion from sources such as Azure Cosmos DB into S3, following the Medallion Architecture.
- Built in-memory transformations replicating Azure Databricks logic, avoiding unnecessary S3 writes.
- Flattened nested JSON structures into analytics-ready tables using PySpark and dynamic SQL.
- Implemented incremental loads with watermarking, de-duplication, and schema evolution using Delta Lake (see the sketch after this list).
- Secured Kafka and Cosmos DB connections with SASL_SSL and credentials stored in AWS Secrets Manager.
- Enabled both real-time and batch HAQ ingestion, integrated with RegPoint for lifecycle tracking.
- Set up CloudWatch/SNS alerts for monitoring, job audits, and failure notifications.
- Enforced fine-grained, role-based access via AWS Lake Formation.
- Developed config-driven Glue jobs for multi-schema ingestion, improving reusability and scalability.
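For illustration, a minimal sketch of the incremental-load pattern above as it could run in a Glue PySpark job; the S3 paths, the query_id business key, and the last_updated watermark column are placeholders, not the actual RegPoint/HAQ schema.
```python
# Minimal sketch only: paths, the query_id key, and the last_updated watermark
# column are placeholders, not the actual RegPoint/HAQ schema.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = (SparkSession.builder.appName("haq_incremental_load")
         .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.delta.catalog.DeltaCatalog")
         .getOrCreate())

target_path = "s3://example-bucket/core/haq_queries"   # hypothetical Core-layer path

# Watermark = latest timestamp already loaded into the Core table.
watermark = (spark.read.format("delta").load(target_path)
             .agg(F.max("last_updated")).first()[0]) or "1970-01-01 00:00:00"

# Pull only records newer than the watermark from the Raw layer.
incoming = (spark.read.format("delta")
            .load("s3://example-bucket/raw/haq_queries")   # hypothetical Raw-layer path
            .where(F.col("last_updated") > F.lit(watermark)))

# De-duplicate: keep only the latest version of each business key.
latest = (incoming
          .withColumn("rn", F.row_number().over(
              Window.partitionBy("query_id").orderBy(F.col("last_updated").desc())))
          .where("rn = 1")
          .drop("rn"))

# Upsert into the Core table; autoMerge lets new source columns evolve the schema.
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")
(DeltaTable.forPath(spark, target_path).alias("t")
 .merge(latest.alias("s"), "t.query_id = s.query_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())
```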
Project 2: Retail Sales Analytics Lakehouse Implementation in Microsoft Fabric
Client: Domino’s Pizza (POC)
Role: Microsoft Fabric Data Engineer
Technologies: Microsoft Fabric (Lakehouse, Dataflows, Notebooks), OneLake, PySpark, Delta Lake, Azure Data Lake Storage Gen2, Microsoft Entra ID, Azure Key Vault
Description:
Designed and implemented a Medallion Architecture (Bronze–Silver–Gold) entirely within Microsoft Fabric to enable near real-time sales performance and customer value analytics for Domino’s Pizza. The solution ingested data from SQL Server and Azure Data Lake into OneLake, processed it with PySpark notebooks, and delivered curated, analytics-ready datasets for enterprise reporting and decision-making.
Key Contributions:
- Built end-to-end Fabric Lakehouse pipelines for Bronze (raw), Silver (cleaned), and Gold (business-ready) layers using Fabric notebooks and Delta Lake format.
- Developed PySpark transformations for data cleansing, standardization, normalization, enrichment, and derived column creation.
- Designed Gold layer star schema models optimized for analytical queries and KPIs such as customer lifetime value (CLV), MTD/YTD sales, and top product categories.
- Implemented metadata-driven processing to handle multiple datasets dynamically without hardcoding (see the sketch after this list).
- Configured OneLake shortcuts for cross-domain data sharing without duplication.
- Applied data governance and security with Microsoft Entra ID role-based access control and Azure Key Vault for secrets management.
- Implemented full-load and truncate-insert strategies for efficient data refresh based on business requirements.
- Optimized PySpark job performance by tuning partitions, caching strategies, and minimizing shuffle operations.
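A minimal sketch of the metadata-driven Bronze-to-Silver pass as it could run in a Fabric notebook (where `spark` is pre-initialized); the dataset list, keys, and Lakehouse table names are illustrative, and in practice the metadata lived in a configuration table or file rather than inline code.
```python
# Minimal sketch only: the dataset list and table names are illustrative; `spark`
# is pre-initialized in a Fabric notebook session.
from pyspark.sql import functions as F

datasets = [  # in the real pipeline this metadata came from a config, not inline code
    {"name": "sales_orders", "key": "order_id"},
    {"name": "store_master", "key": "store_id"},
]

for ds in datasets:
    bronze = spark.table(f"bronze_{ds['name']}")

    silver = (bronze
              # standardization: snake_case column names
              .toDF(*[c.strip().lower().replace(" ", "_") for c in bronze.columns])
              # cleansing: drop rows missing the business key, then de-duplicate on it
              .dropna(subset=[ds["key"]])
              .dropDuplicates([ds["key"]])
              # enrichment: audit column for the load run
              .withColumn("load_ts", F.current_timestamp()))

    (silver.write.format("delta")
           .mode("overwrite")
           .option("overwriteSchema", "true")
           .saveAsTable(f"silver_{ds['name']}"))
```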
Project 3: Dominos_MS_Fabric – Load & Pickup Report Automation (POC)
Client: Domino’s
Manager: Niranjan Kumar Makkuva (niranjankumar.m@sonata-software.com)
Role: Developer
Technologies: Microsoft Fabric, Lakehouse, PySpark, SQL Server, Power BI, Data Pipelines, DAX
Description:
Developed a Proof-of-Concept in Microsoft Fabric to automate Domino’s reporting process by consolidating data from multiple sources into Fabric OneLake. The POC automated the Load and Pickup report, delivering real-time insights for Shift Managers and Senior Directors across departments.
Key Contributions:
- Implemented Medallion Architecture in Microsoft Fabric for SQL Server data ingestion into the Lakehouse.
- Built automated ingestion, transformation, and movement pipelines in Fabric using PySpark Notebooks.
- Created fact and dimension tables in the Fabric Data Warehouse; developed optimized SQL scripts, views, and stored procedures (see the sketch after this list).
- Delivered optimized reports and dashboards in Power BI for warehouse analysis and business insights.
- Collaborated with client teams to gather requirements, present updates, and align deliverables with business goals.
- Replaced manual reporting with automated pipelines for real-time decision-making.
- Tuned queries and workflows to minimize dashboard latency.
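A minimal sketch of how a dimension and the Load & Pickup fact could be materialized from cleansed Lakehouse tables with PySpark; table names, grain, and measures are placeholders, and the warehouse-side SQL scripts and stored procedures are not shown here.
```python
# Minimal sketch only: table names, grain, and measures are placeholders; the
# warehouse-side fact/dim objects and stored procedures are not shown.
from pyspark.sql import functions as F

orders = spark.table("silver_orders")   # hypothetical cleansed orders table
stores = spark.table("silver_stores")   # hypothetical store master

# Dimension: one row per store.
dim_store = stores.select("store_id", "store_name", "region").dropDuplicates(["store_id"])
dim_store.write.format("delta").mode("overwrite").saveAsTable("gold_dim_store")

# Fact: load and pickup volumes per store per day.
fact = (orders
        .withColumn("order_date", F.to_date("order_ts"))
        .groupBy("store_id", "order_date")
        .agg(F.count(F.when(F.col("channel") == "pickup", 1)).alias("pickup_orders"),
             F.count("*").alias("total_orders"),
             F.sum("order_amount").alias("total_sales")))
fact.write.format("delta").mode("overwrite").saveAsTable("gold_fact_load_pickup")
```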
Project 4: Modern BI Platform Evaluation – Microsoft Fabric (POC)
Client: Myntra
Role: Developer
Technologies: Microsoft Fabric, Databricks, Power BI Desktop, Azure Storage Explorer
Description:
Evaluated Microsoft Fabric’s OneLake + Power BI against the existing Databricks Delta Lake + Power BI stack for performance, integration, and analytics capabilities.
Key Contributions:
- Built shortcuts from ADLS Gen2 Delta tables to Fabric Lakehouse for unified reporting.
- Developed SQL Endpoint Views and semantic models replicating existing Databricks dashboards.
- Tested DirectQuery and Direct Lake modes for performance benchmarking.
- Implemented aggregated layers in Fabric notebooks and applied business filters (see the sketch after this list).
- Documented performance comparisons to aid migration decisions.
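A minimal sketch of one aggregation layer built over a shortcut-backed Delta table in a Fabric notebook; the table name, business filter, and grain are placeholders rather than the actual Myntra dashboards being replicated.
```python
# Minimal sketch only: the shortcut-backed table name, filter, and grain are placeholders.
from pyspark.sql import functions as F

# Delta table exposed in the Lakehouse through a OneLake shortcut to ADLS Gen2.
sales = spark.table("shortcut_sales")

agg = (sales
       .where(F.col("order_status") == "DELIVERED")   # illustrative business filter
       .groupBy("category", "order_month")
       .agg(F.sum("gmv").alias("gmv"),
            F.countDistinct("order_id").alias("orders")))

# Persisting the aggregate as a Delta table makes it visible to the SQL endpoint
# and usable by Direct Lake semantic models for the comparison.
agg.write.format("delta").mode("overwrite").saveAsTable("agg_sales_by_category")
```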
Project 5: Cross-Region Azure Synapse & Databricks Migration
Client: Myntra
Role: Developer
Technologies: Azure Synapse Analytics, AzCopy, Azure Databricks, MySQL, Azure Data Lake
Description:
Migrated SQL data warehouses, storage accounts, Databricks workspaces, and MySQL databases from the South India to the Central India Azure region, ensuring data integrity and performance parity.
Key Contributions:
- Migrated Azure Synapse SQL databases, updating logins and configurations.
- Executed storage account migrations with AzCopy and validated data post-migration.
- Migrated Hive tables to Delta tables, resolving schema and access issues (see the sketch after this list).
- Managed permission revocation/re-grant and replicated workload management (WLM) settings.
- Migrated Databricks clusters, libraries, and workspace configurations.
- Handled credentials (SAS tokens, secrets, SPNs) for environment setup.
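A minimal sketch of the Hive-to-Delta conversion step as it could be scripted in a Databricks notebook; the database and table names are placeholders.
```python
# Minimal sketch only: database and table names are placeholders.
spark.sql("CREATE DATABASE IF NOT EXISTS sales_delta")

hive_tables = ["sales_hive.orders", "sales_hive.returns"]   # illustrative list

for full_name in hive_tables:
    df = spark.table(full_name)                  # existing Hive (Parquet-backed) table
    target = full_name.replace("sales_hive.", "sales_delta.")
    (df.write.format("delta")
       .mode("overwrite")
       .option("overwriteSchema", "true")        # absorb schema drift surfaced post-migration
       .saveAsTable(target))
```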
Project 6: Sales & Production Data Integration on Azure
Client: Kona Bikes
Role: Azure Data Engineer
Technologies: Azure Data Factory, Azure Databricks, PySpark, Azure SQL Database, ADLS
Description:
Built a scalable Azure-based data pipeline for ingesting structured/unstructured data from multiple sources into ADLS and Azure SQL Database for analytics and business reporting.
Key Contributions:
- Developed ADF pipelines to ingest from Oracle, SQL Server, and flat files.
- Built Databricks notebooks for standardization and transformation.
- Implemented full/incremental loads, audit logging, and automated execution.
- Applied source-to-target data reconciliation to ensure quality and accuracy (see the sketch after this list).
- Tuned Spark configurations to optimize performance.
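A minimal sketch of the reconciliation check as it could run in a Databricks notebook (where `spark` and `dbutils` are available), assuming a SQL Server source read over JDBC and a Parquet target on ADLS; the JDBC URL, secret scope, paths, and column names are placeholders.
```python
# Minimal sketch only: the JDBC URL, secret scope, paths, and columns are placeholders.
from pyspark.sql import functions as F

def summarize(df, amount_col):
    # Row count plus a rounded sum acts as a cheap fingerprint for comparison.
    return df.agg(F.count("*").alias("rows"),
                  F.round(F.sum(amount_col), 2).alias("amount_total")).first()

source = (spark.read.format("jdbc")
          .option("url", "jdbc:sqlserver://<source-host>;databaseName=sales")
          .option("dbtable", "dbo.orders")
          .option("user", dbutils.secrets.get("kv-scope", "sql-user"))
          .option("password", dbutils.secrets.get("kv-scope", "sql-password"))
          .load())

target = spark.read.parquet("abfss://curated@<storage-account>.dfs.core.windows.net/orders")

src, tgt = summarize(source, "order_amount"), summarize(target, "order_amount")
if (src["rows"], src["amount_total"]) != (tgt["rows"], tgt["amount_total"]):
    raise ValueError(f"Reconciliation failed: source={src}, target={tgt}")
```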
Project 7: On-Premises to Azure Cloud Data Ingestion
Client: Merrill Edge, USA
Role: Junior Data Engineer
Technologies: Azure Data Factory, SQL Server, Azure SQL Database, ADLS
Description:
Implemented on-premises to Azure cloud data ingestion pipelines using ADF to support analytics and reporting needs.
Key Contributions:
- Set up a Self-Hosted Integration Runtime for on-premises to cloud data movement.
- Created linked services/datasets to connect sources and destinations.
- Developed and tested ingestion pipelines from SQL Server to ADLS/Azure SQL.
- Automated ADF pipelines for scheduled refreshes.
- Maintained audit logs and monitored pipelines.
- Provided production support and minor data fixes.