Project 1: Consumer R&D Data Lake Migration from Azure to AWS
Client: Kenvue
Role: Senior Data Engineer
Technologies: AWS Glue, Amazon S3, Athena, Lake Formation, PySpark, Delta Lake, AWS Secrets Manager, CloudWatch, SNS
Description:
Migrated the RegPoint and HAQ (Health Authority Query) platforms from Azure (ADF + Databricks) to an AWS-native architecture built on a Medallion Architecture (Raw, Base, Core, Reporting layers). Delivered scalable, metadata-driven ingestion and transformation pipelines for regulatory product lifecycle data and health authority communications.
Key Contributions:
- Re-engineered Azure pipelines to AWS Glue, S3, Athena, and Lake Formation, ensuring scalability and security.
- Designed metadata-driven ingestion from sources such as Azure Cosmos DB into S3, following the Medallion Architecture.
- Built in-memory transformations replicating Azure Databricks logic, avoiding unnecessary S3 writes.
- Flattened nested JSON structures into analytics-ready tables using PySpark and dynamic SQL.
- Implemented incremental loads with watermarking, de-duplication, and schema evolution using Delta Lake (see the sketch after this list).
- Secured Kafka and Cosmos DB connections with SASL_SSL and credentials stored in AWS Secrets Manager.
- Enabled both real-time and batch HAQ ingestion, integrated with RegPoint for lifecycle tracking.
- Set up CloudWatch/SNS alerts for monitoring, job audits, and failure notifications.
- Enforced fine-grained, role-based access via AWS Lake Formation.
- Developed config-driven Glue jobs for multi-schema ingestion, improving reusability and scalability.
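For illustration, a minimal sketch of the incremental-load pattern above as it could run in a Glue PySpark job; the S3 paths, the query_id business key, and the last_updated watermark column are placeholders, not the actual RegPoint/HAQ schema.
```python
# Minimal sketch only: paths, the query_id key, and the last_updated watermark
# column are placeholders, not the actual RegPoint/HAQ schema.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = (SparkSession.builder.appName("haq_incremental_load")
         .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.delta.catalog.DeltaCatalog")
         .getOrCreate())

target_path = "s3://example-bucket/core/haq_queries"   # hypothetical Core-layer path

# Watermark = latest timestamp already loaded into the Core table.
watermark = (spark.read.format("delta").load(target_path)
             .agg(F.max("last_updated")).first()[0]) or "1970-01-01 00:00:00"

# Pull only records newer than the watermark from the Raw layer.
incoming = (spark.read.format("delta")
            .load("s3://example-bucket/raw/haq_queries")   # hypothetical Raw-layer path
            .where(F.col("last_updated") > F.lit(watermark)))

# De-duplicate: keep only the latest version of each business key.
latest = (incoming
          .withColumn("rn", F.row_number().over(
              Window.partitionBy("query_id").orderBy(F.col("last_updated").desc())))
          .where("rn = 1")
          .drop("rn"))

# Upsert into the Core table; autoMerge lets new source columns evolve the schema.
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")
(DeltaTable.forPath(spark, target_path).alias("t")
 .merge(latest.alias("s"), "t.query_id = s.query_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())
```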
Project 2: Retail Sales Analytics Lakehouse Implementation in Microsoft Fabric
Client: Domino’s Pizza (POC)
Role: Microsoft Fabric Data Engineer
Technologies: Microsoft Fabric (Lakehouse, Dataflows, Notebooks), OneLake, PySpark, Delta Lake, Azure Data Lake Storage Gen2, Microsoft Entra ID, Azure Key Vault
Description:
Designed and implemented a Medallion Architecture (Bronze–Silver–Gold) entirely within Microsoft Fabric to enable near real-time sales performance and customer value analytics for Domino’s Pizza. The solution ingested data from SQL Server and Azure Data Lake into OneLake, processed it with PySpark notebooks, and delivered curated, analytics-ready datasets for enterprise reporting and decision-making.
Key Contributions:
- Built end-to-end Fabric Lakehouse pipelines for Bronze (raw), Silver (cleaned), and Gold (business-ready) layers using Fabric notebooks and Delta Lake format.
- Developed PySpark transformations for data cleansing, standardization, normalization, enrichment, and derived column creation.
- Designed Gold layer star schema models optimized for analytical queries and KPIs such as customer lifetime value (CLV), MTD/YTD sales, and top product categories.
- Implemented metadata-driven processing to handle multiple datasets dynamically without hardcoding (see the sketch after this list).
- Configured OneLake shortcuts for cross-domain data sharing without duplication.
- Applied data governance and security with Microsoft Entra ID role-based access control and Azure Key Vault for secrets management.
- Implemented full-load and truncate-insert strategies for efficient data refresh based on business requirements.
- Optimized PySpark job performance by tuning partitions, caching strategies, and minimizing shuffle operations.
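A minimal sketch of the metadata-driven Bronze-to-Silver pass as it could run in a Fabric notebook (where `spark` is pre-initialized); the dataset list, keys, and Lakehouse table names are illustrative, and in practice the metadata lived in a configuration table or file rather than inline code.
```python
# Minimal sketch only: the dataset list and table names are illustrative; `spark`
# is pre-initialized in a Fabric notebook session.
from pyspark.sql import functions as F

datasets = [  # in the real pipeline this metadata came from a config, not inline code
    {"name": "sales_orders", "key": "order_id"},
    {"name": "store_master", "key": "store_id"},
]

for ds in datasets:
    bronze = spark.table(f"bronze_{ds['name']}")

    silver = (bronze
              # standardization: snake_case column names
              .toDF(*[c.strip().lower().replace(" ", "_") for c in bronze.columns])
              # cleansing: drop rows missing the business key, then de-duplicate on it
              .dropna(subset=[ds["key"]])
              .dropDuplicates([ds["key"]])
              # enrichment: audit column for the load run
              .withColumn("load_ts", F.current_timestamp()))

    (silver.write.format("delta")
           .mode("overwrite")
           .option("overwriteSchema", "true")
           .saveAsTable(f"silver_{ds['name']}"))
```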
Project 3: Dominos_MS_Fabric – Load & Pickup Report Automation (POC)
Client: Domino’s
Manager: Niranjan Kumar Makkuva (niranjankumar.m@sonata-software.com)
Role: Developer
Technologies: Microsoft Fabric, Lakehouse, PySpark, SQL Server, Power BI, Data Pipelines, DAX
Description:
Developed a Proof-of-Concept in Microsoft Fabric to automate Domino’s reporting process by consolidating data from multiple sources into Fabric OneLake. The POC automated the Load and Pickup report, delivering real-time insights for Shift Managers and Senior Directors across departments.
Key Contributions:
- Implemented Medallion Architecture in Microsoft Fabric for SQL Server data ingestion into the Lakehouse.
- Built automated ingestion, transformation, and movement pipelines in Fabric using PySpark Notebooks.
- Created fact and dimension tables in the Fabric Data Warehouse; developed optimized SQL scripts, views, and stored procedures (see the sketch after this list).
- Delivered optimized reports and dashboards in Power BI for warehouse analysis and business insights.
- Collaborated with client teams to gather requirements, present updates, and align deliverables with business goals.
- Replaced manual reporting with automated pipelines for real-time decision-making.
- Tuned queries and workflows to minimize dashboard latency.
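A minimal sketch of how a dimension and the Load & Pickup fact could be materialized from cleansed Lakehouse tables with PySpark; table names, grain, and measures are placeholders, and the warehouse-side SQL scripts and stored procedures are not shown here.
```python
# Minimal sketch only: table names, grain, and measures are placeholders; the
# warehouse-side fact/dim objects and stored procedures are not shown.
from pyspark.sql import functions as F

orders = spark.table("silver_orders")   # hypothetical cleansed orders table
stores = spark.table("silver_stores")   # hypothetical store master

# Dimension: one row per store.
dim_store = stores.select("store_id", "store_name", "region").dropDuplicates(["store_id"])
dim_store.write.format("delta").mode("overwrite").saveAsTable("gold_dim_store")

# Fact: load and pickup volumes per store per day.
fact = (orders
        .withColumn("order_date", F.to_date("order_ts"))
        .groupBy("store_id", "order_date")
        .agg(F.count(F.when(F.col("channel") == "pickup", 1)).alias("pickup_orders"),
             F.count("*").alias("total_orders"),
             F.sum("order_amount").alias("total_sales")))
fact.write.format("delta").mode("overwrite").saveAsTable("gold_fact_load_pickup")
```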
Project 4: Modern BI Platform Evaluation – Microsoft Fabric (POC)
Client: Myntra
Role: Developer
Technologies: Microsoft Fabric, Databricks, Power BI Desktop, Azure Storage Explorer
Description:
Evaluated Microsoft Fabric’s OneLake + Power BI against the existing Databricks Delta Lake + Power BI stack for performance, integration, and analytics capabilities.
Key Contributions:
- Built shortcuts from ADLS Gen2 Delta tables to Fabric Lakehouse for unified reporting.
- Developed SQL Endpoint Views and semantic models replicating existing Databricks dashboards.
- Tested DirectQuery and Direct Lake modes for performance benchmarking.
- Implemented aggregated layers in Fabric notebooks and applied business filters (see the sketch after this list).
- Documented performance comparisons to aid migration decisions.
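A minimal sketch of one aggregation layer built over a shortcut-backed Delta table in a Fabric notebook; the table name, business filter, and grain are placeholders rather than the actual Myntra dashboards being replicated.
```python
# Minimal sketch only: the shortcut-backed table name, filter, and grain are placeholders.
from pyspark.sql import functions as F

# Delta table exposed in the Lakehouse through a OneLake shortcut to ADLS Gen2.
sales = spark.table("shortcut_sales")

agg = (sales
       .where(F.col("order_status") == "DELIVERED")   # illustrative business filter
       .groupBy("category", "order_month")
       .agg(F.sum("gmv").alias("gmv"),
            F.countDistinct("order_id").alias("orders")))

# Persisting the aggregate as a Delta table makes it visible to the SQL endpoint
# and usable by Direct Lake semantic models for the comparison.
agg.write.format("delta").mode("overwrite").saveAsTable("agg_sales_by_category")
```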
Project 5: Cross-Region Azure Synapse & Databricks Migration
Client: Myntra
Role: Developer
Technologies: Azure Synapse Analytics, AzCopy, Azure Databricks, MySQL, Azure Data Lake
Description:
Migrated SQL data warehouses, storage accounts, Databricks workspaces, and MySQL databases from the South India to the Central India Azure region, ensuring data integrity and performance parity.
Key Contributions:
- Migrated Azure Synapse SQL databases, updating logins and configurations.
- Executed storage account migrations with AzCopy and validated data post-migration.
- Migrated Hive tables to Delta tables, resolving schema and access issues (see the sketch after this list).
- Managed permission revocation/re-grant and replicated workload management (WLM) settings.
- Migrated Databricks clusters, libraries, and workspace configurations.
- Handled credentials (SAS tokens, secrets, SPNs) for environment setup.
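A minimal sketch of the Hive-to-Delta conversion step as it could be scripted in a Databricks notebook; the database and table names are placeholders.
```python
# Minimal sketch only: database and table names are placeholders.
spark.sql("CREATE DATABASE IF NOT EXISTS sales_delta")

hive_tables = ["sales_hive.orders", "sales_hive.returns"]   # illustrative list

for full_name in hive_tables:
    df = spark.table(full_name)                  # existing Hive (Parquet-backed) table
    target = full_name.replace("sales_hive.", "sales_delta.")
    (df.write.format("delta")
       .mode("overwrite")
       .option("overwriteSchema", "true")        # absorb schema drift surfaced post-migration
       .saveAsTable(target))
```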
Project 6: Sales & Production Data Integration on Azure
Client: Kona Bikes
Role: Azure Data Engineer
Technologies: Azure Data Factory, Azure Databricks, PySpark, Azure SQL Database, ADLS
Description:
Built a scalable Azure-based data pipeline for ingesting structured/unstructured data from multiple sources into ADLS and Azure SQL Database for analytics and business reporting.
Key Contributions:
- Developed ADF pipelines to ingest from Oracle, SQL Server, and flat files.
- Built Databricks notebooks for standardization and transformation.
- Implemented full/incremental loads, audit logging, and automated execution.
- Applied source-to-target data reconciliation to ensure quality and accuracy (see the sketch after this list).
- Tuned Spark configurations to optimize performance.
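A minimal sketch of the reconciliation check as it could run in a Databricks notebook (where `spark` and `dbutils` are available), assuming a SQL Server source read over JDBC and a Parquet target on ADLS; the JDBC URL, secret scope, paths, and column names are placeholders.
```python
# Minimal sketch only: the JDBC URL, secret scope, paths, and columns are placeholders.
from pyspark.sql import functions as F

def summarize(df, amount_col):
    # Row count plus a rounded sum acts as a cheap fingerprint for comparison.
    return df.agg(F.count("*").alias("rows"),
                  F.round(F.sum(amount_col), 2).alias("amount_total")).first()

source = (spark.read.format("jdbc")
          .option("url", "jdbc:sqlserver://<source-host>;databaseName=sales")
          .option("dbtable", "dbo.orders")
          .option("user", dbutils.secrets.get("kv-scope", "sql-user"))
          .option("password", dbutils.secrets.get("kv-scope", "sql-password"))
          .load())

target = spark.read.parquet("abfss://curated@<storage-account>.dfs.core.windows.net/orders")

src, tgt = summarize(source, "order_amount"), summarize(target, "order_amount")
if (src["rows"], src["amount_total"]) != (tgt["rows"], tgt["amount_total"]):
    raise ValueError(f"Reconciliation failed: source={src}, target={tgt}")
```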
Project 7: On-Premises to Azure Cloud Data Ingestion
Client: Merrill Edge, USA
Role: Junior Data Engineer
Technologies: Azure Data Factory, SQL Server, Azure SQL Database, ADLS
Description:
Implemented on-premises to Azure cloud data ingestion pipelines using ADF to support analytics and reporting needs.
Key Contributions:
- Set up a Self-Hosted Integration Runtime for on-premises to cloud data movement.
- Created linked services/datasets to connect sources and destinations.
- Developed and tested ingestion pipelines from SQL Server to ADLS/Azure SQL.
- Automated ADF pipelines for scheduled refreshes.
- Maintained audit logs and monitored pipelines.
- Provided production support and minor data fixes.