Project: PVH.
Role: Data Engineer.
Environment: Azure Data Factory, Azure Databricks, PySpark, Spark SQL, Azure Synapse, Log Analytics Workspace, and EventHub.
Duration: Sep 2022 – Present.
Roles and Responsibilities:
- Extracted, transformed, and loaded data from source systems to Azure storage services using Azure Data Factory, T-SQL, Spark SQL, and U-SQL in Azure Data Lake Analytics.
- Ingested data into Azure Data Lake, Azure Storage, Azure SQL, and Azure Synapse (SQL DW); processed the data in Azure Databricks.
- Developed Spark applications using PySpark and Spark SQL to transform data from multiple file formats into analytics-ready datasets (see the multi-format sketch after this list).
- Wrote automated SQL scripts for pipeline orchestration and data validation (a validation sketch follows this list).
- Managed streaming data ingestion in Databricks using Event Hub connection strings (see the streaming sketch after this list).
- Estimated cluster sizing, and monitored and troubleshot Spark workloads on Databricks clusters.
- Applied the Spark DataFrame API for in-session data manipulation.
- Demonstrated deep knowledge of Spark architecture, including Spark Core, Spark SQL, Spark Streaming, executors, tasks, and deployment modes.
- Implemented security and data governance policies using Databricks Unity Catalog (see the grants sketch after this list).
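A minimal PySpark sketch of the multi-format transformation pattern described above; the ADLS paths, column names, and join keys are hypothetical placeholders, not the actual project schema.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("multi-format-transform").getOrCreate()

# Read the same business domain from three source formats (paths are placeholders).
orders = spark.read.option("header", True).csv("abfss://raw@acct.dfs.core.windows.net/orders/")
events = spark.read.json("abfss://raw@acct.dfs.core.windows.net/events/")
items = spark.read.parquet("abfss://raw@acct.dfs.core.windows.net/items/")

# Normalize types, join, and derive a partition column with the DataFrame API.
enriched = (orders
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .join(items, "item_id", "left")
    .join(events.select("order_id", "channel"), "order_id", "left")
    .withColumn("order_date", F.to_date("order_ts")))

# Land the curated output partitioned by date for downstream analytics.
(enriched.write.mode("overwrite")
    .partitionBy("order_date")
    .parquet("abfss://curated@acct.dfs.core.windows.net/orders_enriched/"))
```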
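A hedged example of the automated Spark SQL validation scripts mentioned above; `spark` is the ambient Databricks session, and the table and column names are illustrative.

```python
# Post-load checks: the table must be non-empty and the key must be non-null.
row_count = spark.sql(
    "SELECT COUNT(*) AS n FROM curated.orders_enriched").first()["n"]
null_keys = spark.sql(
    "SELECT COUNT(*) AS n FROM curated.orders_enriched WHERE order_id IS NULL").first()["n"]

failures = []
if row_count == 0:
    failures.append("curated.orders_enriched is empty")
if null_keys > 0:
    failures.append(f"{null_keys} rows with NULL order_id")

# Raising makes the Databricks job (and the ADF pipeline calling it) fail loudly.
if failures:
    raise ValueError("Validation failed: " + "; ".join(failures))
```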
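A sketch of Event Hub ingestion with Spark Structured Streaming, assuming the azure-event-hubs-spark connector is installed on the cluster; the secret scope, key names, and paths are hypothetical.

```python
# Pull the connection string from a Databricks secret scope (names are placeholders).
conn = dbutils.secrets.get(scope="etl", key="eventhub-connection-string")

# The connector expects the connection string to be encrypted via its helper.
eh_conf = {
    "eventhubs.connectionString":
        sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(conn)
}

stream = spark.readStream.format("eventhubs").options(**eh_conf).load()

# Event Hub payloads arrive as binary; cast to string before landing them.
decoded = stream.selectExpr("CAST(body AS STRING) AS payload", "enqueuedTime")

(decoded.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/eh_ingest")
    .start("/mnt/bronze/eh_events"))
```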
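A minimal Unity Catalog grants sketch; the catalog, schema, table, and group names are placeholders for the actual policy set.

```python
# Grant read-only access to an analyst group, scoped catalog -> schema -> table.
spark.sql("GRANT USE CATALOG ON CATALOG main TO `analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.curated TO `analysts`")
spark.sql("GRANT SELECT ON TABLE main.curated.orders_enriched TO `analysts`")
```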
Project: Optum.
Role: Associate.
Environment: Azure Data Factory, Azure Databricks, PySpark, Spark SQL, Azure Data Lake, and Azure Blob Storage.
Duration: Sep 2020 – Aug 2022.
Roles and Responsibilities:
- Provisioned Hadoop and Spark clusters to support an on-demand data warehouse and enable data access for data scientists.
- Built data pipelines and processed data in Azure Databricks using PySpark and Spark SQL.
- Imported data from MySQL and other source systems into Azure Data Lake and Azure Blob Storage (see the JDBC import sketch after this list).
- Created tables and performed data validation using Spark SQL in Azure Databricks (a table-creation sketch follows this list).
- Loaded and transformed structured, semi-structured, and unstructured data for advanced analytics.
- Cleaned and parsed data for ingestion into Azure Databricks environments.
- Monitored system health, handled warning/failure logs, and optimized job execution.
- Reviewed application logs within Databricks and managed storage-level logging.
- Managed ingestion pipelines from cloud storage (Azure Blob Storage, ADLS) into Databricks (see the storage-access sketch after this list).
- Enabled downstream consumption of refined data by data scientists and analysts.
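A sketch of the MySQL-to-lake import via Spark's JDBC reader, assuming a MySQL driver is available on the cluster; the host, credentials, table, and partition bounds are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mysql-import").getOrCreate()

# Partitioned JDBC read so the pull is parallelized across executors.
claims = (spark.read.format("jdbc")
    .option("url", "jdbc:mysql://mysql-host:3306/claims")
    .option("dbtable", "member_claims")
    .option("user", "etl_user")
    .option("password", dbutils.secrets.get(scope="etl", key="mysql-pwd"))
    .option("partitionColumn", "claim_id")  # numeric key to split ranges on
    .option("lowerBound", "1")
    .option("upperBound", "10000000")
    .option("numPartitions", "8")
    .load())

# Land the raw pull in the lake for downstream processing.
claims.write.mode("overwrite").parquet(
    "abfss://landing@acct.dfs.core.windows.net/member_claims/")
```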
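A hedged table-creation and validation example in Spark SQL; the schema and paths carry over from the import sketch above and are illustrative only.

```python
# Register the landed files as a managed Delta table.
spark.sql("""
    CREATE TABLE IF NOT EXISTS curated.member_claims
    USING DELTA
    AS SELECT * FROM parquet.`abfss://landing@acct.dfs.core.windows.net/member_claims/`
""")

# Validate: the business key must be unique.
dupes = spark.sql("""
    SELECT claim_id, COUNT(*) AS n
    FROM curated.member_claims
    GROUP BY claim_id
    HAVING COUNT(*) > 1
""")
if dupes.count() > 0:
    raise ValueError("duplicate claim_id values found in curated.member_claims")
```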
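A storage-access sketch showing how Blob/ADLS-to-Databricks ingestion can be wired up, using account-key auth for brevity; the account, container, and secret names are placeholders (a service principal is the more common production choice).

```python
# Authenticate the session to the storage account with a key from a secret scope.
spark.conf.set(
    "fs.azure.account.key.acct.dfs.core.windows.net",
    dbutils.secrets.get(scope="etl", key="storage-account-key"))

# Read the daily extracts and expose them to Spark SQL consumers.
raw = spark.read.option("header", True).csv(
    "abfss://raw@acct.dfs.core.windows.net/daily_extracts/")
raw.createOrReplaceTempView("daily_extracts")
```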