
ROHIT VATSA

Noida

Summary

Data Engineer with 3 years of experience building and optimizing data pipelines with Azure Data Lake, Azure Databricks, and PySpark. Strong expertise in data processing with Hadoop and BigQuery on GCP and with Azure, including hands-on management of ETL workflows. Completed an end-to-end Databricks project for large-scale data processing. Skilled in cloud-based data engineering and in delivering scalable, efficient solutions.

Overview

3 years of professional experience
3 Certifications

Work History

Tata Consultancy Services
Noida
01.2022 - Current
  • Designed and Optimized ETL Pipelines: Built end-to-end ETL pipelines on GCP using Hadoop, Hive, and Dataproc, reducing data processing time by 30% and improving data accuracy.
  • Implemented Data Hydration Solutions: Led the implementation of data hydration solutions using Humana's proprietary frameworks (apaas, Data Prep), ensuring seamless data integration across multiple sources and enhancing data quality.
  • Enhanced Query Performance: Optimized complex SQL queries and data retrieval processes, reducing query execution time by 40% and improving the overall performance of data warehousing solutions.
  • Leveraged Big Data Technologies: Used big data technologies such as HBase and PySpark on GCP and Azure to process and analyze large datasets, enabling faster insights and better decision-making for business stakeholders.

Education

BACHELOR'S DEGREE - ENGINEERING

Pune University
Pune, Maharashtra
07.2020

Skills

  • Languages: Python, SQL, Bash
  • Data Stores: HBase, Google Cloud Storage, Azure Blob Storage, Azure Data Lake Storage
  • Databases/Data Warehouses: BigQuery (GCP), MySQL, Azure SQL Database
  • Developer Tools: Visual Studio Code, StreamSets, Databricks
  • Data Processing / Orchestration: Apache Hadoop, Apache Hive, PySpark, Dataproc (GCP)
  • Cloud Platforms: Microsoft Azure (Azure Databricks, Azure Data Lake), Google Cloud Platform (GCP)
  • ETL Frameworks: Humana’s proprietary frameworks (apaas, Data Prep)
  • Version Control: Git, GitHub
  • Data Integration & Pipeline Design: ETL pipeline development, data hydration, and optimization

Certification

  • Google Associate Cloud Engineer
  • Oracle Certified Foundations Associate
  • Microsoft Certified: Azure Fundamentals

Project

Formula 1 Data Pipeline Project

  • Overview: Developed a data pipeline using Azure services and Databricks to ingest and transform data from the Ergast API, a provider of historical Formula 1 race data.
  • Tools & Technologies: Azure Data Lake, Azure Databricks, PySpark, Ergast API.
  • Key Contributions:
      • Extracted data from the Ergast API and ingested it into Azure Data Lake.
      • Designed a scalable data pipeline in Databricks to transform raw data into meaningful insights.
      • Automated the pipeline for continuous ingestion and transformation of new data.
      • Ensured data quality and consistency through validation steps in the pipeline.
