Rithik Gandhari

Hyderabad

Summary

Data Engineer with 3 years of experience designing scalable data pipelines that process over 10 TB of data. Improved data integration efficiency by 40% and reduced ETL processing time by 50% on Google Cloud Platform (GCP).

Overview

3 years of professional experience
4 certifications

Work History

Data Engineer

Modak Analytics
Hyderabad
04.2021 - Current
  • Designed and implemented ETL pipelines using Java, Python, PySpark, and MS SQL for diverse data sources, ensuring end-to-end monitoring and robust error handling.
  • Pioneered Hadoop archiving, achieving a 50% reduction in storage space within data pipelines.
  • Optimized data pipelines to maintain 99% compliance with consumer (LLM) requirements.
  • Containerized all data pipelines using Docker and deployed them on Kubernetes, enhancing pipeline scalability.
  • Built PySpark curation pipelines for a variety of source formats (structured tables, CSV, JSON, XML, PDF).
  • Improved data pipeline efficiency by implementing multithreading and Spark optimization, reducing runtime from one week to one day.
  • Implemented an Azure Data Factory pipeline that accepts user inputs and routes them to Databricks notebooks for curation, then loads the resulting metrics into Azure SQL Database, reducing manual intervention by 40%.
  • Enhanced data quality checks, resulting in a 90% increase in pipeline efficiency.
  • Decreased monitoring time by 50% through orchestration with Airflow.
  • Migrated data pipelines from on-premises to Google Cloud Platform, achieving a 50% reduction in runtime.
  • Developed a CDC mechanism to identify newly available files in GCS buckets and store metadata in BigQuery for use in data pipelines.
  • Performed data analysis using BigQuery to derive actionable insights and support data-driven decision-making.
  • Utilized Dataproc batches and GCP Batch services to execute data pipelines, achieving a 30% reduction in processing time.

Education

Bachelor of Technology - ECE

Mallareddy Engineering College
Hyderabad
04.2021

Skills

  • Languages: Java, Python, Shell Scripting, SQL, Data Structures and Algorithms
  • Relational Databases: PostgreSQL, MSSQL, BigQuery
  • CI/CD: Azure DevOps
  • Version Control: Git
  • Big Data Tools: Hadoop, Hive, Spark, Kafka
  • Cloud Platform: Microsoft Azure (ADF, ADLS, Azure SQL, Fundamentals), GCP data engineering services
  • Containerization and Orchestration: Docker, Kubernetes
  • Operating System: Linux, Windows
  • Pipeline Orchestration tools: Airflow, Google Workflows

Certifications

  • DP-900 - Microsoft Certified: Azure Data Fundamentals
  • Academy Accreditation - Databricks Lakehouse Fundamentals
  • Astronomer Certification for Apache Airflow Fundamentals
  • Astronomer Certification DAG Authoring for Apache Airflow

Personal Project

  • Developed a real-time telematics streaming solution using Kafka and PySpark that loads data points into AWS S3 and Delta Lake and calculates driver scoring metrics, providing insights that can help improve driver performance.

Accomplishments

Received the “GSK – Global Employee Recognition Award” from client GlaxoSmithKline for deliverables in the GCP cloud migration.
