
Sudhakar Gajji

Hyderabad

Summary

Results-driven Senior Data Engineer with 7+ years of experience in big data engineering,
specializing in Spark, Scala, SQL, Hadoop, Hive, PySpark, and Google Cloud Platform (GCP).
Expert in ETL pipeline optimization, data migration, and real-time streaming using Kafka and
GCP Dataflow. Proven track record in performance tuning (reduced processing time by 35%),
ETL automation (90% automation), and cloud cost optimization (20% cost reduction).
Strong expertise in data modeling, query optimization, modular data transformation using DBT, and workflow orchestration using Apache Airflow.

Overview

7 years of professional experience

Work History

Senior Data Engineer

Cybrowse Digital Solutions Pvt Ltd
02.2022 - Current
  • Designed and maintained scalable data pipelines using Spark, SQL, Hadoop, and Hive.
  • Migrated on-prem Hadoop pipelines to GCP, reducing cloud costs by 20%.
  • Automated ETL workflows with Apache Airflow and modular SQL transformations using DBT, improving maintainability and auditability.
  • Tuned Spark jobs using partitioning, bucketing, and caching, reducing processing time by 35%.
  • Built real-time streaming pipelines with Kafka, GCP Pub/Sub, and Dataflow.
  • Ensured data quality, governance, and security compliance across cloud platforms.
  • Collaborated with business stakeholders to deliver actionable insights.
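The partitioning and bucketing tuning mentioned above works by co-locating rows that share a join key, so matching rows can be joined bucket-by-bucket instead of through a full shuffle. A minimal pure-Python sketch of the idea (this is not the Spark API; the bucket count, hash function, and field names are illustrative):

```python
# Conceptual sketch of hash bucketing: rows with the same key land in the
# same bucket, so two tables bucketed identically can be joined bucket-by-
# bucket without a full shuffle. Bucket count and keys are illustrative.

NUM_BUCKETS = 4

def bucket_of(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Deterministic hash-based bucket assignment, analogous to what Spark
    does per key (a simple byte sum stands in for a real hash function)."""
    return sum(key.encode()) % num_buckets

def bucketize(rows, key_field, num_buckets=NUM_BUCKETS):
    """Scatter rows into buckets by the hash of their join key."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_of(row[key_field], num_buckets)].append(row)
    return buckets

orders = [{"cust": "a1", "amt": 10}, {"cust": "b2", "amt": 5}]
customers = [{"cust": "a1", "city": "Hyderabad"}, {"cust": "b2", "city": "Pune"}]

# Because both sides use the same bucketing, the join only needs to pair
# bucket i with bucket i -- never bucket i with bucket j.
o_b = bucketize(orders, "cust")
c_b = bucketize(customers, "cust")
joined = [
    (o, c)
    for i in range(NUM_BUCKETS)
    for o in o_b[i]
    for c in c_b[i]
    if o["cust"] == c["cust"]
]
```

In Spark itself, the equivalent is writing both tables with the same `bucketBy` column and bucket count so the planner can skip the exchange step.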

Data Engineer

M3 Solutions Private Limited
05.2018 - 02.2022
  • Implemented scalable batch and streaming ETL pipelines on GCP.
  • Developed PySpark and Spark Scala jobs that reduced processing time by 30%.
  • Migrated legacy DataStage workflows to Spark SQL and Snowflake.
  • Implemented DBT for building reusable, testable, and version-controlled SQL models.
  • Automated data pipelines using Apache Airflow and enhanced data observability.
  • Performed root cause analysis and ensured high data reliability.
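Airflow, used above for pipeline automation, runs tasks by topologically sorting the dependency DAG: a task starts only after all of its upstream tasks finish. A plain-Python sketch of that ordering logic (the task names are made up, and this is not the Airflow API):

```python
from collections import deque

# Hypothetical ETL task graph: each task lists its upstream dependencies.
# This mirrors how an Airflow DAG is declared (extract >> transform >> load),
# but uses only the standard library.
deps = {
    "extract": [],
    "validate": ["extract"],
    "transform": ["extract"],
    "load": ["validate", "transform"],
}

def topo_order(deps):
    """Kahn's algorithm: schedule a task only after all upstreams finish."""
    indegree = {task: len(ups) for task, ups in deps.items()}
    downstream = {task: [] for task in deps}
    for task, ups in deps.items():
        for up in ups:
            downstream[up].append(task)
    ready = deque(task for task, d in indegree.items() if d == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in downstream[task]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(deps):
        raise ValueError("cycle detected -- not a valid DAG")
    return order

order = topo_order(deps)
```

Airflow's scheduler applies the same rule per task instance, which is why a cycle in the dependency graph is rejected at DAG parse time.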

Education

Bachelor of Technology - Electrical, Electronics and Communications Engineering

Jawaharlal Nehru Technological University
Hyderabad
05-2015

Skills

  • Big Data Technologies: PySpark, Spark, Hive, HDFS
  • Data Integration & Storage: Sqoop, MapReduce, HBase, M7 Database
  • Databases & Cloud: MySQL, PostgreSQL, GCP (BigQuery, Dataflow, Composer, Cloud Storage), AWS (Glue, EMR, Lambda, S3, Step Functions), Snowflake
  • Programming & OS: Python, Scala, SQL, Windows, Linux
  • Data Modeling & Transformation: DBT (Data Build Tool) for version-controlled and modular SQL-based transformations
  • Orchestration & Monitoring: Apache Airflow, CloudWatch

Accomplishments

  • Reduced data processing time by 35% through advanced performance optimization.
  • Successfully migrated on-premises data pipelines to Google Cloud Platform, improving scalability and reducing costs by 20%.
  • Automated over 90% of ETL processes, significantly reducing manual effort and enhancing operational efficiency.
  • Enhanced data security by implementing strict IAM roles and policies.

CORE COMPETENCIES

✔ ETL Development, Data Pipeline Design, Data Integration
✔ Spark (Scala, PySpark), Hadoop, Hive, HDFS, MapReduce
✔ Kafka Streaming, Real-Time Processing, GCP Pub/Sub, Dataflow
✔ Cloud Platforms: AWS (Glue, EMR, Lambda, S3), GCP (BigQuery, Composer), Snowflake
✔ Modular Data Transformation using DBT (BigQuery & Snowflake)
✔ SQL Query Optimization, Data Modeling, Data Warehousing
✔ Workflow Orchestration: Apache Airflow
✔ Performance Tuning (Partitioning, Bucketing, Caching)
✔ Data Quality, Governance, IAM Roles, Monitoring

Projects

1. Data Platform Modernization & Migration
Technologies: Spark (Scala), GCP (BigQuery, Dataflow, Composer), SQL, Hive, Airflow, DBT, Kafka

  • Migrated on-prem Hadoop workloads to GCP and modernized the platform architecture.
  • Integrated DBT for transformation layers in BigQuery, promoting reusability and lineage tracking.
  • Reduced ETL job run time by 35%.
  • Built a real-time analytics pipeline using Kafka and GCP Dataflow.
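A real-time pipeline like the one above typically groups incoming events into tumbling windows before aggregating and writing results to a sink such as BigQuery. A stdlib-only sketch of that windowed count (the event schema, keys, and window width are illustrative, not the actual Kafka/Dataflow code):

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # illustrative tumbling-window width

def window_start(ts: int, width: int = WINDOW_SECONDS) -> int:
    """Align an event timestamp to the start of its tumbling window."""
    return ts - (ts % width)

def count_per_window(events):
    """Aggregate event counts per (window, key), the shape of result a
    streaming job emits downstream once each window closes."""
    counts = defaultdict(int)
    for ev in events:
        counts[(window_start(ev["ts"]), ev["key"])] += 1
    return dict(counts)

# Hypothetical click events: timestamps in seconds, keyed by page.
events = [
    {"ts": 5, "key": "home"},
    {"ts": 59, "key": "home"},
    {"ts": 61, "key": "home"},   # falls into the next 60-second window
    {"ts": 62, "key": "cart"},
]
counts = count_per_window(events)
```

In Dataflow/Beam terms this corresponds to a fixed window plus a per-key combine; the sketch leaves out late-data handling and triggers.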

2. Enterprise Customer Analytics Platform
Technologies: Spark (Scala), GCP (BigQuery, Dataflow), Airflow, Hive, Kafka, DBT

  • Designed a customer analytics platform with modular SQL models using DBT.
  • Achieved 25% faster BigQuery execution through SQL optimization and table partitioning.
  • Delivered datasets for ML and segmentation.

3. Cloud Data Matrix
Technologies: AWS (Glue, EMR, Lambda, S3), PySpark, SQL, Airflow, DBT

  • Migrated legacy ETL to AWS Glue and Snowflake.
  • Used DBT for Snowflake model development and dependency management.
  • Integrated DBT into CI/CD pipelines for deployment and testing.
  • Achieved 20% cost savings by tuning EMR jobs.
