Results-driven Senior Data Engineer with 7+ years of experience in big data engineering,
specializing in Spark, Scala, SQL, Hadoop, Hive, PySpark, and Google Cloud Platform (GCP).
Expert in ETL pipeline optimization, data migration, and real-time streaming using Kafka and
GCP Dataflow. Proven track record in performance tuning (35% reduction in processing time),
ETL automation (90% automated), and cloud cost optimization (20% cost reduction).
Strong expertise in data modeling, query optimization, modular data transformation using DBT,
and workflow orchestration using Apache Airflow.
✔ ETL Development, Data Pipeline Design, Data Integration
✔ Spark (Scala, PySpark), Hadoop, Hive, HDFS, MapReduce
✔ Kafka Streaming, Real-Time Processing, GCP Pub/Sub, Dataflow
✔ Cloud Platforms: AWS (Glue, EMR, Lambda, S3), GCP (BigQuery, Composer), Snowflake
✔ Modular Data Transformation using DBT (BigQuery & Snowflake)
✔ SQL Query Optimization, Data Modeling, Data Warehousing
✔ Workflow Orchestration: Apache Airflow
✔ Performance Tuning (Partitioning, Bucketing, Caching)
✔ Data Quality, Governance, IAM Roles, Monitoring
1. Data Platform Modernization & Migration
Technologies: Spark (Scala), GCP (BigQuery, Dataflow, Composer), SQL, Hive, Airflow, DBT, Kafka
2. Enterprise Customer Analytics Platform
Technologies: Spark (Scala), GCP (BigQuery, Dataflow), Airflow, Hive, Kafka, DBT
3. Cloud Data Matrix
Technologies: AWS (Glue, EMR, Lambda, S3), PySpark, SQL, Airflow, DBT