Results-driven Data Engineer with 6+ years of experience building and optimizing large-scale data processing solutions. Skilled in Spark (Scala), SQL, Hadoop, Hive, and Google Cloud Platform (GCP), I specialize in designing high-performance data pipelines for ingestion, transformation, and analysis. My experience includes developing and tuning ETL workflows in Spark (Scala) and SQL for massive datasets, building scalable and reliable data architectures on GCP, and orchestrating and automating workflows with Apache Airflow. With a strong focus on performance tuning and resource optimization, I increase data processing speed while maintaining data integrity, security, and compliance with industry best practices. Working with cross-functional teams, I have improved data pipeline efficiency, reduced processing time, and automated workflows, supporting better decision-making and operational excellence. I am passionate about delivering scalable, cost-effective, high-performance data solutions that drive innovation and business success.
Technologies: GCP (BigQuery, Dataflow, Cloud Storage, Cloud Pub/Sub), PySpark, Python, SQL
Technologies: PySpark, Python, SQL, Hive, Spark (Scala), GCP
Technologies: Apache Spark, Spark SQL, Scala, Hive, Shell Scripting, SQL