Data Engineer with 3 years of experience designing and optimizing data pipelines using PySpark, Hive, Hadoop, MySQL, and cloud platforms such as GCP and Azure. Skilled in handling diverse data sources, including CSV, JSON, relational databases, distributed file systems, and cloud storage. Experienced in building efficient ETL workflows, data validation processes, and scalable data architectures.
Data Processing & Tools: PySpark, Hive, Hadoop, Spark SQL, Airflow, Kafka