
Results-driven Big Data Engineer with 3.8 years of hands-on experience using Hadoop, Spark, Hive, and related ecosystem tools to process and manage large datasets efficiently. Strong expertise in Spark RDD transformations, preprocessing, caching, and performance optimization, with a solid understanding of how Spark integrates with Hadoop and Hive. Proficient in Hive schema evolution, SQL-style querying, and working with serialized data formats such as Avro, Parquet, and ORC. Extensive experience using Sqoop to move data between Hadoop and relational databases. Skilled in developing large-scale PySpark pipelines on AWS EMR that integrate Snowflake, web APIs, and S3. Experienced in orchestrating workflows with Apache Airflow to improve operational efficiency.