2 years of experience building big data ecosystems.
Hands-on expertise in Spark-based ETL workflows, including data ingestion, transformation, and aggregation for large-scale datasets.
Maintained and monitored Spark clusters on AWS EMR, ensuring high availability and fault tolerance.
Experienced in optimizing Spark SQL performance by tuning configuration settings such as memory allocation, caching, and serialization.
Integrated AWS S3 with PySpark jobs to handle large datasets in a distributed environment.
Managed ETL processes with PySpark on AWS EMR, using AWS S3 for storage.
Expertise in developing and deploying serverless applications using Google Cloud Functions, enabling cost-effective and scalable solutions.
Familiarity with Google Cloud Storage buckets, object lifecycle policies, and access control mechanisms to ensure data availability and compliance.
Created DAG templates to standardize job orchestration across multiple Spark use cases.
Expertise in testing ETL workflows and job scheduling mechanisms.
Experienced in testing data integration and synchronization between systems via ETL processes.
Skilled in writing efficient SQL queries for data extraction, cleansing, and reporting across relational and distributed databases.
Proficient in Python scripting for data manipulation, automation, and integration with big data frameworks and APIs.
Experience deploying data solutions on cloud infrastructure including AWS S3, EC2, Lambda, and Azure Data Lake, ensuring high availability and performance.
Knowledge of workflow orchestration tools such as Apache Airflow and version control systems (Git) for collaborative development.