
Senior Data Engineer with 6+ years of experience building and optimizing scalable data pipelines with PySpark and AWS (EMR, S3, Athena, Glue). Proven expertise in big data, ETL development, and data warehousing, with hands-on experience across the Hadoop ecosystem. Skilled in designing end-to-end data solutions, improving data processing efficiency, and handling large-scale datasets across multiple domains. Strong problem solver focused on delivering reliable, high-quality data solutions.
Programming & Query Languages: Python, SQL, PySpark, Shell Scripting
Big Data Technologies: Apache Spark (Core, SQL), Hadoop (HDFS, MapReduce), Hive, HBase
Cloud & AWS Services: Amazon EMR, AWS Glue (Jobs, Crawlers), Amazon Athena, Amazon S3, AWS Step Functions, Amazon EC2, Amazon RDS, Amazon DynamoDB
Data Engineering & Processing: ETL Pipeline Development, Data Pipeline Orchestration, Batch Processing, Data Transformation, Data Ingestion, Data Validation
Workflow Orchestration & Scheduling: Apache Airflow
Databases & Data Storage: Oracle, PostgreSQL, MySQL, DynamoDB, HBase
Performance Optimization: Spark Optimization (Partitioning, Caching, Broadcast Joins), Query Optimization
File Formats & Compression: Parquet, ORC, CSV, JSON; Snappy compression