Senior Data Engineer with 9+ years of experience designing and building scalable batch and streaming data pipelines with Apache Spark (PySpark), Ab Initio, and Kafka on cloud platforms such as Azure and AWS. Proven track record of automating data workflows, improving system reliability, and delivering multi-million-dollar business impact. Strong expertise in distributed data systems, CI/CD, and cross-functional collaboration.
Languages: Python, SQL, Bash, Java
Big Data: PySpark, Apache Spark, MapReduce, Hive, HDFS, Kafka
CI/CD and containers: Docker, Kubernetes, Concourse CI, Jenkins
Databases and storage: Cosmos DB, MySQL, Azure Data Lake Storage (ADLS), HDFS
Cloud platforms: AWS (S3, EMR), Azure
ETL: Ab Initio
Databricks Certified Data Engineer Associate (in progress)