
Data Engineer with 9.5 years of experience designing and developing robust, scalable, and efficient data pipelines and solutions. Proficient in large-scale data processing, integration, and transformation using Apache Spark, PySpark, Python, Databricks, and SQL Server. Skilled in cloud data platforms such as AWS (S3, Glue, Athena) and GCP. Strong background in writing optimized SQL queries and managing data flows for analytical and business use cases. Passionate about building high-performance data systems that support strategic business decisions.
Data Engineering & Big Data: Apache Spark, PySpark, Hadoop, HDFS and Databricks
Programming: Python
Databases: MS SQL Server, Hive
Cloud Platforms: AWS (S3, Glue, Athena), GCP (BigQuery, Dataproc, Dataform)
Workflow Orchestration: AWS Glue Jobs, Oozie, cron and Airflow
Tools & Environments: Databricks, Hortonworks, Jupyter Notebook and SQL Server 2012/2008
Operating Systems: Windows and Linux
Version Control & CI/CD: Git and Jenkins