Earam Irfan

Bengaluru

Summary

Big Data Engineer with 6 years of experience in building and optimizing large-scale data platforms using PySpark, Hadoop, Spark Streaming, and Kubernetes. Proven track record of improving pipeline efficiency by 50% and leading migrations from on-prem to cloud platforms, including AWS and Microsoft Fabric. Hands-on with containerization, CI/CD, and platform observability.

Overview

6 years of professional experience
5 Certifications

Work History

CLOUD DATA PLATFORM ENGINEER (CONSULTANT)

Allstate
Bangalore
12.2024 - Current
  • Leading Spark job migration to Microsoft Fabric, evaluating and implementing OneLake, Lakehouse, Fabric Notebooks, and autoscale billing for cost-efficient execution.
  • Designed reusable Spark job templates and migration runbooks, accelerating onboarding and reducing integration time by 50%.
  • Built smoke tests to validate Spark jobs and pipelines in Fabric environments; handled incident triaging, debugging, and resolution across DPTs.
  • Integrated HashiCorp Vault within Fabric pipelines for secure secret handling and validated secrets management workflows.
  • Enabled and tested OneLake shortcuts and data flattening mechanisms to support cross-platform analytics in Fabric Lakehouses.

BIG DATA DEVELOPER

Allstate
Bangalore
02.2022 - 11.2024
  • Implemented ETL pipelines using PySpark to extract, transform, and load data from various sources into data warehouses, ensuring data consistency and accuracy.
  • Migrated user applications from legacy Hadoop infrastructure to Kubernetes, improving performance, scalability, and operational efficiency by 50%.
  • Developed and maintained onboarding processes by creating documentation, guidelines, and dashboards to support seamless user adoption of compute and storage platforms, leveraging ServiceNow incident data.
  • Managed and optimized platforms for over 3,000 big data users across diverse compute and storage environments, including on-prem S3, AWS, Dremio, and Hadoop.
  • Implemented CI/CD pipelines for building and pushing Spark base images to Artifactory in a Kubernetes environment, reducing manual intervention and speeding up deployments.
  • Worked on AWS EKS and S3 to conduct performance testing of Spark on Kubernetes versus AWS, ensuring both performance optimization and cost-efficiency.
  • Designed and executed smoke tests for CaaS, S3, Dremio, and Hadoop platforms, ensuring platform reliability and optimal configuration through Python-based automation.

BIG DATA ENGINEER

Accenture
Bangalore
08.2019 - 02.2022
  • Processed large volumes of semi-structured and structured data from sources like Business Insurance, Bond, Policy, and Salesforce, and loaded them into Hive tables using a Medallion architecture.
  • Wrote SQL queries to extract and transform data from CDC (Change Data Capture) tables and provided refined datasets to downstream teams.
  • Used Sqoop to load data from HDFS to SQL Server for reporting and downstream consumption.
  • Gained hands-on experience in Big Data technologies such as Hortonworks Hadoop, Hive, Sqoop, Spark, and HDFS.
  • Used data warehousing tools including Teradata, Oracle SQL, and MySQL for efficient data management.
  • Proficient in programming and scripting with Python, Apache Spark, Linux, and Unix Shell scripting.
  • Conducted end-to-end testing and developed ingestion flow pipelines.
  • Automated system jobs for project upgrades and decommissioning activities.
  • Participated in the migration of all Hadoop-based jobs to AWS cloud services.

Education

Executive PG - Business Analytics

LIBA
Chennai
11.2024

B.E - EEE

Dayananda Sagar College of Engineering
Bangalore
08.2019

Skills

  • Programming Languages: Python, SQL, PySpark, Shell Scripting
  • Platforms & Tools: PyCharm, Visual Studio Code, Jupyter Notebook, Azure Databricks, Git/GitHub, WinSCP, Bash, CI/CD
  • Monitoring: Datadog, Splunk
  • Big Data & ETL: Apache Spark, Hadoop, Hive, Dremio, SQL, Spark Streaming, Sqoop
  • Cloud & DevOps: AWS (S3, Athena, EKS), Microsoft Fabric, Jenkins, Docker, Kubernetes
  • Other: Data analysis, ETL development

Certification

- Academy Accreditation – Generative AI Fundamentals – Databricks

- Google Cloud Skills Boost – Perform Foundational Data, ML, and AI Tasks in Google Cloud

- Google Cloud Skills Boost – Serverless Data Processing with Dataflow

- Use Apache Spark in Microsoft Fabric

- Use real-time intelligence in Microsoft Fabric
