Earam Irfan

Bengaluru

Summary

Big Data Engineer with 6 years of experience in building and optimizing large-scale data platforms using PySpark, Hadoop, Spark Streaming, and Kubernetes. Proven track record of improving pipeline efficiency by 50% and leading migrations from on-prem to cloud platforms, including AWS and Microsoft Fabric. Hands-on with containerization, CI/CD, and platform observability.

Overview

6 years of professional experience
5 Certifications

Work History

CLOUD DATA PLATFORM ENGINEER (CONSULTANT)

Allstate
Bangalore
12.2024 - Current
  • Leading Spark job migration to Microsoft Fabric, evaluating and implementing OneLake, Lakehouse, Fabric Notebooks, and autoscale billing for cost-efficient execution.
  • Designed reusable Spark job templates and migration runbooks, accelerating onboarding and reducing integration time by 50%.
  • Built smoke tests to validate Spark jobs and pipelines in Fabric environments; handled incident triaging, debugging, and resolution across DPTs.
  • Integrated HashiCorp Vault within Fabric pipelines for secure secret handling and validated secrets management workflows.
  • Enabled and tested OneLake shortcuts and data flattening mechanisms to support cross-platform analytics in Fabric Lakehouses.

BIG DATA DEVELOPER

Allstate
Bangalore
02.2022 - 11.2024
  • Implemented ETL pipelines using PySpark to extract, transform, and load data from various sources into data warehouses, ensuring data consistency and accuracy.
  • Migrated user applications from legacy Hadoop infrastructure to Kubernetes, improving performance, scalability, and operational efficiency by 50%.
  • Developed and maintained onboarding processes by creating documentation, guidelines, and dashboards to support seamless user adoption of compute and storage platforms, leveraging ServiceNow incident data.
  • Managed and optimized platforms for over 3,000 big data users across diverse compute and storage environments, including on-prem S3, AWS, Dremio, and Hadoop.
  • Implemented CI/CD pipelines for building and pushing Spark base images to Artifactory in a Kubernetes environment, reducing manual intervention and speeding up deployments.
  • Worked on AWS EKS and S3 to conduct performance testing of Spark on Kubernetes versus AWS, ensuring both performance optimization and cost-efficiency.
  • Designed and executed smoke tests for CaaS, S3, Dremio, and Hadoop platforms, ensuring platform reliability and optimal configuration through Python-based automation.

BIG DATA ENGINEER

Accenture
Bangalore
08.2019 - 02.2022
  • Processed large volumes of semi-structured and structured data from sources like Business Insurance, Bond, Policy, and Salesforce, and loaded them into Hive tables using a Medallion architecture.
  • Wrote SQL queries to extract and transform data from CDC (Change Data Capture) tables and provided refined datasets to downstream teams.
  • Used Sqoop to load data from HDFS to SQL Server for reporting and downstream consumption.
  • Gained hands-on experience in Big Data technologies such as Hortonworks Hadoop, Hive, Sqoop, Spark, and HDFS.
  • Used data warehousing tools including Teradata, Oracle SQL, and MySQL for efficient data management.
  • Proficient in programming and scripting with Python, Apache Spark, Linux, and Unix Shell scripting.
  • Conducted end-to-end testing and developed ingestion flow pipelines.
  • Automated system jobs for project upgrades and decommissioning activities.
  • Participated in the migration of all Hadoop-based jobs to AWS cloud services.

Education

Executive PG - Business Analytics

LIBA
Chennai
11.2024

B.E - EEE

Dayananda Sagar College of Engineering
Bangalore
08.2019

Skills

  • Programming Languages: Python, SQL, PySpark, Shell Scripting
  • Platforms & Tools: PyCharm, Visual Studio Code, Jupyter Notebook, Azure Databricks, Git/GitHub, WinSCP, Bash, CI/CD
  • Monitoring: Datadog, Splunk
  • Big Data & ETL: Apache Spark, Hadoop, Hive, Dremio, SQL, Spark Streaming, Sqoop
  • Cloud & DevOps: AWS (S3, Athena, EKS), Microsoft Fabric, Jenkins, Docker, Kubernetes
  • Other: Data analysis, ETL development

Certification

- Academy Accreditation – Generative AI Fundamentals – Databricks

- Google Cloud Skills Boost – Perform Foundational Data, ML, and AI Tasks in Google Cloud

- Google Cloud Skills Boost – Serverless Data Processing with Dataflow

- Use Apache Spark in Microsoft Fabric

- Use real-time intelligence in Microsoft Fabric
