Amaan Khan

Noida

Summary

Results-driven Data Engineer with 3.5+ years of experience in ETL development, optimization, and cloud-based data pipelines across AWS and Azure. Skilled in Python, SQL, and data orchestration, delivering efficient, automated, and scalable data solutions. Successfully optimized pipelines, automated ingestion workflows, and improved data accuracy across multiple enterprise applications. Passionate about building high-performance data infrastructure that enables real-time insights and business decisions.

Overview

1 Certification
4 years of professional experience

Work History

Data Engineer

Capgemini
10.2022 - Current
  • Engineered and optimized ETL pipelines across AWS (EC2, S3, Redshift), ensuring high-quality, standardized data movement between warehousing layers.
  • Reduced GA4 pipeline processing time from 15 hours to 30 minutes and cut operational costs from $14,000 to $3,000 across 3 enterprise applications by re-engineering data ingestion logic, optimizing SQL joins, and implementing parallel data load strategies.
  • Developed Python-based automation scripts to normalize Reltio data, including column reordering, special character handling, and schema alignment for consistent downstream processing.
  • Wrote Python scripts to automate repetitive tasks, including one that unzips files from an S3 bucket and stores them as CSV files on an EC2 instance.
  • Managed workflow scheduling and orchestration via Apache Airflow and Control-M, ensuring reliable and timely execution of recurring data processes.
  • Built and automated data validation checks for campaign tracking and analytics consistency, reducing manual QA effort by 70%.
  • Derived valuable insights from data analysis, contributing to informed decision-making across enterprise applications.
  • Client: Johnson & Johnson

Education

Master of Computer Applications - Computer Science

R.V. College of Engineering
08.2022

Bachelor of Computer Applications - Computer Science

I.P P.G College
07.2019

Skills

  • Python
  • SQL
  • Data Engineering
  • ETL Pipelines
  • Cloud Platforms (AWS, Azure)
  • Amazon EC2
  • S3
  • Azure Blob Storage
  • Apache Airflow
  • Azure Databricks
  • Apache Spark
  • PySpark
  • IDEE
  • AIIF
  • Control-M
  • Amazon Redshift
  • PostgreSQL
  • NoSQL (MongoDB)
  • macOS
  • Windows
  • Linux/Unix

Certification

  • Databricks Certified Data Engineer Associate
  • Google Certified Professional Data Engineer
  • Associate Cloud Engineer Certification
  • Microsoft Certified Azure Fundamentals
  • Apache Airflow Fundamentals
  • Agile Project Management
  • Industry Certification Life Sciences - MedTech

Languages

Python
SQL

Work Preference

Work Type

Full Time
