
Anurag Gautam

Data Engineer
Moradabad

Summary

  • Results-driven Data Engineer with around 5 years of experience in data engineering, data warehousing, ETL platforms, data migration, and distributed data processing, leveraging cloud services.
  • Designed and implemented robust data pipelines for clickstream and third-party data, enabling real-time insights and informed decision-making.
  • Identified trends in data sets, accurately read data models and code, and developed data rules from that analysis.
  • Proficient in Python, SQL, Databricks, PySpark, AWS DMS, data warehousing, optimization, BigQuery, and data governance and access controls.

Overview

5
years of professional experience
2
Certifications

Work History

Data Engineer

Incred Finance
07.2023 - Current

Project: Incred Data Platform

  • Engineered a resilient data pipeline using AWS DMS, AWS SQS, Python, and AWS Lambda, handling around 1 million records daily. Seamlessly ingested user transactions and third-party data into Delta Lake using Databricks.
  • Built a near-real-time data pipeline using Databricks Auto Loader, improving data refresh rates from 1-hour batches to ~2-minute intervals and significantly increasing data availability.
  • Automated an hourly data pipeline sourcing data from the third-party LeadSquared CRM, ensuring data accuracy with stringent data quality checks and metadata management.
  • Migrated over 100 TB of historical data, incremental jobs, views, and job clusters from Hive Metastore to Databricks Unity Catalog, enhancing data lineage and implementing improved access controls.
  • Developed due diligence reports for the finance team during audits, improving audit accuracy by 30%. Designed a dedicated data mart for the finance team, enabling stakeholders and investors to make more informed decisions.
  • Implemented automated backfills and outage-recovery scripts to maintain data platform stability.
  • Contributed to cost optimization initiatives for efficient data platform operations.

Data Engineer

Affine
04.2021 - 07.2023

Project: Client - Zee 5

  • Built a PySpark pipeline to remove PII data from ~17 TB of historical datasets stored in AWS S3. Processed data at scale using Apache Spark and automated execution using Apache Airflow.
  • Built a PySpark ETL pipeline to ingest and process incremental OTT subscription data into staging and master tables.
  • Managed metadata with AWS Glue Data Catalog and enabled analytics using Hive and Amazon Athena.
  • Automated daily batch workflows using Apache Airflow, with monitoring and failure recovery.

Project: Client - Arrow Electronics

  • Supported Teradata to Azure Synapse migration with a focus on data extraction and validation. Ensured data accuracy through schema, count, and data quality checks.

Education

Master of Technology - Information Technology

Indian Institute of Information Technology, Allahabad
Prayagraj, India
04.2001 -

Bachelor of Technology - Information Technology

Rajkiya Engineering College
Banda, India
04.2001 -

Intermediate -

VKS Public School
Moradabad, India
04.2001 -

Matriculation -

VKS Public School
Moradabad, India
04.2001 -

Skills

ETL Development

Databricks

AWS (DMS, Lambda, S3)

Spark/PySpark

Python

Databricks - Unity Catalog

Data Modeling

Data Warehousing

Big Data

Data Migration

Performance Tuning/Optimizations

Certification

Databricks Gen AI Engineer Associate

Timeline

Databricks Gen AI Engineer Associate

05-2025

Data Engineer

Incred Finance
07.2023 - Current

Azure - DP-900

01-2022

Data Engineer

Affine
04.2021 - 07.2023

Master of Technology - Information Technology

Indian Institute of Information Technology, Allahabad
04.2001 -

Bachelor of Technology - Information Technology

Rajkiya Engineering College
04.2001 -

Intermediate -

VKS Public School
04.2001 -

Matriculation -

VKS Public School
04.2001 -