Anuj Mahajan

Gurugram

Summary

Experienced Data Engineer at EY skilled in Python, Spark, and AWS technologies. Expertise in data migration, integration, and pipeline control. Proven track record of enhancing product functionality and streamlining CI/CD processes. Strong analytical skills and collaborative approach drive significant improvements in data quality and operational efficiency.

Overview

7 years of professional experience

Work History

Data Engineer

EY
Gurugram
04.2023 - Current
  • Client: Macquarie Bank.
  • Worked as part of the internal product team, Process Control Framework, which builds the ingestion, calculation, and orchestration framework using Python, PySpark, Hive, S3, and Apache Airflow.
  • Set up CI/CD for the product using Jira and Bamboo.
  • The product is a Python package that other product teams can use to set up their orchestration, ingestion, and calculation frameworks simply by configuring YAML files.
  • Created configurable Data Quality Frameworks, used by different teams for their data quality checks.
  • Created a Data Reconciliation Utility, used for regression analysis on a dataset after a new deployment to detect changes in the actual data following code changes.
  • Coordinated with different teams to understand their requirements and enhance the product.

Data Engineer

EY
Bengaluru
01.2022 - 12.2023
  • Client: Piramal Capital and Housing Finance Limited.
  • Migrated on-premise PostgreSQL, MongoDB, and Salesforce databases to Snowflake tables on AWS, and worked on building a data lake.
  • Created an end-to-end solution: a data pipeline streaming data from MongoDB and PostgreSQL to Snowflake and DynamoDB, exposing the relevant columns via an API using Lambda and API Gateway.
  • Used PySpark on EMR and Kinesis to create a streaming application.
  • Used Airflow to schedule batch jobs.
  • Created a Data Quality Framework using Python, Snowflake, and Airflow to check the quality of key data elements.

Data Engineer

EY
Gurugram
01.2023 - 04.2023
  • Client: American Express.
  • Migrated data from SAS to the Hadoop ecosystem.
  • Converted SAS code to PySpark code.
  • Created a framework to calculate loss charge volume dynamically for all modules, based on inputs supplied through a YAML file.

Data Engineer

Altimetrik
Chennai
08.2020 - 01.2022
  • Client: Ford Motors.
  • Worked as part of the GDIA (Global Data Insights and Analytics) team, which ingests data from different sources into the Hadoop ecosystem for various downstream teams.
  • Worked with data ingestion tools such as Attunity (Qlik).
  • Created a Data Reconciliation framework using PySpark to match the target data with the source at the end of the ETL pipeline after every run, covering multiple source types: RDBMS, files, and APIs.
  • Created PySpark/shell scripts to automate day-to-day activities and further enhance the process.

Software Engineer

Hexaware Technologies
Chennai
04.2018 - 08.2020
  • Client: Citibank.
  • Part of the data migration team, which created a new data repository for the Payment Ruled Database.
  • Ingested data from the MySQL source (existing database) into the Hadoop ecosystem using Sqoop.
  • Worked in the data transformation team, applying business logic to Hive tables using Spark SQL in Python, per the client's requirements.
  • Created a data pipeline to ingest real-time streaming data by integrating Kafka and Spark.

Education

BE - Electronics and Communication Engineering

Bansal Institute of Science and Technology
01.2017

12th

St. Paul’s School
01.2013

10th

St. Paul’s School
01.2011

Skills

  • Python
  • Scala
  • Shell Scripting
  • Hadoop
  • Spark
  • Hive
  • Sqoop
  • Apache Oozie
  • AWS DMS
  • AWS Kinesis
  • Apache Airflow
  • AWS Lambda
  • AWS Glue
  • AWS DynamoDB
  • Snowflake
  • AWS API Gateway
  • MongoDB
  • HBase
  • Jira, Bamboo
  • Data Migration
  • Data Integration
  • NoSQL Databases
  • ETL Development
  • Performance Tuning
  • Data Warehousing

Languages

  • English
  • Hindi

Personal Information

  • Date of Birth: 08/30/95
  • Gender: Male
  • Nationality: Indian
