Anuj Mahajan

Gurugram

Summary

Experienced Data Engineer at EY skilled in Python, Spark, and AWS technologies. Expertise in data migration, integration, and pipeline control. Proven track record of enhancing product functionality and streamlining CI/CD processes. Strong analytical skills and collaborative approach drive significant improvements in data quality and operational efficiency.

Overview

7 years of professional experience

Work History

Data Engineer

EY
Gurugram
04.2023 - Current
  • Client: Macquarie Bank.
  • Worked as part of the internal product team, Process Control Framework, which builds the ingestion, calculation, and orchestration framework using Python, PySpark, Hive, S3, and Apache Airflow.
  • Set up CI/CD for the product using Jira and Bamboo.
  • The product is a Python package that other product teams can use to set up their orchestration, ingestion, and calculation frameworks simply by configuring YAML files.
  • Created configurable Data Quality Frameworks, used by different teams for their data quality checks.
  • Created a Data Reconciliation Utility, used for regression analysis on a dataset after a new deployment to detect changes in the actual data following code changes.
  • Coordinated with different teams to understand their requirements and enhance the product.

Data Engineer

EY
Bengaluru
01.2022 - 12.2023
  • Client: Piramal Capital and Housing Finance Limited.
  • Migrated on-premise PostgreSQL, MongoDB, and Salesforce databases to Snowflake tables on AWS, and worked on building a data lake.
  • Created an end-to-end solution: a data pipeline streaming data from MongoDB and PostgreSQL to Snowflake and DynamoDB, exposing the relevant columns via an API using Lambda and API Gateway.
  • Used PySpark on EMR and Kinesis to create a streaming application.
  • Used Airflow to schedule batch jobs.
  • Created a Data Quality Framework using Python, Snowflake, and Airflow to check the quality of key data elements.

Data Engineer

EY
Gurugram
01.2023 - 04.2023
  • Client: American Express.
  • Migrated data from SAS to the Hadoop ecosystem.
  • Converted SAS code to PySpark code.
  • Created a framework to calculate loss charge volume dynamically for all modules, based on inputs supplied through a YAML file.

Data Engineer

Altimetrik
Chennai
08.2020 - 01.2022
  • Client: Ford Motors.
  • Worked as part of the GDIA (Global Data Insights and Analytics) team, which ingests data from different sources into the Hadoop ecosystem for various downstream teams.
  • Worked with data ingestion tools such as Attunity (Qlik).
  • Created a Data Reconciliation framework using PySpark to match the target data with the source at the end of the ETL pipeline after every run, covering multiple source types: RDBMS, files, and APIs.
  • Created PySpark/shell scripts to automate day-to-day activities and further enhance the process.

Software Engineer

Hexaware Technologies
Chennai
04.2018 - 08.2020
  • Client: Citibank.
  • Part of the data migration team, which created a new data repository for the Payment Ruled Database.
  • Ingested data from the MySQL source (existing database) into the Hadoop ecosystem using Sqoop.
  • Worked in the data transformation team, applying business logic to Hive tables using Spark SQL in Python, per the client's requirements.
  • Created a data pipeline to ingest real-time streaming data by integrating Kafka and Spark.

Education

BE - Electronics and Communication Engineering

Bansal Institute of Science and Technology
01.2017

12th

St. Paul’s School
01.2013

10th

St. Paul’s School
01.2011

Skills

  • Python
  • Scala
  • Shell Scripting
  • Hadoop
  • Spark
  • Hive
  • Sqoop
  • Apache Oozie
  • AWS DMS
  • AWS Kinesis
  • Apache Airflow
  • AWS Lambda
  • AWS Glue
  • AWS DynamoDB
  • Snowflake
  • AWS API Gateway
  • MongoDB
  • HBase
  • Jira, Bamboo
  • Data Migration
  • Data Integration
  • NoSQL Databases
  • ETL Development
  • Performance Tuning
  • Data Warehousing

Languages

  • English
  • Hindi

Personal Information

  • Date of Birth: 08/30/95
  • Gender: Male
  • Nationality: Indian
