Karan Bajaj

Agra

Summary

Results-driven Data Engineer with experience at Enquero, adept at designing robust data pipelines and leveraging AWS Glue for ETL processes. Proficient in Python and PySpark, with a strong record of troubleshooting complex pipeline issues and collaborating effectively with clients. Committed to delivering valuable insights through data analysis and pipeline optimization.

Overview

  • 6 years of professional experience
  • 1 certification

Work History

Data Engineer

Enquero
12.2022 - Current
  • Designed and implemented robust data pipelines
  • Conducted data analysis to extract valuable insights
  • Troubleshot and resolved pipeline failures
  • Collaborated with clients on day-to-day activities and contributed to deployment activities
  • Handled ad-hoc requests to load and validate retail e-commerce data (a sketch of this kind of load follows this list)
  • Supported and monitored pipelines, providing root-cause analysis (RCA) and solutions for medium-to-complex problems
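
A minimal sketch of the kind of ad-hoc load-and-validate job described above, assuming a Spark environment; the bucket path and table name are hypothetical, not from an actual Enquero codebase:

    # Illustrative ad-hoc load with a basic row-count gate; all names are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("adhoc_retail_load").getOrCreate()

    df = spark.read.parquet("s3://example-bucket/retail/orders/")  # hypothetical source
    assert df.count() > 0, "source extract is empty"               # minimal validation
    df.write.mode("append").saveAsTable("retail.orders_staging")   # hypothetical target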

System Engineer

Tata Consultancy Services
06.2019 - 12.2022
  • Worked in the DWH team writing PySpark code on the Databricks platform.
  • Debugged background jobs and provided improved solutions for medium-to-complex problems.

Education

Bachelor's Degree - Computer Science and Technology

Jaypee University of Engineering And Technology
Guna, M.P.
05.2019

Skills

  • Python
  • PySpark
  • SQL and databases
  • Databricks
  • AWS
  • AWS Glue
  • ETL
  • Airflow
  • Data modeling

Accomplishments

  • Team Reward - INTEGRITY (performance with exceptional integrity for outstanding contribution)
  • Peer-to-Peer award

Projects

Big Data ETL Pipelines on AWS (Batch), Databricks, AWS Batch, MWAA, Athena, Redshift

  • Created batch ETL pipelines (SCD Type 1) to ingest and transform data and implement business logic for the retail domain.
  • Developed data processing code using Databricks, PySpark, and Spark SQL to load data into Delta tables; orchestrated the pipeline using Airflow.
  • Implemented data-quality (DQ) checks between source and transformed data and loaded it into Delta tables, which were then pushed to Redshift using the COPY operation (a sketch of this step follows this list).
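
A minimal sketch of the Delta load and row-count DQ check, assuming Databricks with Delta Lake available; the paths and table names are hypothetical:

    # Load transformed data into a Delta table and gate the downstream
    # Redshift COPY on a source-vs-target row-count check. Names are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    src = spark.read.parquet("s3://example-bucket/retail/raw/")  # hypothetical source
    tgt = src.withColumn("load_ts", F.current_timestamp())       # audit column
    tgt.write.format("delta").mode("overwrite").saveAsTable("retail.sales_delta")

    # DQ check: counts must match before data is pushed to Redshift via COPY.
    assert src.count() == tgt.count(), "row-count mismatch between source and target"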

Source Data Migration Project, AWS S3, Databricks, PySpark, Airflow

  • Analyzed the existing source tables, including their older data and format.
  • Identified the need for migration due to changes in the AWS S3 location.
  • Compared and performed validations on old vs. new data (a validation sketch follows this list).
  • Created new tables to accommodate the updated data.
  • Enhanced the PySpark code to read and process data from the new location in Parquet format.
  • Modified the Airflow DAG (Directed Acyclic Graph) to process the new data efficiently.
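
A hedged sketch of the old-vs-new validation, assuming the legacy location held CSV and the new location holds Parquet; both paths are hypothetical:

    # Compare the legacy extract with the migrated Parquet data. Paths are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    old_df = spark.read.option("header", True).csv("s3://example-bucket/source/v1/")
    new_df = spark.read.parquet("s3://example-bucket/source/v2/")

    # Cast the typed Parquet columns to strings so the schemas line up with the
    # all-string CSV read, then diff row by row.
    new_as_str = new_df.select([F.col(c).cast("string").alias(c) for c in old_df.columns])
    missing = old_df.exceptAll(new_as_str)
    print("rows present in the old location but not the new one:", missing.count())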

Pipeline Development for MongoDB to Redshift Integration, PySpark, Airflow, AWS, Databricks, MongoDB

  • Extracted data from a specific MongoDB collection.
  • Built ingress and load pipelines to ingest raw data, using PySpark and SQL to transform it.
  • Implemented incremental updates by creating Delta tables and loaded cleansed data into Redshift tables.
  • Designed and implemented Airflow Directed Acyclic Graphs (DAGs) to automate the ETL process (a DAG sketch follows this list).
  • Scheduled data ingestion, transformation, and loading tasks.
  • Performed end-to-end testing to verify proper functioning of the pipeline.
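
A hedged sketch of the orchestration, assuming Airflow 2.4+ (as on MWAA); the DAG id and task callables are hypothetical placeholders, not the project's actual code:

    # Airflow DAG chaining ingest -> transform -> load; all names are hypothetical.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def ingest_from_mongo(): ...     # pull the MongoDB collection to raw storage
    def transform_with_spark(): ...  # PySpark/SQL transforms into Delta tables
    def load_to_redshift(): ...      # push cleansed data to Redshift tables

    with DAG("mongo_to_redshift", start_date=datetime(2023, 1, 1),
             schedule="@daily", catchup=False) as dag:
        ingest = PythonOperator(task_id="ingest", python_callable=ingest_from_mongo)
        transform = PythonOperator(task_id="transform", python_callable=transform_with_spark)
        load = PythonOperator(task_id="load", python_callable=load_to_redshift)
        ingest >> transform >> load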

Certification

  • Microsoft AZ-900 (Azure Fundamentals) certification
  • Databricks Data Engineer Associate (in progress)
