ALANKIT BAWEJA

Noida

Summary

  • 5+ years of relevant experience in the IT industry working with Big Data.
  • Good understanding of ETL concepts and processes.
  • Adaptable to cloud strategies based on AWS and Azure.
  • Experience working with AWS services (Data Pipeline, Step Functions, S3, Lambda, CloudWatch, EMR, Athena, Redshift).
  • Experience working with Azure services (Azure Data Factory, Azure Databricks, ADLS, Blob Storage, Azure SQL Database).
  • Experience processing structured and semi-structured data using Python and PySpark.
  • Experience writing and optimizing complex SQL queries.
  • Strong understanding of and hands-on experience with data warehousing concepts.
  • Experience working with Agile and Scrum methodologies.
  • Good experience analyzing business requirements and prioritizing issues accordingly; experienced in gathering clear requirements from the customer.

Overview

6 years of professional experience

Work History

Senior Data Science Engineer

Dunnhumby
04.2022 - Current


Project: Unified Targeting & Measurement Framework (UTMF)


Project Description

UTMF is a customer engagement app designed to streamline the Customer Engagement (CE) process while managing the underlying ETL operations. It extracts data from various sources, applies business transformations based on inputs from the UTMF app, and delivers actionable insights via Power BI to enhance customer engagement strategies.

We leverage both the Azure and GCP ecosystems (a brief sketch of the GCP transformation step follows this list):

  • GCP: Utilizing GCS for data storage and Dataproc for scalable data transformations.
  • Azure: Using ADF for data orchestration and Databricks for advanced analytics and transformations.
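
The GCP leg of this pipeline follows a read-transform-write pattern. The sketch below is a minimal illustration only, assuming hypothetical bucket names, column names, and app-supplied parameters; it is not the production framework.

  # Minimal PySpark sketch of a Dataproc-style job: read raw data from GCS,
  # apply a parameter-driven transformation, and write the result back to GCS
  # for downstream reporting. Buckets, columns, and parameters are hypothetical.
  from pyspark.sql import SparkSession, functions as F

  spark = SparkSession.builder.appName("utmf-transform-sketch").getOrCreate()

  # Parameters of this kind would come from the UTMF app in practice.
  params = {"campaign_id": "CAMP_001", "min_spend": 100.0}

  transactions = spark.read.parquet("gs://example-raw-bucket/transactions/")

  engaged_customers = (
      transactions
      .filter(F.col("campaign_id") == params["campaign_id"])
      .groupBy("customer_id")
      .agg(F.sum("spend").alias("total_spend"))
      .filter(F.col("total_spend") >= params["min_spend"])
  )

  # Written as Parquet here; the real pipeline feeds a Power BI data model.
  engaged_customers.write.mode("overwrite").parquet(
      "gs://example-curated-bucket/utmf/engaged_customers/"
  )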


Roles and Responsibilities

  • Developed the UTMF app using a web tech stack (ReactJS, JavaScript, etc.).
  • Created a PySpark framework for end-to-end processing.
  • Created data pipelines and scheduled them using a cron scheduler.
  • Created a job control framework (a simplified sketch follows this list).
  • Designed the pipeline architecture.
  • Performed quality checks.
  • Provided production support.
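
To illustrate the job control idea mentioned above, here is a minimal sketch with hypothetical step names and simplified status handling; the actual framework is richer and is triggered by a cron scheduler.

  # Minimal job-control sketch (hypothetical steps, simplified status handling).
  import logging
  from datetime import datetime

  logging.basicConfig(level=logging.INFO)
  log = logging.getLogger("job_control")

  def extract():
      log.info("extract step")    # e.g. pull source files into the landing zone

  def transform():
      log.info("transform step")  # e.g. run the PySpark transformations

  def publish():
      log.info("publish step")    # e.g. refresh the reporting layer

  def run_pipeline(steps):
      """Run steps in order; log status and stop at the first failure."""
      for step in steps:
          started = datetime.utcnow().isoformat()
          try:
              step()
              log.info("%s succeeded (started %s)", step.__name__, started)
          except Exception:
              log.exception("%s failed; halting pipeline", step.__name__)
              raise

  if __name__ == "__main__":
      run_pipeline([extract, transform, publish])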

Analyst

TheMathCompany
06.2021 - 03.2022


Project: Danone Ecommerce


Project Description

In the Danone Ecommerce project, reports are generated in a Power BI dashboard that helps the client drive business decisions. We used the Azure ecosystem and infrastructure to create our end-to-end pipelines. Our own custom code extracts the latest data from FTP and ingests it into Azure ADLS; we then use Azure ADF in conjunction with Azure Databricks to perform the necessary transformations, and finally load the final data model into Power BI to generate dashboard views.
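
A minimal sketch of the FTP-to-ADLS ingestion step is shown below, assuming a hypothetical host, credentials, container, and file name; the project's actual custom code is more involved, and the transformations themselves run in ADF and Databricks.

  # Minimal sketch: download a file from FTP and land it in ADLS Gen2.
  # Host, credentials, container, and paths are hypothetical placeholders.
  import io
  from ftplib import FTP

  from azure.storage.filedatalake import DataLakeServiceClient

  def copy_latest_file(ftp_host, ftp_user, ftp_password,
                       account_url, account_key, container, remote_file):
      # Download the file from FTP into memory.
      buffer = io.BytesIO()
      with FTP(ftp_host) as ftp:
          ftp.login(user=ftp_user, passwd=ftp_password)
          ftp.retrbinary(f"RETR {remote_file}", buffer.write)
      buffer.seek(0)

      # Upload it to a raw/landing path in the data lake.
      service = DataLakeServiceClient(account_url=account_url, credential=account_key)
      fs = service.get_file_system_client(file_system=container)
      fs.get_file_client(f"raw/{remote_file}").upload_data(buffer.read(), overwrite=True)

  if __name__ == "__main__":
      copy_latest_file(
          ftp_host="ftp.example.com",
          ftp_user="user",
          ftp_password="password",
          account_url="https://exampleaccount.dfs.core.windows.net",
          account_key="<storage-account-key>",
          container="datalake",
          remote_file="sales_latest.csv",
      )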


Roles and Responsibilities

  • Created data processing pipelines on the Azure ecosystem.
  • Created a PySpark framework for end-to-end processing.
  • Created a job control framework.
  • Designed the pipeline architecture.
  • Performed quality checks.
  • Provided production support.

Technical Associate

Genpact
02.2019 - 09.2020


Project: Legg Mason Enterprise Data Management


Project Description

The Franklin Templeton Enterprise Data Management system is a data warehouse used by the client to generate different types of reports to analyze their investments and risk. We use AWS services and infrastructure to migrate data from multiple sources such as Salesforce, Wiser, Lipper, PAM, Adobe, FTP and TAOS. To ingest the data from the different sources into S3 (staging), we use Sqoop as the ingestion tool; previously, we used Informatica Cloud for data ingestion. Before implementing our business logic on the data, we perform quality checks and store the data in S3. After the quality checks are complete, we apply our business logic using a Spark Scala framework and load the final data into the data warehouse, AWS Redshift.


Roles and Responsibilities

  • Created JSON configurations containing business logic, written in SQL, to process data through the Spark framework (a simplified sketch follows this list).
  • Created and maintained a Python framework for replicating Salesforce objects to S3 using Informatica Cloud.
  • Developed AWS Data Pipeline JSON definitions.
  • Created EMR clusters for processing different objects.
  • Created Athena tables to view data in S3.
  • Wrote shell scripts to invoke different Spark applications.
  • Provided L3 production support for one sprint.
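
To illustrate the config-driven processing described above: a JSON file supplies the SQL business logic and Spark executes it over staged data. The sketch below uses PySpark for brevity, whereas the project framework was written in Spark/Scala; file names, table names, and the query are hypothetical.

  # Minimal sketch of JSON-driven SQL processing in Spark (PySpark for brevity;
  # the project used a Spark/Scala framework). Paths, names, and SQL are hypothetical.
  import json

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.appName("json-driven-sql-sketch").getOrCreate()

  # Example config of the kind described above (normally read from S3).
  config = json.loads("""
  {
    "source_path": "s3://example-staging-bucket/salesforce/accounts/",
    "source_view": "accounts",
    "business_logic": "SELECT account_id, SUM(amount) AS total_amount FROM accounts GROUP BY account_id"
  }
  """)

  # Register the staged data and run the configured SQL.
  spark.read.parquet(config["source_path"]).createOrReplaceTempView(config["source_view"])
  result = spark.sql(config["business_logic"])

  # The real pipeline loads the result into Redshift; written out as Parquet here.
  result.write.mode("overwrite").parquet("s3://example-curated-bucket/accounts_summary/")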

Education

Master of Computer Applications

USICT (GGSIPU)
New Delhi, India
06.2019

B.Sc. (Hons) - Electronics

DDUC (DU)
New Delhi, India
06.2015

Skills

  • Apache Spark
  • PySpark
  • Python
  • SQL
  • Data Warehousing
  • ETL
  • AWS
  • Azure
  • Azure Databricks
  • Azure Data Factory
  • AWS Redshift

Languages

English
Hindi

Timeline

Senior Data Science Engineer

Dunnhumby
04.2022 - Current

Analyst

TheMathCompany
06.2021 - 03.2022

Technical Associate

Genpact
02.2019 - 09.2020

Master of Computer Applications

USICT (GGSIPU)

B.Sc. (Hons) - Electronics

DDUC (DU)
DDUC (DU)