ALANKIT BAWEJA

Noida

Summary

  • 5+ years of relevant experience in the IT industry working with Big Data.
  • Good understanding of ETL concepts and processes.
  • Adaptable to cloud strategies based on AWS and Azure.
  • Experience working with AWS services (Data Pipeline, Step Functions, S3, Lambda, CloudWatch, EMR, Athena, Redshift).
  • Experience working with Azure services (Azure Data Factory, Azure Databricks, ADLS, Blob Storage, Azure SQL Database).
  • Experience processing structured and semi-structured data using Python and PySpark.
  • Experience writing and optimizing complex SQL queries.
  • Strong understanding of and hands-on experience with data warehousing concepts.
  • Experience working with Agile and Scrum methodologies.
  • Good experience analyzing business requirements and prioritizing issues accordingly; experienced in gathering clear requirements from the customer.

Overview

6 years of professional experience

Work History

Senior Data Science Engineer

Dunnhumby
04.2022 - Current


Project: Unified Targeting & Measurement Framework (UTMF)


Project Description

UTMF is a customer engagement app designed to streamline the Customer Engagement (CE) process while managing the underlying ETL operations. It extracts data from various sources, applies business transformations based on inputs from the UTMF app, and delivers actionable insights via Power BI to enhance customer engagement strategies.

We leverage both the Azure and GCP ecosystems (a brief sketch of the GCP transformation step follows this list):

  • GCP: Utilizing GCS for data storage and Dataproc for scalable data transformations.
  • Azure: Using ADF for data orchestration and Databricks for advanced analytics and transformations.
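
The GCP leg of this pipeline follows a read-transform-write pattern. The sketch below is a minimal illustration only, assuming hypothetical bucket names, column names, and app-supplied parameters; it is not the production framework.

  # Minimal PySpark sketch of a Dataproc-style job: read raw data from GCS,
  # apply a parameter-driven transformation, and write the result back to GCS
  # for downstream reporting. Buckets, columns, and parameters are hypothetical.
  from pyspark.sql import SparkSession, functions as F

  spark = SparkSession.builder.appName("utmf-transform-sketch").getOrCreate()

  # Parameters of this kind would come from the UTMF app in practice.
  params = {"campaign_id": "CAMP_001", "min_spend": 100.0}

  transactions = spark.read.parquet("gs://example-raw-bucket/transactions/")

  engaged_customers = (
      transactions
      .filter(F.col("campaign_id") == params["campaign_id"])
      .groupBy("customer_id")
      .agg(F.sum("spend").alias("total_spend"))
      .filter(F.col("total_spend") >= params["min_spend"])
  )

  # Written as Parquet here; the real pipeline feeds a Power BI data model.
  engaged_customers.write.mode("overwrite").parquet(
      "gs://example-curated-bucket/utmf/engaged_customers/"
  )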


Roles and Responsibilities

  • Developed the UTMF app using a web tech stack (ReactJS, JavaScript, etc.).
  • Created a PySpark framework for end-to-end processing.
  • Created data pipelines and scheduled them using a cron scheduler.
  • Created a job control framework (a simplified sketch follows this list).
  • Designed the pipeline architecture.
  • Performed quality checks.
  • Provided production support.
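
To illustrate the job control idea mentioned above, here is a minimal sketch with hypothetical step names and simplified status handling; the actual framework is richer and is triggered by a cron scheduler.

  # Minimal job-control sketch (hypothetical steps, simplified status handling).
  import logging
  from datetime import datetime

  logging.basicConfig(level=logging.INFO)
  log = logging.getLogger("job_control")

  def extract():
      log.info("extract step")    # e.g. pull source files into the landing zone

  def transform():
      log.info("transform step")  # e.g. run the PySpark transformations

  def publish():
      log.info("publish step")    # e.g. refresh the reporting layer

  def run_pipeline(steps):
      """Run steps in order; log status and stop at the first failure."""
      for step in steps:
          started = datetime.utcnow().isoformat()
          try:
              step()
              log.info("%s succeeded (started %s)", step.__name__, started)
          except Exception:
              log.exception("%s failed; halting pipeline", step.__name__)
              raise

  if __name__ == "__main__":
      run_pipeline([extract, transform, publish])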

Analyst

TheMathCompany
06.2021 - 03.2022


Project: Danone Ecommerce


Project Description

In the Danone Ecommerce project, reports are generated in a Power BI dashboard that helps the client drive business decisions. We used the Azure ecosystem and infrastructure to create our end-to-end pipelines. Our own custom code extracts the latest data from FTP and ingests it into Azure ADLS; we then use Azure ADF in conjunction with Azure Databricks to perform the necessary transformations, and finally load the final data model into Power BI to generate dashboard views.
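
A minimal sketch of the FTP-to-ADLS ingestion step is shown below, assuming a hypothetical host, credentials, container, and file name; the project's actual custom code is more involved, and the transformations themselves run in ADF and Databricks.

  # Minimal sketch: download a file from FTP and land it in ADLS Gen2.
  # Host, credentials, container, and paths are hypothetical placeholders.
  import io
  from ftplib import FTP

  from azure.storage.filedatalake import DataLakeServiceClient

  def copy_latest_file(ftp_host, ftp_user, ftp_password,
                       account_url, account_key, container, remote_file):
      # Download the file from FTP into memory.
      buffer = io.BytesIO()
      with FTP(ftp_host) as ftp:
          ftp.login(user=ftp_user, passwd=ftp_password)
          ftp.retrbinary(f"RETR {remote_file}", buffer.write)
      buffer.seek(0)

      # Upload it to a raw/landing path in the data lake.
      service = DataLakeServiceClient(account_url=account_url, credential=account_key)
      fs = service.get_file_system_client(file_system=container)
      fs.get_file_client(f"raw/{remote_file}").upload_data(buffer.read(), overwrite=True)

  if __name__ == "__main__":
      copy_latest_file(
          ftp_host="ftp.example.com",
          ftp_user="user",
          ftp_password="password",
          account_url="https://exampleaccount.dfs.core.windows.net",
          account_key="<storage-account-key>",
          container="datalake",
          remote_file="sales_latest.csv",
      )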


Roles and Responsibilities

  • Created data processing pipelines on the Azure ecosystem.
  • Created a PySpark framework for end-to-end processing.
  • Created a job control framework.
  • Designed the pipeline architecture.
  • Performed quality checks.
  • Provided production support.

Technical Associate

Genpact
02.2019 - 09.2020


Project: Legg Mason Enterprise Data Management


Project Description

The Franklin Templeton Enterprise Data Management system is a data warehouse used by the client to generate different types of reports to analyze their investments and risk. We use AWS services and infrastructure to migrate data from multiple sources such as Salesforce, Wiser, Lipper, PAM, Adobe, FTP and TAOS. To ingest the data from the different sources into S3 (staging), we use Sqoop as the ingestion tool; previously, we used Informatica Cloud for data ingestion. Before implementing our business logic on the data, we perform quality checks and store the data in S3. After the quality checks are complete, we apply our business logic using a Spark Scala framework and load the final data into the data warehouse, AWS Redshift.


Roles and Responsibilities

  • Created JSON configurations containing business logic, written in SQL, to process data through the Spark framework (a simplified sketch follows this list).
  • Created and maintained a Python framework for replicating Salesforce objects to S3 using Informatica Cloud.
  • Developed AWS Data Pipeline JSON definitions.
  • Created EMR clusters for processing different objects.
  • Created Athena tables to view data in S3.
  • Wrote shell scripts to invoke different Spark applications.
  • Provided L3 production support for one sprint.
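
To illustrate the config-driven processing described above: a JSON file supplies the SQL business logic and Spark executes it over staged data. The sketch below uses PySpark for brevity, whereas the project framework was written in Spark/Scala; file names, table names, and the query are hypothetical.

  # Minimal sketch of JSON-driven SQL processing in Spark (PySpark for brevity;
  # the project used a Spark/Scala framework). Paths, names, and SQL are hypothetical.
  import json

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.appName("json-driven-sql-sketch").getOrCreate()

  # Example config of the kind described above (normally read from S3).
  config = json.loads("""
  {
    "source_path": "s3://example-staging-bucket/salesforce/accounts/",
    "source_view": "accounts",
    "business_logic": "SELECT account_id, SUM(amount) AS total_amount FROM accounts GROUP BY account_id"
  }
  """)

  # Register the staged data and run the configured SQL.
  spark.read.parquet(config["source_path"]).createOrReplaceTempView(config["source_view"])
  result = spark.sql(config["business_logic"])

  # The real pipeline loads the result into Redshift; written out as Parquet here.
  result.write.mode("overwrite").parquet("s3://example-curated-bucket/accounts_summary/")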

Education

Master of Computer Applications

USICT (GGSIPU)
New Delhi, India
06.2019

B.Sc. (Hons) - Electronics

DDUC (DU)
New Delhi, India
06.2015

Skills

  • Apache Spark
  • PySpark
  • Python
  • SQL
  • Data Warehousing
  • ETL
  • AWS
  • Azure
  • Azure Databricks
  • Azure Data Factory
  • AWS Redshift

Languages

English
Hindi

Timeline

Senior Data Science Engineer

Dunnhumby
04.2022 - Current

Analyst

TheMathCompany
06.2021 - 03.2022

Technical Associate

Genpact
02.2019 - 09.2020

Master of Computer Applications

USICT (GGSIPU)

B.Sc. (Hons) - Electronics

DDUC (DU)
DDUC (DU)