Senior AWS Data Engineer experienced across all stages of the data pipeline, including acquisition, integration, storage, and data marts. Adept at working quickly and efficiently in close collaboration with analytics, engineering, and other stakeholders.
Overview
11 years of professional experience
Work History
Data Engineer - Big Data
IBM
12.2014 - Current
10 years of experience as a Data Engineer, including 7 years dedicated to Big Data technologies
Expertise includes Hadoop, Python, Spark, HDFS, AWS, Sqoop, and Hive
Hands-on experience building an enterprise data lake using IBM's Digital Insights framework
Designed and implemented robust ETL pipelines to enable seamless data transfer between GCP and AWS
Utilized AWS services such as AWS Glue, EMR, Step Functions, Athena, S3, and EventBridge to automate data transformation and loading processes
Established data validation checks to ensure data integrity during and after migration, as shown in the sketch below.
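A minimal sketch of one such validation, assuming illustrative database, table, and bucket names rather than the project's actual resources: the same COUNT(*) is run against the migrated table via Athena and compared with the count captured on the source side.

```python
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

def run_count(database: str, table: str, output_s3: str) -> int:
    """Run SELECT COUNT(*) via Athena and return the row count."""
    qid = athena.start_query_execution(
        QueryString=f"SELECT COUNT(*) FROM {table}",
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3},
    )["QueryExecutionId"]
    while True:  # poll until the query reaches a terminal state
        state = athena.get_query_execution(QueryExecutionId=qid)[
            "QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {qid} ended in state {state}")
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    return int(rows[1]["Data"][0]["VarCharValue"])  # row 0 is the header row

# Illustrative values: the source count would be captured on the GCP side.
source_count = 1_234_567
target_count = run_count("analytics_db", "brand_campaigns", "s3://my-athena-results/")
assert source_count == target_count, "row counts diverged after migration"
```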
AWS Senior Data Engineer
Genentech
09.2023 - Current
Genentech, a member of the Roche Group, is a pioneering biotechnology company dedicated to pursuing groundbreaking science to discover and develop medicines for people with serious and life-threatening diseases
The analytical data pipeline migration and modernization project moves terabyte-scale data for Genentech's brand campaigns from Hive and GCP into a modernized AWS data platform
The scope involves remodeling the data, rebuilding the analytical data pipelines in AWS, storing data in an AWS S3 data lake, and integrating it into the AWS cloud data warehouse for BI use cases
The technology landscape includes Amazon Redshift, Redshift Spectrum, AWS Glue, Athena, RDS, S3, Step Functions, EMR, AWS Lambda, Amazon EventBridge, PySpark, Hive, Google BigQuery, Google Data Transfer, Google Analytics APIs, and Google Cloud Storage
Contribution: Leading three critical modules to migrate and modernize the Hive- and GCP BigQuery-based Genentech Digital Insights analytical data pipelines into AWS
Extracted data from the Google Search Console API and transitioned the as-is process from Google Cloud Platform (GCP) to Amazon Web Services (AWS); a sketch of the extraction step follows
Additionally, developed multiple accelerators to ensure timely project delivery.
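A hedged sketch of that extraction step, using the official google-api-python-client; the site URL, bucket, key, and credentials file are placeholders, not the project's actual resources. Search analytics rows are pulled from the Search Console API and landed in S3 for the downstream AWS pipeline.

```python
import json
import boto3
from google.oauth2 import service_account
from googleapiclient.discovery import build

# Authenticate with a service account that has read access to the property.
creds = service_account.Credentials.from_service_account_file(
    "sa.json", scopes=["https://www.googleapis.com/auth/webmasters.readonly"]
)
gsc = build("searchconsole", "v1", credentials=creds)

# Query a month of search analytics data, broken down by date/query/page.
response = gsc.searchanalytics().query(
    siteUrl="https://www.example.com/",
    body={
        "startDate": "2024-01-01",
        "endDate": "2024-01-31",
        "dimensions": ["date", "query", "page"],
        "rowLimit": 25000,  # API page-size cap; paginate with startRow for more
    },
).execute()

# Land the raw rows in the S3 raw zone for Glue/Athena to pick up.
boto3.client("s3").put_object(
    Bucket="my-raw-zone",
    Key="gsc/2024-01.json",
    Body=json.dumps(response.get("rows", [])),
)
```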
AWS Data Engineer
MUFG Bank Ltd
08.2020 - 08.2023
Project name: Enterprise Data Platform
Building a cloud-based data lake using a framework that rapidly ingests and curates data with a number of AWS cloud services
Data can be consumed through multiple technologies within this architecture, with Amazon Simple Storage Service (S3) as the primary storage platform, leveraging cloud object storage
Data is then consumed by compute and analytics engines such as EMR and Redshift
Contribution: Worked as a Data Engineer for a US-based banking client, building a cloud-based data lake
Managed data arriving from various sources and handled the ingestion and curation of those flat files
Wrote extensive Hive queries to transform the data according to business requirements
Built and implemented an automated testing framework of Python scripts for ingestion and curation, as sketched below.
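An illustrative pytest-style check of the kind such a framework runs; the paths and table names are assumptions, not the client's actual schema. Curation must preserve row counts, and mandatory fields must be populated.

```python
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    # Hive support lets tests read the curated tables the pipeline writes.
    return SparkSession.builder.appName("edl-tests").enableHiveSupport().getOrCreate()

def test_curation_preserves_row_count(spark):
    raw = spark.read.option("header", True).csv("s3://edl-raw/accounts/")
    curated = spark.table("curated.accounts")  # Hive table written by the pipeline
    assert raw.count() == curated.count()

def test_mandatory_fields_not_null(spark):
    curated = spark.table("curated.accounts")
    assert curated.filter("account_id IS NULL").count() == 0
```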
Big Data Engineer
Retail Client
04.2018 - 12.2019
Involved in developing an enterprise data lake within IBM Cloud infrastructure, leveraging the Digital Insights framework to create market-leading analytics
Contribution: Responsible for all deliverables and documentation while developing data lake pipelines for data preparation
Developed the Python scripts used to ingest flat-file extracts into the raw zone
Loaded historical and incremental files using wrapper scripts
Implemented a metadata validation framework
Proactively identified improvement areas and implemented automated solutions for them
Created test scripts and test cases for the EDL quality assurance team
Implemented security policies governing access to different Hive databases and HDFS locations
Documented reports on the various activities followed in the project
Implemented data integrity and data quality checks in Hadoop using Hive and Linux scripts
Automated the Hive DDL creation process by mapping MySQL data types to their Hive equivalents; a sketch of the mapping follows.
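A simplified sketch of that DDL generator, with a hard-coded type map and hypothetical table details standing in for metadata that would be read from MySQL's information_schema:

```python
# Map MySQL column types onto their closest Hive equivalents.
MYSQL_TO_HIVE = {
    "int": "INT", "bigint": "BIGINT", "varchar": "STRING",
    "text": "STRING", "datetime": "TIMESTAMP", "date": "DATE",
    "decimal": "DECIMAL(18,2)", "tinyint": "TINYINT", "double": "DOUBLE",
}

def hive_ddl(table: str, columns: list[tuple[str, str]], location: str) -> str:
    """Emit a CREATE EXTERNAL TABLE statement for the given columns."""
    cols = ",\n  ".join(
        f"`{name}` {MYSQL_TO_HIVE.get(mysql_type, 'STRING')}"
        for name, mysql_type in columns
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'\n"
        f"STORED AS TEXTFILE\nLOCATION '{location}';"
    )

# Hypothetical example table; real column metadata came from MySQL.
print(hive_ddl(
    "raw.orders",
    [("order_id", "bigint"), ("status", "varchar"), ("created_at", "datetime")],
    "s3://edl-raw/orders/",
))
```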
Data Specialist - Informatica
Telecom Client
02.2014 - 03.2018
Involved in building and maintaining an application that extracts data from a legacy system, transforms it per business requirements, and loads it into an Oracle database
Contribution: Understood user requirements and analyzed the mapping sheet to trace the data flow from different sources and the transformations to be performed on the raw data
Tuned a long-running job, significantly reducing the application's overall processing time
Wrote a script as a countermeasure for a few frequent job failures, as sketched below.
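An illustrative Python retry wrapper of the kind such a countermeasure script might take; the pmcmd invocation and its arguments are placeholders, not the project's actual connection details.

```python
import subprocess
import time

# Placeholder launch command for an Informatica workflow.
CMD = ["pmcmd", "startworkflow", "-sv", "IntSvc", "-d", "Domain",
       "-f", "BillingFolder", "wf_load_billing"]

for attempt in range(1, 4):
    result = subprocess.run(CMD, capture_output=True, text=True)
    if result.returncode == 0:
        break  # workflow launched successfully
    print(f"attempt {attempt} failed: {result.stderr.strip()}")
    time.sleep(60 * attempt)  # back off before retrying
else:
    raise SystemExit("workflow failed after 3 attempts")
```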
Education
Bachelor of Technology in Electronics & Instrumentation