Rajdeep Roy

Summary

Senior AWS Data Engineer experienced across all stages of the data pipeline, including acquisition, integration, storage, and data marts. Adept at working quickly and efficiently in close collaboration with analytics, engineering, and other stakeholders.

Overview

11 years of professional experience

Work History

Data Engineer-Big Data

IBM
12.2014 - Current
  • 10 years of experience as a Data Engineer, including 7 years working with Big Data technologies
  • Expertise in Hadoop, Python, Spark, HDFS, AWS, Sqoop, and Hive
  • Hands-on experience building an enterprise data lake using IBM's Digital Insights framework
  • Designed and implemented robust ETL pipelines to transfer data between GCP and AWS
  • Utilized AWS services such as AWS Glue, EMR, Step Functions, Athena, S3, and EventBridge to automate data transformation and loading processes
  • Established data validation to ensure data integrity during and after migration.

AWS Senior Data Engineer

Genentech
09.2023 - Current
  • Genentech, a member of the Roche Group, is a pioneering biotechnology company dedicated to pursuing groundbreaking science to discover and develop medicines for people with serious and life-threatening diseases
  • The project migrates and modernizes the analytical data pipelines for Genentech's brand campaigns, moving terabyte-scale data from Hive and GCP to a modernized AWS data platform
  • The scope covers remodeling the data, rebuilding the analytical data pipelines in AWS, storing the data in an AWS S3 data lake, and integrating it into the AWS cloud DW for BI use cases
  • The technology landscape includes Amazon Redshift, Redshift Spectrum, AWS Glue, Athena, RDS, S3, Step Functions, EMR, AWS Lambda, Amazon EventBridge, PySpark, Hive, Google BigQuery, Google Data Transfer, Google Analytics APIs, Google Storage
  • Contribution: Leading 3 critical modules in the migration and modernization of the Hive and GCP BigQuery based Genentech Digital Insight analytical data pipelines into AWS
  • Extracted data from the Google Search Console API and transitioned the as-is process from Google Cloud Platform (GCP) to Amazon Web Services (AWS)
  • Developed multiple accelerators to ensure timely project delivery.

AWS Data Engineer

MUFG Bank Ltd
08.2020 - 08.2023
  • Project name: Enterprise Data Platform
  • Building a cloud-based data lake using a framework that rapidly ingests and curates data using a number of AWS cloud services
  • Data can be consumed through multiple technologies within this architecture, with Amazon Simple Storage Service (S3) as the primary storage platform for cloud object storage
  • Data is then consumed by compute and analytics engines such as EMR and Redshift
  • Contribution: Worked as a Data Engineer for a US-based banking client, building a cloud-based data lake
  • Managed data arriving from various sources, including the ingestion and curation of flat files
  • Wrote extensive Hive queries to transform the data according to the business requirements
  • Built and implemented the framework of Python scripts for automated testing of ingestion and curation.

Big Data Engineer

Retail Client
04.2018 - 12.2019
  • Involved in developing an enterprise data lake (EDL) within IBM Cloud infrastructure, leveraging the Digital Insights framework to create market-leading analytics
  • Contribution: Responsible for all deliverables and documentation while developing data lake pipelines for data preparation
  • Developed the Python scripts used to ingest flat-file extracts into the raw zone
  • Loaded historical and incremental files using wrapper scripts
  • Implemented a metadata validation framework
  • Proactively identified areas for improvement and implemented automated solutions for them
  • Created test scripts and test cases for the EDL quality assurance team
  • Implemented security policies to access different Hive databases and HDFS locations
  • Documented reports on the various activities followed in the project
  • Implemented data integrity and data quality checks in Hadoop using Hive and Linux scripts
  • Automated the DDL creation process in Hive by mapping the MySQL data types.

Data Specialist-Informatica

Telecom Client
02.2014 - 03.2018
  • Involved in building and maintaining an application that extracts data from a legacy system, transforms it per the business requirements, and loads it into the Oracle database
  • Contribution: Gathered user requirements and analyzed the mapping sheet to understand the data flow from different sources and the transformations to be performed on the raw data
  • Tuned a long-running job, significantly reducing the application's overall processing time
  • Wrote scripts as countermeasures for a few frequent job failures.

Education

Bachelor of Technology in Electronics & Instrumentation

Bengal Institute of Technology and Management
01.2012

Skills

  • ETL Data Pipelines Design & Development
  • Data Lake Development & Implementation
  • Programming Language (Python)
  • Data Quality Improvement
  • Big Data Technologies (Hadoop, Spark)
  • Data Cleaning & Preparation
  • Data Warehousing
  • SQL
  • Cloud Computing

Languages

English - Fluent
Hindi - Fluent
Bengali - Fluent

Digital Credentials

  • AWS Certified Solutions Architect - Associate - 2023
  • Microsoft Certified: Azure Data Engineer Associate - 2022
  • Spark - Level 1 - 2020
  • Big Data Foundations - Level 2 - 2020
  • Digital Insights - Knowledge Delivery - 2019

Timeline

AWS Senior Data Engineer

Genentech
09.2023 - Current

AWS Data Engineer

MUFG Bank Ltd
08.2020 - 08.2023

Big Data Engineer

Retail Client
04.2018 - 12.2019

Data Engineer-Big Data

IBM
12.2014 - Current

Data Specialist-Informatica

Telecom Client
02.2014 - 03.2018

Bachelor of Technology in Electronics & Instrumentation

Bengal Institute of Technology and Management