
Rajdeep Roy

Kolkata, WB

Summary

Senior AWS Data Engineer experienced across all stages of the data pipeline, including acquisition, integration, storage, and data marts. Adept at working quickly and efficiently in close collaboration with analytics, engineering, and other stakeholders.

Overview

11 years of professional experience

Work History

AWS Senior Data Engineer

IBM
09.2023 - Current
  • The migration and modernization of analytical data pipelines project covers the terabyte-scale migration of the company's brand-campaign data from Hive and GCP into a modernized AWS data platform
  • The scope involves remodeling the data, rebuilding the analytical data pipelines in AWS, storing the data in an AWS S3 data lake, and integrating with the AWS cloud DW for BI use cases
  • The technology landscape includes Amazon Redshift, Redshift Spectrum, AWS Glue, Athena, RDS, S3, Step Functions, EMR, AWS Lambda, Amazon EventBridge, PySpark, Hive, Google BigQuery, Google Data Transfer, Google Analytics APIs, and Google Cloud Storage
  • Contribution: Leading 3 critical modules to migrate and modernize the Hive- and GCP BigQuery-based Genentech Digital Insight analytical data pipelines into AWS
  • Extracted data from the Google Search Console API and transitioned the as-is process from Google Cloud Platform (GCP) to Amazon Web Services (AWS)
  • Additionally, developed multiple accelerators to ensure timely project delivery.

AWS Data Engineer

IBM
08.2020 - 08.2023
  • Project name: Enterprise Data Platform
  • Building a cloud-based data lake using a framework that rapidly ingests and curates data with a number of AWS cloud services
  • Data can be consumed through multiple technologies within this architecture, with Amazon Simple Storage Service (S3) as the primary platform for cloud object storage
  • Data is then consumed by compute and analytics engines such as EMR and Redshift
  • Contribution: Worked as a Data Engineer for a US-based banking client, building a cloud-based data lake
  • Responsible for managing data from various sources and for the ingestion and curation of those flat files
  • Wrote extensive Hive queries to transform the data according to business requirements
  • Built and implemented a framework of automated Python test scripts for ingestion and curation.

Big Data Engineer

Retail Client
04.2017 - 07.2019
  • Involved in developing an enterprise data lake within IBM Cloud infrastructure, leveraging the Digital Insights framework to create market-leading analytics
  • Contribution: Responsible for all deliverables and documentation while developing data lake pipelines for data preparation
  • Developed the Python scripts used to ingest flat-file extracts into the raw zone
  • Loaded historical and incremental files using wrapper scripts
  • Implemented a metadata validation framework
  • Proactively identified improvement areas and implemented automated solutions for them
  • Created test scripts and test cases for the EDL quality assurance team
  • Implemented security policies to control access to different Hive databases and HDFS locations
  • Documented reports on the project's activities
  • Implemented data integrity and data quality checks in Hadoop using Hive and Linux scripts
  • Automated the DDL creation process in Hive by mapping MySQL data types.

Data Specialist-Informatica

Telecom Client
02.2014 - 03.2017
  • Involved in building and maintaining an application that extracts data from a legacy system, transforms it per business requirements, and loads it into an Oracle database
  • Contribution: Understood user requirements and analyzed the mapping sheet to trace the data flow from different sources and the transformations to be performed on the raw data
  • Tuned a long-running job, significantly reducing the application's overall processing time
  • Wrote scripts as countermeasures for a few frequent job failures.

Education

Bachelor of Technology in Electronics & Instrument

Bengal Institute of Technology And Management
West Bengal
07.2012

Skills

  • Big Data Technologies (Spark, Hadoop)
  • Programming Languages (Python, SQL)
  • Data Warehouse (Hive, Redshift)
  • Orchestration (Control-M, Step Functions, Airflow)
  • Version Control (Git, Bitbucket)
  • ETL Tools (AWS Glue, Informatica)
  • AWS Services (Athena, RDS, S3, EMR, AWS Lambda, Amazon EventBridge)

Languages

English
Hindi
Bengali
