Senior AWS Data Engineer experienced across all stages of the data pipeline, including acquisition, integration, storage, and data marts. Adept at working quickly and efficiently in close collaboration with analytics, engineering, and other stakeholders.
Overview
11 years of professional experience
Work History
Data Engineer - Big Data
IBM
12.2014 - Current
10 years of experience as a Data Engineer, including 7 years dedicated to Big Data technologies
Expertise includes Hadoop, Python, Spark, HDFS, AWS, Sqoop, and Hive
Hands-on experience building an enterprise data lake using IBM's Digital Insights framework
Designed and implemented robust ETL pipelines to enable seamless data transfer between GCP and AWS
Utilized AWS services such as AWS Glue, EMR, Step Functions, Athena, S3, and EventBridge to automate data transformation and loading processes
Established data validation checks to ensure data integrity during and after migration, as shown in the sketch below.
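A minimal sketch of one such validation, assuming illustrative database, table, and bucket names rather than the project's actual resources: the same COUNT(*) is run against the migrated table via Athena and compared with the count captured on the source side.

```python
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

def run_count(database: str, table: str, output_s3: str) -> int:
    """Run SELECT COUNT(*) via Athena and return the row count."""
    qid = athena.start_query_execution(
        QueryString=f"SELECT COUNT(*) FROM {table}",
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3},
    )["QueryExecutionId"]
    while True:  # poll until the query reaches a terminal state
        state = athena.get_query_execution(QueryExecutionId=qid)[
            "QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {qid} ended in state {state}")
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    return int(rows[1]["Data"][0]["VarCharValue"])  # row 0 is the header row

# Illustrative values: the source count would be captured on the GCP side.
source_count = 1_234_567
target_count = run_count("analytics_db", "brand_campaigns", "s3://my-athena-results/")
assert source_count == target_count, "row counts diverged after migration"
```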
AWS Senior Data Engineer
Genentech
09.2023 - Current
Genentech, a member of the Roche Group, is a pioneering biotechnology company dedicated to pursuing groundbreaking science to discover and develop medicines for people with serious and life-threatening diseases
The analytical data pipeline migration and modernization project moves terabyte-scale data for Genentech's brand campaigns from Hive and GCP into a modernized AWS data platform
The scope involves remodeling the data, rebuilding the analytical data pipelines in AWS, storing data in an AWS S3 data lake, and integrating it into the AWS cloud data warehouse for BI use cases
The technology landscape includes Amazon Redshift, Redshift Spectrum, AWS Glue, Athena, RDS, S3, Step Functions, EMR, AWS Lambda, Amazon EventBridge, PySpark, Hive, Google BigQuery, Google Data Transfer, Google Analytics APIs, and Google Cloud Storage
Contribution: Leading three critical modules to migrate and modernize the Hive- and GCP BigQuery-based Genentech Digital Insights analytical data pipelines into AWS
Extracted data from the Google Search Console API and transitioned the as-is process from Google Cloud Platform (GCP) to Amazon Web Services (AWS); a sketch of the extraction step follows
Additionally, developed multiple accelerators to ensure timely project delivery.
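A hedged sketch of that extraction step, using the official google-api-python-client; the site URL, bucket, key, and credentials file are placeholders, not the project's actual resources. Search analytics rows are pulled from the Search Console API and landed in S3 for the downstream AWS pipeline.

```python
import json
import boto3
from google.oauth2 import service_account
from googleapiclient.discovery import build

# Authenticate with a service account that has read access to the property.
creds = service_account.Credentials.from_service_account_file(
    "sa.json", scopes=["https://www.googleapis.com/auth/webmasters.readonly"]
)
gsc = build("searchconsole", "v1", credentials=creds)

# Query a month of search analytics data, broken down by date/query/page.
response = gsc.searchanalytics().query(
    siteUrl="https://www.example.com/",
    body={
        "startDate": "2024-01-01",
        "endDate": "2024-01-31",
        "dimensions": ["date", "query", "page"],
        "rowLimit": 25000,  # API page-size cap; paginate with startRow for more
    },
).execute()

# Land the raw rows in the S3 raw zone for Glue/Athena to pick up.
boto3.client("s3").put_object(
    Bucket="my-raw-zone",
    Key="gsc/2024-01.json",
    Body=json.dumps(response.get("rows", [])),
)
```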
AWS Data Engineer
MUFG Bank Ltd
08.2020 - 08.2023
Project name: Enterprise Data Platform
Building a cloud-based data lake using a framework that rapidly ingests and curates data with a number of AWS cloud services
Data can be consumed through multiple technologies within this architecture, with Amazon Simple Storage Service (S3) as the primary storage platform, leveraging cloud object storage
Data is then consumed by compute and analytics engines such as EMR and Redshift
Contribution: Worked as a Data Engineer for a US-based banking client, building a cloud-based data lake
Managed data arriving from various sources and handled the ingestion and curation of those flat files
Wrote extensive Hive queries to transform the data according to business requirements
Built and implemented an automated testing framework of Python scripts for ingestion and curation, as sketched below.
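An illustrative pytest-style check of the kind such a framework runs; the paths and table names are assumptions, not the client's actual schema. Curation must preserve row counts, and mandatory fields must be populated.

```python
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    # Hive support lets tests read the curated tables the pipeline writes.
    return SparkSession.builder.appName("edl-tests").enableHiveSupport().getOrCreate()

def test_curation_preserves_row_count(spark):
    raw = spark.read.option("header", True).csv("s3://edl-raw/accounts/")
    curated = spark.table("curated.accounts")  # Hive table written by the pipeline
    assert raw.count() == curated.count()

def test_mandatory_fields_not_null(spark):
    curated = spark.table("curated.accounts")
    assert curated.filter("account_id IS NULL").count() == 0
```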
Big Data Engineer
Retail Client
04.2018 - 12.2019
Involved in developing an enterprise data lake within IBM Cloud infrastructure, leveraging the Digital Insights framework to create market-leading analytics
Contribution: Responsible for all deliverables and documentation while developing data lake pipelines for data preparation
Developed the Python scripts used to ingest flat-file extracts into the raw zone
Loaded historical and incremental files using wrapper scripts
Implemented a metadata validation framework
Proactively identified improvement areas and implemented automated solutions for them
Created test scripts and test cases for the EDL quality assurance team
Implemented security policies governing access to different Hive databases and HDFS locations
Documented reports on the various activities followed in the project
Implemented data integrity and data quality checks in Hadoop using Hive and Linux scripts
Automated the Hive DDL creation process by mapping MySQL data types to their Hive equivalents; a sketch of the mapping follows.
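A simplified sketch of that DDL generator, with a hard-coded type map and hypothetical table details standing in for metadata that would be read from MySQL's information_schema:

```python
# Map MySQL column types onto their closest Hive equivalents.
MYSQL_TO_HIVE = {
    "int": "INT", "bigint": "BIGINT", "varchar": "STRING",
    "text": "STRING", "datetime": "TIMESTAMP", "date": "DATE",
    "decimal": "DECIMAL(18,2)", "tinyint": "TINYINT", "double": "DOUBLE",
}

def hive_ddl(table: str, columns: list[tuple[str, str]], location: str) -> str:
    """Emit a CREATE EXTERNAL TABLE statement for the given columns."""
    cols = ",\n  ".join(
        f"`{name}` {MYSQL_TO_HIVE.get(mysql_type, 'STRING')}"
        for name, mysql_type in columns
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'\n"
        f"STORED AS TEXTFILE\nLOCATION '{location}';"
    )

# Hypothetical example table; real column metadata came from MySQL.
print(hive_ddl(
    "raw.orders",
    [("order_id", "bigint"), ("status", "varchar"), ("created_at", "datetime")],
    "s3://edl-raw/orders/",
))
```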
Data Specialist - Informatica
Telecom Client
02.2014 - 03.2018
Involved in building and maintaining an application that extracts data from a legacy system, transforms it per business requirements, and loads it into an Oracle database
Contribution: Understood user requirements and analyzed the mapping sheet to trace the data flow from different sources and the transformations to be performed on the raw data
Tuned a long-running job, significantly reducing the application's overall processing time
Wrote a script as a countermeasure for a few frequent job failures, as sketched below.
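An illustrative Python retry wrapper of the kind such a countermeasure script might take; the pmcmd invocation and its arguments are placeholders, not the project's actual connection details.

```python
import subprocess
import time

# Placeholder launch command for an Informatica workflow.
CMD = ["pmcmd", "startworkflow", "-sv", "IntSvc", "-d", "Domain",
       "-f", "BillingFolder", "wf_load_billing"]

for attempt in range(1, 4):
    result = subprocess.run(CMD, capture_output=True, text=True)
    if result.returncode == 0:
        break  # workflow launched successfully
    print(f"attempt {attempt} failed: {result.stderr.strip()}")
    time.sleep(60 * attempt)  # back off before retrying
else:
    raise SystemExit("workflow failed after 3 attempts")
```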
Education
Bachelor of Technology in Electronics & Instrumentation