Almost 7 years of experience designing and developing Big Data applications using Hadoop ecosystem technologies (HDFS, Hive, Sqoop, Apache Spark) and AWS.
● Experienced in optimizing Spark RDD performance by fine-tuning various configuration settings, including memory allocation, caching strategies, and serialization methods.
● Expertise in using Spark RDD transformations and actions to process large-scale structured and unstructured data sets, including filtering, mapping, reducing, grouping, and aggregating data.
● Skilled in using Spark RDD persistence and caching mechanisms to reduce data processing overhead and improve query performance (a caching sketch follows this list).
● Familiar with schema and data type operations, such as adding, renaming, and dropping columns, casting data types, and handling null values (see the DataFrame schema sketch below).
● Skilled in optimizing Spark SQL performance through memory allocation, caching, and serialization.
● Proficient in processing serialized data using Avro, Parquet, ORC, and Protobuf (see the file-format sketch below).
● Experienced with binary and textual data formats, such as CSV, JSON, and XML, including serialization and deserialization using Spark DataFrames and RDDs.
● Optimized Spark jobs and data workflows for scalability, performance, and cost efficiency using partitioning, compression, and caching (see the partitioned-write sketch below).
● Implemented Spark SQL queries for data querying and aggregation (see the Spark SQL sketch below).
● Proficient in setting up and customizing Google Dataproc clusters, including cluster resizing and configuration tuning.
● Experienced in ETL testing methodologies, including data extraction from various sources, workflow testing, job scheduling, and transformation testing.
● Executed data cleansing and preprocessing tasks using Spark transformations to prepare data for analysis and reporting.
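A minimal Scala sketch of the RDD persistence pattern mentioned above. The HDFS path and the "ERROR" filter are hypothetical; the point is that persist(StorageLevel.MEMORY_AND_DISK) lets later actions reuse the filtered RDD instead of re-reading the source.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    object RddCachingSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("rdd-caching-sketch").getOrCreate()
        val sc = spark.sparkContext

        // Hypothetical input path standing in for any large text source.
        val events = sc.textFile("hdfs:///data/events.txt")

        // Persist the filtered RDD so downstream actions reuse it rather
        // than re-reading and re-filtering the source on every action.
        val errors = events.filter(_.contains("ERROR")).persist(StorageLevel.MEMORY_AND_DISK)

        println(errors.count())            // first action materializes and caches
        errors.take(10).foreach(println)   // second action reads from the cache

        errors.unpersist()
        spark.stop()
      }
    }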
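A sketch of the schema and data type operations listed above, assuming a hypothetical CSV input whose column names (amount, legacy_flag, unused_col) are invented for illustration.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    object SchemaOpsSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("schema-ops-sketch").getOrCreate()

        // Hypothetical CSV input read with a header row.
        val raw = spark.read.option("header", "true").csv("hdfs:///data/orders.csv")

        val cleaned = raw
          .withColumn("amount", col("amount").cast("double"))    // cast string -> double
          .withColumnRenamed("legacy_flag", "is_active")         // rename a column
          .drop("unused_col")                                    // no-op if the column is absent
          .na.fill(Map("amount" -> 0.0, "is_active" -> "false")) // handle null values

        cleaned.printSchema()
        spark.stop()
      }
    }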
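A file-format sketch covering the Parquet, ORC, and Avro handling mentioned above. Paths are hypothetical; Parquet and ORC support is built into Spark, while the Avro write assumes the spark-avro package is on the classpath.

    import org.apache.spark.sql.SparkSession

    object FileFormatSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("file-format-sketch").getOrCreate()

        // Read Parquet, then write the same data back out in other formats.
        val df = spark.read.parquet("hdfs:///data/in.parquet")

        df.write.mode("overwrite").orc("hdfs:///data/out_orc")
        df.write.mode("overwrite").format("avro").save("hdfs:///data/out_avro")

        // Textual formats such as JSON use the same reader/writer API.
        df.write.mode("overwrite").json("hdfs:///data/out_json")

        spark.stop()
      }
    }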
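A partitioned-write sketch illustrating the partitioning and compression tuning listed above. The event_date column and output path are assumptions; repartitioning on the partition column before writing limits the number of small files per partition.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    object PartitionedWriteSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("partitioned-write-sketch").getOrCreate()

        val events = spark.read.parquet("hdfs:///data/events.parquet")

        // Write Snappy-compressed Parquet, physically partitioned by event_date.
        events
          .repartition(col("event_date"))
          .write
          .mode("overwrite")
          .partitionBy("event_date")
          .option("compression", "snappy")
          .parquet("hdfs:///data/events_partitioned")

        spark.stop()
      }
    }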
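A Spark SQL sketch of the querying and aggregation work described above, using a hypothetical orders dataset registered as a temp view; table and column names are invented for illustration.

    import org.apache.spark.sql.SparkSession

    object SparkSqlSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("spark-sql-sketch").getOrCreate()

        // Register a DataFrame as a temp view so it is queryable with SQL.
        val orders = spark.read.parquet("hdfs:///data/orders.parquet")
        orders.createOrReplaceTempView("orders")

        // Aggregate revenue per customer with plain Spark SQL.
        val revenue = spark.sql(
          """SELECT customer_id, SUM(amount) AS total_revenue
            |FROM orders
            |GROUP BY customer_id
            |ORDER BY total_revenue DESC""".stripMargin)

        revenue.show(10)
        spark.stop()
      }
    }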