Jayaseelan Joseph

Chennai

Summary

Big Data engineer with almost 7 years of experience designing and developing applications using Hadoop ecosystem technologies (HDFS, Hive, Sqoop, Apache Spark) and AWS. Experienced in optimizing Spark RDD performance by fine-tuning configuration settings, including memory allocation, caching strategies, and serialization methods.
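
Below is a minimal Scala sketch of the kind of Spark tuning described above; the application name, path, and every configuration value are illustrative placeholders, not settings from an actual project.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object TuningSketch {
  def main(args: Array[String]): Unit = {
    // Illustrative tuning knobs: executor memory, unified-memory fraction,
    // and Kryo serialization. All values are examples only.
    val spark = SparkSession.builder()
      .appName("TuningSketch")
      .config("spark.executor.memory", "4g")
      .config("spark.memory.fraction", "0.6")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .getOrCreate()

    val events = spark.sparkContext.textFile("hdfs:///data/events") // hypothetical path
    // Serialized caching trades some CPU for a smaller memory footprint.
    events.persist(StorageLevel.MEMORY_ONLY_SER)
    println(s"event lines: ${events.count()}")
  }
}
```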

Overview

7 years of professional experience

Work History

Data Engineer

Capgemini Technology Services
Chennai
04.2022 - 07.2024
  • Implemented Spark transformations to process large-scale healthcare data for analysis
  • Deployed Spark applications in a distributed environment using YARN or Kubernetes
  • Managed Spark libraries and dependencies
  • Worked with Spark DataFrame APIs for structured data analysis
  • Implemented data validation and quality checks within Spark transformations (see the sketch after this list)
  • Integrated Spark with AWS Lambda for serverless data processing solutions
  • Skilled in integrating Hive tables with other big data technologies, such as Hadoop, HBase, and Impala
  • Familiar with the Hive metastore and its role in managing table metadata and schema evolution
  • Proficient in optimizing Hive query performance by tuning various configuration settings, such as memory allocation, parallelism, and compression
  • Strong understanding of Hive integration with other big data technologies, such as Hadoop, Spark, and Impala, and their impact on query performance and resource utilization
  • Proficient in performing data validation and cleansing during data transfer using Sqoop's validation and cleansing options
  • Adept in scheduling and automating Sqoop jobs for incremental runs
  • Experienced in importing and exporting large datasets between Hadoop and relational databases using Sqoop
  • Proficient in writing Sqoop commands to transfer data between Hadoop and various databases such as MySQL and SQL Server
  • Skilled in implementing Sqoop-based solutions for migrating data between different Hadoop distributions and versions
  • Strong experience in configuring Sqoop to handle complex data structures such as nested and hierarchical data
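
The Spark validation bullets above can be illustrated with a short Scala sketch; the column names, rules, and S3 paths are hypothetical, not taken from project code.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object ValidationSketch {
  // Hypothetical quality checks over a healthcare claims DataFrame.
  def validate(claims: DataFrame): DataFrame =
    claims
      .filter(col("patient_id").isNotNull)      // reject rows missing the key
      .filter(col("claim_amount") >= 0)         // basic range check
      .withColumn("claim_date", to_date(col("claim_date"), "yyyy-MM-dd"))
      .dropDuplicates("claim_id")               // deduplicate on the business key

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ValidationSketch").getOrCreate()
    val claims = spark.read.option("header", "true").csv("s3://example-bucket/claims/")
    validate(claims).write.mode("overwrite").parquet("s3://example-bucket/claims_clean/")
  }
}
```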

ETL Developer

Maersk Global Services
Chennai
10.2017 - 04.2022
  • Experienced in ETL (Extract, Transform, Load) testing methodologies and processes
  • Proficient in testing data extraction processes from various sources, including databases, files, and APIs
  • Knowledgeable about data integration and consolidation processes in ETL pipelines
  • Familiarity with data quality and data cleansing techniques in ETL testing
  • Expertise in testing ETL workflows and job scheduling mechanisms
  • Expertise in testing ETL transformations, such as data aggregation, filtering, sorting, and joining
  • Knowledgeable about testing ETL processes in Big Data platforms, such as Hadoop or Spark
  • Experienced in conducting data validation and reconciliation between source and target systems in ETL testing (see the sketch after this list)
  • Strong understanding of data deduplication and data consolidation techniques in ETL testing
  • Expertise in testing data transformation rules and business logic applied during ETL processes
  • Proficient in using ETL testing tools and frameworks, such as QuerySurge, Talend Data Quality, or Informatica Data Validation Option
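
A minimal Spark-based sketch of the source-to-target reconciliation described above; the table paths and join key are hypothetical, and dedicated tools such as QuerySurge would typically do this work in practice.

```scala
import org.apache.spark.sql.SparkSession

object ReconciliationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ReconciliationSketch").getOrCreate()
    val source = spark.read.parquet("/data/source/orders") // hypothetical paths
    val target = spark.read.parquet("/data/target/orders")

    // Row-count comparison between source and target.
    println(s"source=${source.count()} target=${target.count()}")

    // Anti-joins surface rows present on one side only.
    val missingInTarget    = source.join(target, Seq("order_id"), "left_anti")
    val unexpectedInTarget = target.join(source, Seq("order_id"), "left_anti")
    println(s"missing=${missingInTarget.count()} unexpected=${unexpectedInTarget.count()}")
  }
}
```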

Education

Bachelor of Commerce

Loyola College
03.2017

Skills

  • Apache Spark
  • Sqoop
  • Hadoop
  • Hive
  • Scala
  • Data Warehousing
  • BigQuery
  • PuTTY
  • AWS S3

Professional Summary

  • Experienced in optimizing Spark RDD performance by fine-tuning configuration settings, including memory allocation, caching strategies, and serialization methods
  • Expertise in using Spark RDD transformations and actions to process large-scale structured and unstructured data sets, including filtering, mapping, reducing, grouping, and aggregating data
  • Skilled in using Spark RDD persistence and caching mechanisms to reduce data processing overhead and improve query performance
  • Familiar with schema and data type operations, such as adding, renaming, and dropping columns, casting data types, and handling null values
  • Skilled in optimizing Spark SQL performance through memory allocation, caching, and serialization
  • Proficient in processing serialized data using Avro, Parquet, ORC, and Protobuf (see the sketch at the end of this section)
  • Experienced with binary and textual data formats, such as CSV, JSON, and XML, including serialization and deserialization using Spark DataFrames and RDDs
  • Optimized Spark jobs and data workflows for scalability, performance, and cost efficiency using partitioning, compression, and caching
  • Implemented Spark SQL queries for data querying and aggregation
  • Proficient in setting up and customizing Google Dataproc clusters, including cluster resizing and configuration tuning
  • Experienced in ETL testing methodologies, including data extraction from various sources, workflow testing, job scheduling, and transformation testing
  • Executed data cleansing and preprocessing tasks using Spark transformations to prepare data for analysis and reporting
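
The format-handling and partitioning bullets above can be sketched in a few lines of Scala; the paths and partition column are illustrative, and reading "avro" assumes the external spark-avro package is on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object FormatsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("FormatsSketch").getOrCreate()

    // Read Avro input (requires the spark-avro package); path is hypothetical.
    val events = spark.read.format("avro").load("/data/raw/events_avro")

    // Write columnar, compressed, partitioned output for cheaper downstream scans.
    events.write
      .partitionBy("event_date")
      .option("compression", "snappy")
      .mode("overwrite")
      .parquet("/data/curated/events")
  }
}
```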

Personal Information

  • Date of Birth: 16/11/1995
  • Gender: Male
  • Nationality: Indian
  • Marital Status: Married
  • Father's Name: A. Joseph

Timeline

Data Engineer

Capgemini Technology Services
04.2022 - 07.2024

ETL Developer

Maersk Global Services
10.2017 - 04.2022

Bachelor of Commerce

Loyola College
03.2017