Over 6 years of experience in designing and developing Big Data applications using Hadoop ecosystem technologies (HDFS, Hive, Sqoop, Apache Spark) and AWS.
Domain experience includes Finance, Insurance, and Retail, with hands-on experience in the storage, querying, processing, and analysis of Big Data on Hadoop.
Understands the complex data processing needs of big data, with experience in developing code and modules to address those needs.
Capable of processing large sets of structured and semi-structured data.
Worked on AWS Components like S3, EMR.
Worked with different file formats such as JSON, XML, and Avro data files as well as text files.
Worked extensively on Hadoop migration projects and POCs.
Expertise in writing Hadoop Jobs for analyzing data using Hive.
Experience in importing and exporting data using Sqoop between HDFS and RDBMS.
Knowledge in installing, configuring, and using Hadoop ecosystem components like HDFS, Hive, Sqoop, and Spark.
Brought simplification and optimization initiatives to improve application efficiency.
Proficient in optimizing Sqoop imports and exports for performance and scalability.
Experienced in designing and implementing complex data integration solutions using Sqoop.
Experience in handling Hive schema evolution with the Avro file format.
Proficient in handling Hive partitions and buckets according to business requirements.
Skilled in processing semi-structured/serialized data in Hive (Avro, Parquet, ORC).
Experienced in efficiently using Hive managed and external tables with respect to the business requirement; deep knowledge of the incremental imports, partitioning, and bucketing concepts in Hive and Spark SQL needed for optimization (a sketch of these patterns follows below).
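As a rough illustration of these Hive patterns, the sketch below creates a partitioned external table, loads one partition incrementally, and writes a bucketed copy keyed on the join column. The database, table, and column names (txn_db.transactions and so on) are hypothetical, and a configured Hive metastore is assumed.

```scala
import org.apache.spark.sql.SparkSession

object HiveTableSketch {
  def main(args: Array[String]): Unit = {
    // Hive-enabled session; metastore configuration is assumed to exist on the cluster.
    val spark = SparkSession.builder()
      .appName("hive-partition-bucket-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // External table: dropping it removes only metadata, the files on HDFS remain.
    // Partitioning by load_date lets queries prune whole directories.
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS txn_db.transactions (
        txn_id      BIGINT,
        customer_id BIGINT,
        amount      DECIMAL(18,2)
      )
      PARTITIONED BY (load_date STRING)
      STORED AS ORC
      LOCATION '/data/external/transactions'
    """)

    // Incremental load: overwrite a single partition instead of the whole table.
    spark.sql("""
      INSERT OVERWRITE TABLE txn_db.transactions PARTITION (load_date = '2024-01-01')
      SELECT txn_id, customer_id, amount FROM txn_db.transactions_staging
    """)

    // Bucketing via the DataFrame writer: clustering on the join key cuts shuffle
    // cost for frequent customer-level joins and aggregations.
    spark.table("txn_db.transactions")
      .write
      .mode("overwrite")
      .bucketBy(32, "customer_id")
      .sortBy("customer_id")
      .saveAsTable("txn_db.transactions_bucketed")

    spark.stop()
  }
}
```

For Avro-backed tables the same DDL pattern applies with STORED AS AVRO, where Hive's AvroSerDe tolerates schema evolution such as newly added columns.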
Proficient in developing and implementing Spark RDD-based data processing workflows using the Scala or Python programming languages.
Experienced in optimizing Spark RDD performance by tuning various configuration settings, such as memory allocation, caching, and serialization.
Skilled in using Spark RDD persistence and caching mechanisms to reduce data processing overhead and improve query performance.
Familiarity with Spark RDD lineage and fault tolerance mechanisms and their impact on data processing reliability and performance.
Expertise in using Spark RDD transformations and actions to process large-scale structured and unstructured data sets, including filtering, mapping, reducing, grouping, and aggregating data.
Hands-on experience in deploying Spark jobs on EMR clusters as step executions (an RDD sketch follows below).
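A minimal sketch of this RDD style, assuming line-delimited "customerId,amount" records on HDFS; the path and field layout are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object RddSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rdd-transformations-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical CSV of "customerId,amount" lines on HDFS.
    val lines = sc.textFile("hdfs:///data/raw/transactions.csv")

    // Transformations are lazy: nothing executes until an action is called.
    val totalsByCustomer = lines
      .map(_.split(","))
      .filter(_.length == 2)             // drop malformed rows
      .map(f => (f(0), f(1).toDouble))   // (customerId, amount)
      .reduceByKey(_ + _)                // aggregate per customer

    // Persist in serialized form to cut memory overhead when the RDD is reused.
    totalsByCustomer.persist(StorageLevel.MEMORY_ONLY_SER)

    // Actions trigger the lineage; fault tolerance comes from recomputing lost partitions.
    println(s"customers: ${totalsByCustomer.count()}")
    totalsByCustomer.take(5).foreach(println)

    spark.stop()
  }
}
```

On EMR, a job like this would typically be packaged as a JAR and submitted as a cluster step that invokes spark-submit.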
Used Agile methodology, working with IT and business teams to drive efficient system development.
Database experience in SQL Server and MySQL.
Good problem-solving and analytical skills, with a readiness to innovate in order to perform better.
Strong interpersonal and communication skills.
Overview
7 years of professional experience
Work History
Big Data Developer
HDFC BANK LIMITED
08.2019 - Current
A consumer banking services company headquartered in Mumbai, India
The company offers products and services including wholesale banking, retail banking, treasury, auto loans, two-wheeler loans, personal loans, lifestyle loans, consumer durable loans, and credit cards
Its digital products include PayZapp and SmartBuy
Performed import and export of data into HDFS and Hive using Sqoop and managed data within the environment
Created Hive tables, loaded data, and wrote Hive queries.
Managed Hadoop MapReduce jobs for processing large datasets
Optimized Spark SQL queries, which helped save cost for the project
Consumed data from upstream systems via web APIs, RDBMS, and file systems, applied business logic and transformations, and wrote the data to target Hive/HBase tables used by the business for analytics
Migrated huge volumes of data from RDBMS to HDFS using Sqoop jobs
Automated Sqoop jobs using shell scripts to pull data from various databases into Hadoop (a Spark JDBC equivalent is sketched below).
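Sqoop itself is driven from the command line, so as a sketch in the project's primary language the snippet below uses Spark's JDBC reader to perform an equivalent parallel RDBMS-to-HDFS pull. This is a Spark-based stand-in for the Sqoop job, not the job itself, and the connection URL, table, credentials, and split bounds are all hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object JdbcIngestSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("jdbc-ingest-sketch").getOrCreate()

    // Parallel read, analogous to Sqoop's --split-by/--num-mappers: the source
    // table is range-partitioned on a numeric key across 8 connections.
    // The JDBC driver is assumed to be on the classpath.
    val df = spark.read.format("jdbc")
      .option("url", "jdbc:mysql://db-host:3306/sales")   // hypothetical endpoint
      .option("dbtable", "orders")
      .option("user", "etl_user")
      .option("password", sys.env("DB_PASSWORD"))         // read from the environment
      .option("partitionColumn", "order_id")
      .option("lowerBound", "1")
      .option("upperBound", "10000000")
      .option("numPartitions", "8")
      .load()

    // Land the pull on HDFS as Parquet, partitioned by date for downstream Hive queries.
    df.write.mode("overwrite").partitionBy("order_date").parquet("hdfs:///data/landing/orders")

    spark.stop()
  }
}
```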
Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing jobs (sketched below).
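A minimal DStream sketch of this micro-batching, assuming a plain socket source; the host, port, and output path are hypothetical, and a Kafka or Flume receiver would slot in the same way.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    // Micro-batching: the stream is cut into 30-second batches, each processed
    // by the regular Spark engine like a small batch job.
    val conf = new SparkConf().setAppName("streaming-batch-sketch")
    val ssc = new StreamingContext(conf, Seconds(30))

    // Hypothetical socket source of newline-delimited text.
    val lines = ssc.socketTextStream("stream-host", 9999)

    // Per-batch word counts, written to HDFS one directory per batch interval.
    val counts = lines.flatMap(_.split("\\s+")).map((_, 1L)).reduceByKey(_ + _)
    counts.saveAsTextFiles("hdfs:///data/streaming/wordcounts")

    ssc.start()
    ssc.awaitTermination()
  }
}
```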
Familiarity with Spark RDD-based data processing libraries and frameworks, such as Apache Spark SQL, MLlib, and GraphX, and their features and limitations.
Experienced in optimizing Spark DataFrame performance by tuning various configuration settings, such as memory allocation, caching, and serialization.
Expertise in using Spark DataFrame transformations and actions to process large-scale structured and semi-structured data sets, including filtering, mapping, reducing, grouping, and aggregating data.
Cleansed and transformed data using Spark and pushed the crunched data to Hive tables.
Created Glue tables on S3 buckets and loaded the data into the tables.
Skilled in using Spark DataFrame persistence and caching mechanisms to reduce data processing overhead and improve query performance.
Familiarity with Spark DataFrame-based data processing libraries and frameworks, such as Spark SQL, MLlib, and GraphFrames, and their features and limitations (a DataFrame sketch follows below).
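A short DataFrame sketch combining the points above: semi-structured JSON input is cleansed and typed, cached for reuse, then written to Hive tables for analytics. The paths, table names, and columns are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DataFrameSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dataframe-cleansing-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical semi-structured input: one JSON record per line on HDFS.
    val raw = spark.read.json("hdfs:///data/raw/events/")

    // Cleansing and transformation before the result is reused by several writers.
    val cleansed = raw
      .filter(col("customer_id").isNotNull)
      .withColumn("amount", col("amount").cast("decimal(18,2)"))
      .withColumn("event_date", to_date(col("event_ts")))

    // Cache once, reuse for both the aggregate and the detail write.
    cleansed.cache()

    val daily = cleansed.groupBy("event_date").agg(sum("amount").alias("total_amount"))

    // Push the crunched data to Hive tables for downstream analytics.
    daily.write.mode("overwrite").saveAsTable("analytics.daily_totals")
    cleansed.write.mode("append").partitionBy("event_date").saveAsTable("analytics.events_clean")

    spark.stop()
  }
}
```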
Exported the necessary Spark JARs to run in the cluster.
Worked on generating complex data that is further used by downstream applications.
Involved in data analysis, data quality, and data profiling work that helped the business team.
Loaded and transformed large sets of semi-structured data such as XML, JSON, Avro, and Parquet.
Code and peer review of assigned tasks, unit testing, and bug fixing.
JANA SMALL FINANCE BANK
07.2016 - 08.2019
Jana Small Finance Bank is a consumer banking services company headquartered in Bangalore, India
This project defines the mid-level platform design for Core Banking (CBS) with regard to the changes needed to migrate Commercial & Offshore accounts from CAP to CBS
The document is designed to complement the end-to-end project design documents and shows the developing design at a platform level in greater detail
It serves as an overview of the application software documents on CBS and IBM WebSphere DataStage
Responsible for developing, supporting, and maintaining the ETL (Extract, Transform, Load) process using Informatica PowerCenter
Developed Mappings and Workflows to generate staging files.
Developed various transformations such as Source Qualifier, Sorter, Joiner, Update Strategy, Lookup, Expression, and Sequence Generator for loading data into target tables.
Created multiple Mapplets, Workflows, Tasks, and database connections using Workflow Manager.
Created sessions and batches to move data at specific intervals and on demand using Server Manager.
Responsibilities included creating and scheduling sessions.
Recovered failed sessions and batches.
Extracted data from Oracle, DB2, CSV, and flat files.
Implemented performance tuning techniques by identifying and resolving bottlenecks in source, target, transformation mappings, and sessions to improve performance; understood the functional requirements.
Designed the dimensional model of the OLAP data marts.
Prepared the documents for test data loading.
Education
MBA - Operations Management
SRM UNIVERSITY
2016
B.E - Computer Engineering
SATHYABAMA UNIVERSITY
2014
12th -
SARASWATHI MATRIC HIGHER SECONDARY SCHOOL
SALEM
2010
10th -
SARASWATHI MATRIC HIGHER SECONDARY SCHOOL
SALEM
2008
Skills
Data Ecosystem: Hadoop, Sqoop, Hive, Apache Spark, and AWS
Distribution: Cloudera 5.12
Databases: SQL Server, MySQL
Languages: Scala, Python, SQL
Operating Systems: Linux and Windows
Additional Information
WORK EXPERIENCE
HDFC BANK LIMITED – August 2019 – Present
JANA SMALL FINANCE BANK – July 2016 – August 2019