Technocrat professional offering 11+ years of progressive experience in data warehousing, business intelligence, project management and big data using the Hadoop ecosystem, Informatica PowerCenter/PowerExchange & Business Objects across the banking domain. Sound cognizance of CI/CD using Jenkins, Git, Maven and strong working knowledge of Hadoop ecosystem components such as HDFS, MapReduce, Hive, YARN, Oozie, ZooKeeper, Flume, Sqoop, Scala, Spark & PySpark. Adept at Spark SQL for querying structured data and at implementing predictive algorithms using Spark MLlib and R. Hands-on experience in the software development lifecycle: requirement analysis, design & development. Conversant with Python and RDBMS such as SQL Server 2005, DB2 and Mainframe datasets, with knowledge of Amazon Web Services (AWS) S3. Strong business consulting acumen and client-facing skills, with the ability to leverage analytic models and approaches to solve business problems.
· Extracting data from files & tables to the cloud using Palantir Foundry and preparing RDDs & DataFrames using PySpark
· Converting the legacy application to pCloud using PySpark
· Executing analytics using Spark & documenting user-defined functions in PySpark
· Developing various dimension & fact tables as part of data transformation and managing data coming from different sources
· Extracting data from files & tables to HDFS using Sqoop & Flume and preparing RDDs & DataFrames using Spark on Hive
· Building various YARN jobs to load incremental data into Impala to capture changed data
· Utilising the Hive and Impala integration to reflect CDC on the Hive platform and using Spark MLlib for machine learning algorithms
· Executing analytics using Spark & documenting user-defined functions in PySpark
· Developing various dimension & fact tables as part of data transformation and managing data coming from different sources
· Providing support for MapReduce programs running on the cluster and loading data from the UNIX file system to HDFS
· Administered a 298-node cluster and handled design, coding & testing of BofA's big data platform
· Ingested data from files & tables into HDFS using Flafka and prepared RDDs & DataFrames using Spark on Hive
· Built various YARN jobs to load incremental data into Impala to capture changed data
· Utilised the Hive and Impala integration to reflect CDC on the Hive platform and used Spark MLlib for machine learning algorithms
· Executed analytics using Spark & documented user-defined functions in Scala
· Built various dimensional & fact tables as part of data transformation
· Loaded and transformed large sets of structured, semi-structured & unstructured data
· Handled data from various sources and supported MapReduce programs running on the cluster
· Effectively loaded data from UNIX file systems to HDFS
· Analysed business requirements, discussed them with the onsite counterpart and prepared mapping documents based on design & business needs
· Involved in ETL Design, Coding, Unit Testing & System Integration Testing
· Optimized mappings, sessions & workflows to improve performance; reviewed Informatica code built by the team and validated it against client standards & performance requirements
· Created Unit Test Case and System Integration Testing documents and executed unit & system integration testing to ensure data integrity
· Scheduled jobs using ESP scheduler tool and defined dependencies for effective load processing
· Involved in analysing business requirements and discussing them with the onsite counterpart
· Created mapping documents based on design and business needs; involved in ETL design, coding, unit testing and system integration testing
· Optimized mappings, sessions & workflows to improve performance
· Involved in preparing Unit Test Case and System Integration Testing documents and performing unit and system integration testing to ensure data integrity
· Involved in scheduling jobs using the ESP scheduler tool and defining dependencies for effective load processing
Hadoop Ecosystem: HDFS, Hive 1.2.0, Spark 2.4, Oozie 4.1.0, Flume 1.5.2, Sqoop 1.4.4.2.0, HBase 1.2.0, Impala 2.9.0