Snowflake Certified professional with 3 years of Snowflake experience and Cloudera Certified Hadoop Professional with 15 years of IT experience, including the last 9 years working with the Spark, Snowflake, and Hadoop big data frameworks. Expertise in data migration from the data lake to Snowflake and the cloud, and in migrating data from traditional systems to the data lake. Experienced in leading a team of 10+ associates, covering planning, work allocation, tracking, reporting, weekly client calls, and inter-team coordination. Experience in project management and service delivery management.
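For illustration, below is a minimal, hypothetical PySpark sketch of the kind of data-lake-to-Snowflake migration step referred to above. It assumes the spark-snowflake connector is available on the Spark classpath; all connection values and table names are placeholders, not actual project code.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("datalake-to-snowflake-migration")
    .enableHiveSupport()
    .getOrCreate()
)

# Hypothetical Snowflake connection options; real values would come from a secured config or vault.
sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<virtual_warehouse>",
}

# Read a curated table from the data lake (Hive); database and table names are illustrative only.
curated_df = spark.sql("SELECT * FROM curated_db.customer_transactions")

# Write to Snowflake through the spark-snowflake connector.
(
    curated_df.write
    .format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "CUSTOMER_TRANSACTIONS")
    .mode("overwrite")
    .save()
)
```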
Project: Data Ingestion and Distribution (DID)
Client / Domain: Morgan Stanley / Banking
Role: Hadoop/Spark/Snowflake Technical Lead
Duration: 01/2021 - present
Description: The project ingests data from multiple source systems. Once ingestion is complete, data curation logic is applied using HQL/Spark, and the curated data is distributed to downstream systems for consumption. We also distribute/migrate data from the data lake to Snowflake and to other data warehouses such as Teradata.
Responsibilities:
- Creating high-level and low-level design documents
- Designing the data migration solution from the data lake to Snowflake
- Writing the data migration framework from the data lake to Snowflake
- Continuous data ingestion to Snowflake using Snowpipe
- Worked on a POC for streaming data ingestion to Snowflake using Kafka
- Creating internal/external stages, virtual warehouses, and materialized views
- Writing shell scripting code for file-watcher functionality
- Writing the data migration framework for Hadoop-to-Snowflake migration
- Extracting, transforming, and loading data from Azure Data Lake Storage (ADLS) and processing the data in Azure Databricks
- Creating triggers for different sources and pipelines (event-based and scheduled) in Azure Data Factory
- Writing custom ingestion scripts based on the ingestion use case
- Writing data transformation and aggregation HQL scripts for data curation
- Converting Hive queries into PySpark
- Exposing the data to downstream teams
- Writing data migration scripts from Hadoop to Teradata
- Data migration from Hadoop to Snowflake
- Writing shell scripts for validation and running spark-submit
- Creating TWS requests for automation
- Highly involved in batch optimization
- Code deployment and code promotion to higher environments
- Creating JIRA tasks and updating JIRA on a daily basis
- Handling the daily Scrum call to provide daily updates to client stakeholders
Tools: Tivoli, Putty, Git Extensions, Cloudera Manager, Git Bash, TeamCity, Snowflake, SnowSQL, ADF, Azure Cloud, Spark/Python, Hive, Impala, Sqoop, shell script, Teradata

Project: Customer 720
Client / Domain: Barclays / Banking
Role: Hadoop Architect/Designer
Duration: 10/2018 - 12/2020
Description: The project is divided into two parts. Customer 360 builds various use cases around the internal data available within the bank. The second part (Customer 360 External) focuses on use cases that analyse data gathered from external sources such as property insurance companies and retail companies, with use cases such as top financial transactions and the Property Widget. Barclays Relationship Managers use this application to view their customers' network view.
Responsibilities:
- Creating high-level and low-level design documents
- Analysing and providing the technical design for new use cases
- Creating the graph database and loading the schema for the graph
- Working closely with stakeholders and BAs for requirement gathering
- Developing Spark applications to process the data and generate entity and relation data files that can then be loaded into the DSE Graph database (see the sketch after this entry)
- Writing shell scripts for validation and running spark-submit
- Loading data into the DSE Graph database
- Team handling and assigning tasks to each team member
- Creating JIRA tasks and updating JIRA on a daily basis
- Handling the daily Scrum call to provide daily updates to Barclays management
Tools: IntelliJ, Tivoli, Putty, WinSCP, Cloudera Manager, Git Bash, Spark/Scala, Hive, Sqoop, shell script, DSE Graph, MongoDB, Python, Gremlin query
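As a rough illustration of the entity/relation extraction step above, the following is a minimal, hypothetical PySpark sketch that derives vertex and edge files for a downstream graph loader. All source tables, column names, and output paths are placeholders; the actual project work was done in Spark/Scala against DSE Graph.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("graph-entity-relation-extract")
    .enableHiveSupport()
    .getOrCreate()
)

# Hypothetical source tables; column names are illustrative only.
customers = spark.sql("SELECT customer_id, customer_name, segment FROM cust_db.customers")
accounts = spark.sql("SELECT account_id, customer_id, product_type FROM cust_db.accounts")

# Entity (vertex) files: one row per vertex with a stable id.
customer_vertices = customers.select(
    F.col("customer_id").alias("id"),
    F.col("customer_name").alias("name"),
    F.col("segment"),
)
account_vertices = accounts.select(
    F.col("account_id").alias("id"),
    F.col("product_type"),
)

# Relation (edge) file: customer -holds-> account.
holds_edges = accounts.select(
    F.col("customer_id").alias("from_id"),
    F.col("account_id").alias("to_id"),
).withColumn("label", F.lit("holds"))

# Write as CSV so a separate graph-loading job can pick the files up.
customer_vertices.write.mode("overwrite").option("header", True).csv("/data/graph/vertices/customer")
account_vertices.write.mode("overwrite").option("header", True).csv("/data/graph/vertices/account")
holds_edges.write.mode("overwrite").option("header", True).csv("/data/graph/edges/holds")
```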
Project: BABAR - Data Ingestion, Big Data / Data Lake
Client / Domain: HSBC / Banking
Role: Team Lead, Big Data
Duration: 11/2015 - 09/2018
Description: HSBC has lines of business such as GSC, RBWM, Risk, and Payments across the world, spread country-wise over 7,000 source system servers. It holds a large amount of data in various data sources such as relational databases and flat files. In this project we ingest and process that data and provide information for analytics purposes.
Responsibilities:
- Requirement gathering from the client, including daily client meetings
- Importing data into Hive from various relational databases (Oracle, Sybase, AS400, SQL Server) using Sqoop (a PySpark equivalent is sketched after this entry)
- Writing Hive DDL to create Hive tables for optimized query performance
- Ingesting flat files (delimited, fixed-length, etc.) into the Hive warehouse
- Raising JIRAs for infrastructure and platform issues
- Engaging with BAs to understand the requirements clearly
- Defining all possible test cases along with the test data
- Coming up with missing scenarios and getting them clarified with the BA
- Implementing data validation, quality checks, and profiling
- Creating Hive tables, loading data, and writing Hive queries
- Performing UAT of the big data implementation
Tools: Control-M, Putty, WinSCP, HDFS, MapReduce, shell scripting, Hive, Sqoop
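The ingestion in this project was done with Sqoop; since the other sketches here are in PySpark, the following shows an equivalent JDBC-based ingestion into a partitioned Hive table as a hypothetical sketch, not the actual Sqoop commands used. Connection details, table names, and the partition column are placeholders, and the Oracle JDBC driver jar is assumed to be on the Spark classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("rdbms-to-hive-ingestion")
    .enableHiveSupport()
    .getOrCreate()
)

# Hypothetical JDBC source; the real project imported this data with Sqoop.
source_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//<host>:1521/<service>")
    .option("driver", "oracle.jdbc.OracleDriver")
    .option("dbtable", "SRC_SCHEMA.TRANSACTIONS")
    .option("user", "<user>")
    .option("password", "<password>")
    .option("fetchsize", "10000")
    .load()
)

# Stamp the load date as a partition column (illustrative) so downstream queries can prune by date.
source_df = source_df.withColumn("business_date", F.current_date())

# Land the data in a partitioned Hive table, matching the "DDL for optimized query performance" idea.
(
    source_df.write
    .mode("append")
    .partitionBy("business_date")
    .format("parquet")
    .saveAsTable("landing_db.transactions")
)
```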