Sr. Data Engineer with 6+ years of experience driving data-driven solutions that improve efficiency and accuracy, building data products, and analyzing data structure and volume to deliver insights and implement action-oriented solutions to complex business problems. Extensive hands-on experience with Big Data technologies including Apache Spark, Apache NiFi, Hive, and NoSQL stores (MongoDB, Cassandra, HBase). Strong functional and technical knowledge, with experience in cloud technologies such as AWS.
Regulatory Reporting Framework
Developed a regulatory risk reporting framework for calculating Expected Credit Loss (ECL) for credit risk provisioning under CECL/IFRS, maintaining and optimizing ETL/ELT pipelines in Python/Hive/PySpark that feed data into the cornerstone platform (see the ECL calculation sketch at the end of this section).
Implemented automated BNC checks and data quality controls to ensure data quality and smooth PwC audits (see the data quality check sketch at the end of this section).
Ensured effective incident handling and resolution within SLA.
Supported external audits by collecting and walking through the necessary artifacts and performing live demos with auditors in a timely, precise manner.
Responsible for generating aggregated reports and datasets for the Data Science team.
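A minimal sketch of the ECL calculation step from the reporting framework above, assuming a PySpark pipeline; the table names and the pd_12m/lgd/ead columns are illustrative assumptions, not the production schema:

# ECL provisioning sketch: expected credit loss = PD x LGD x EAD
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("ecl-provisioning").getOrCreate()
exposures = spark.table("risk.credit_exposures")  # hypothetical source table

# Standard ECL formula: probability of default x loss given default x exposure at default
ecl = exposures.withColumn("ecl", F.col("pd_12m") * F.col("lgd") * F.col("ead"))
ecl.write.mode("overwrite").saveAsTable("risk.ecl_provisions")  # hypothetical target table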
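An illustrative data quality control of the kind mentioned above: row-count and null-rate checks that fail the run before bad data ships downstream; the threshold and table names are assumptions:

import pyspark.sql.functions as F
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dq-controls").getOrCreate()
df = spark.table("risk.ecl_provisions")  # hypothetical table under check

row_count = df.count()
null_rate = df.filter(F.col("ecl").isNull()).count() / max(row_count, 1)

# Abort the pipeline run if the dataset is empty or too many ECL values are null
if row_count == 0 or null_rate > 0.01:
    raise ValueError(f"DQ check failed: rows={row_count}, null_rate={null_rate:.2%}")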
IDP AWS Platform
Developed a generic framework providing a single repository/data lake solution for a 360-degree view of customer data.
Developed end-to-end infrastructure deployment/provisioning, including provisioning Lambdas, Step Functions, VPCs, security groups, IAM roles, etc.
Wrote business rules for Spark jobs using the Drools rule engine.
Worked on the deployment module for deploying Lambda definitions on AWS, such as creating EMR clusters, inserting transformed data into Redshift, and populating DynamoDB table entries via S3 event notifications on Lambda (see the Lambda sketch at the end of this section).
Developed configuration-based Spark modules for performing reads, writes, and transforms, with the flow of the ETL pipeline configured via JSON (see the config-driven ETL sketch at the end of this section).
Orchestrated Spark jobs submitted through Apache Livy (see the Livy submission sketch at the end of this section).
Wrote Step Functions to orchestrate the different Lambdas, and CloudWatch Events rules to schedule the step functions (see the scheduling sketch below).
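A sketch of the S3-triggered Lambda that records new objects in DynamoDB, as referenced in the deployment-module bullet above; the table name and item shape are illustrative assumptions:

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("ingestion-manifest")  # hypothetical table name

def handler(event, context):
    # An S3 event notification can carry multiple records per invocation
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        table.put_item(Item={"bucket": bucket, "object_key": key})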
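A minimal sketch of the configuration-driven ETL modules; the JSON layout (source/transform/target keys) is an assumed shape, not the actual configuration schema:

import json
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("config-etl").getOrCreate()

with open("pipeline.json") as f:  # hypothetical config file
    cfg = json.load(f)

# Read, transform, and write are all driven by the JSON configuration
df = spark.read.format(cfg["source"]["format"]).load(cfg["source"]["path"])
df = df.selectExpr(*cfg["transform"]["select_exprs"])
(df.write
   .format(cfg["target"]["format"])
   .mode(cfg["target"]["mode"])
   .save(cfg["target"]["path"]))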
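A hedged sketch of submitting a Spark batch through Livy's REST API; the endpoint host, job file, and configuration values are placeholders:

import requests

livy_url = "http://livy-host:8998/batches"  # hypothetical Livy endpoint
payload = {
    "file": "s3://bucket/jobs/etl_job.py",  # job entry point on shared storage
    "args": ["--run-date", "2023-01-01"],
    "conf": {"spark.executor.memory": "4g"},
}
resp = requests.post(livy_url, json=payload)
resp.raise_for_status()
print("Livy batch id:", resp.json()["id"])  # poll /batches/{id} for job state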
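An illustrative pair of boto3 calls wiring a CloudWatch Events (EventBridge) schedule to a state machine; the ARNs and cron expression are placeholders:

import boto3

events = boto3.client("events")

# Rule that fires daily at 02:00 UTC
events.put_rule(Name="nightly-etl", ScheduleExpression="cron(0 2 * * ? *)")

# Point the rule at the Step Functions state machine
events.put_targets(
    Rule="nightly-etl",
    Targets=[{
        "Id": "etl-state-machine",
        "Arn": "arn:aws:states:us-east-1:123456789012:stateMachine:etl",  # placeholder ARN
        "RoleArn": "arn:aws:iam::123456789012:role/events-invoke-sfn",    # placeholder role
    }],
)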
Athena Replatforming Project
Worked on data ingestion, transformation, and cleansing in AWS using S3, EMR, Glue, and Athena (see the Athena query sketch at the end of this section).
Involved in technology migration from SAS to Python and Spark.
Created DVF scripts in Python to validate SAS and Spark datasets against each other (see the validation sketch at the end of this section).
Implemented clustered (bucketed) tables in PySpark, with indexing and optimization of the underlying Hive datasets (see the bucketing sketch below).
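A sketch of running a cleansing/validation query through Athena from Python; the database, query, and output location are illustrative:

import boto3

athena = boto3.client("athena")
resp = athena.start_query_execution(
    QueryString="SELECT count(*) FROM transactions",  # placeholder query
    QueryExecutionContext={"Database": "curated"},    # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://bucket/athena-results/"},
)
print("query execution id:", resp["QueryExecutionId"])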
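A hedged sketch of a DVF-style validation comparing a SAS extract with the migrated Spark output; the paths and the exported format are assumptions:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dvf-validation").getOrCreate()

sas_df = spark.read.parquet("s3://bucket/sas_export/")      # SAS output exported to Parquet
spark_df = spark.read.parquet("s3://bucket/spark_output/")  # migrated pipeline output

# Rows present in one dataset but not the other indicate a migration mismatch
only_in_sas = sas_df.subtract(spark_df)
only_in_spark = spark_df.subtract(sas_df)
assert only_in_sas.count() == 0 and only_in_spark.count() == 0, "datasets diverge"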
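A minimal sketch of writing a clustered (bucketed) Hive table from PySpark; the bucket column and bucket count are illustrative choices:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("bucketed-tables")
         .enableHiveSupport()
         .getOrCreate())

df = spark.table("staging.transactions")  # hypothetical source table

# Bucketing by customer_id co-locates rows and avoids shuffles on joins by that key
(df.write
   .bucketBy(16, "customer_id")
   .sortBy("customer_id")
   .saveAsTable("curated.transactions_bucketed"))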