Domain and Functional Expertise:
•Telecom, Retail and Marketing, Banking, and Insurance
Technical Expertise:
•Big Data & Analytics: Sqoop, Spark Core, Spark SQL, HBase, Hive, Kafka, Elasticsearch, Kibana, PySpark
•Cloud Computing: AWS (EMR, EC2, Lambda, DynamoDB, Glue, API Gateway, S3)
•Enterprise Information Management: Data Architecture, Data Modeling, Data Warehousing, Data Quality, Data Integration, Data Governance, CDC
•Database programming: PL/SQL, database tuning, and query optimization
►Developed a Python/Spark-based framework that lets end users create data pipelines simply by defining metadata (see the sketch after this list).
►Created a validation framework to perform data quality checks and generate reports covering record counts, column-to-column comparisons on sample data, and aggregate checks (min, max, and null counts).
►Created a reporting framework using the ELK stack that shows users the current status of running jobs and validation reports as interactive visualizations.
►Worked as tech lead for the Spark upgrade from 2.3.0 to 3.0.
►Worked with the Apache Spark team to improve the performance of data pipeline extraction from RDBMS sources; able to load, extract, and validate a table of 2 billion records with 100 columns in under 2 hours (JDBC tuning sketch below).
►Worked with the AWS team to set up the same services we used on the EMR cluster on an EC2 cluster, which saved the firm significant cost since our dev and UAT boxes ran on EC2.
►Created a CI/CD solution with Jenkins for deploying code to dev, UAT, and prod, running in-memory test cases and Black Duck scans of library dependencies.
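Illustrative sketch of the metadata-driven framework above. Everything here is a hypothetical placeholder (the metadata schema, connection details, and function names), not the actual framework:

from pyspark.sql import SparkSession

# Hypothetical metadata an end user might supply instead of writing code.
pipeline_meta = {
    "source": {"format": "jdbc", "url": "jdbc:mysql://host:3306/sales",
               "dbtable": "orders", "user": "etl", "password": "***"},
    "transforms": [{"op": "filter", "expr": "status = 'SHIPPED'"}],
    "target": {"format": "parquet", "path": "s3://datalake/orders/"},
}

def run_pipeline(spark, meta):
    """Build and run a data pipeline purely from metadata."""
    src = meta["source"]
    df = (spark.read.format(src["format"])
          .options(**{k: v for k, v in src.items() if k != "format"})
          .load())
    for t in meta.get("transforms", []):
        if t["op"] == "filter":           # only 'filter' is shown here; a real
            df = df.filter(t["expr"])     # framework would dispatch on more ops
    tgt = meta["target"]
    df.write.format(tgt["format"]).mode("overwrite").save(tgt["path"])

if __name__ == "__main__":
    run_pipeline(SparkSession.builder.appName("meta-pipeline").getOrCreate(),
                 pipeline_meta)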
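The 2-billion-row extraction above relied on parallel JDBC reads; a hedged sketch of that tuning, where the table, key column, bounds, and partition counts are illustrative assumptions rather than the project's actual values:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdbms-extract").getOrCreate()

df = (spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//host:1521/ORCL")
      .option("dbtable", "big_schema.big_table")
      .option("user", "etl").option("password", "***")
      # Split the read into parallel range scans on a numeric key so that
      # executors pull slices of the table concurrently.
      .option("partitionColumn", "id")
      .option("lowerBound", "1")
      .option("upperBound", "2000000000")
      .option("numPartitions", "200")
      .option("fetchsize", "10000")     # rows per round trip from the database
      .load())

df.write.mode("overwrite").parquet("s3://datalake/big_table/")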
Purpose of the Project: Build a new analytics platform that can create data pipelines on the fly from user inputs in the UI.
· Developed an end-to-end big data analytics and reporting platform (OoBA) from scratch using Amazon Elasticsearch and a customized Kibana.
· Developed a highly scalable ETL process on AWS EMR to integrate data from multiple sources: Kafka, Elasticsearch, MongoDB, RDBMS (MySQL, Oracle, PostgreSQL, and Teradata), S3, Avro, JSON, and an external Hive system. Also provided UI functionality to join these different sources on the fly and load the results into the data lake, i.e., S3 or Elasticsearch.
· Implemented a back-end data quality module to monitor and improve the quality of the provided data sources using AWS Lambda, which responds to events and automatically manages the underlying compute resources (see the validation sketch after this list).
· Created dev and UAT environments by setting up the same services we used on the EMR cluster on an EC2 cluster, saving the firm significant cost since dev and UAT boxes ran on EC2.
· Responsible for smooth end-to-end development and deployment of data-driven products as highly robust REST services using Golang and Python (PySpark). These REST services were deployed on an EMR machine and gave users the ability to read the crunched data from the data lake (S3 and Elasticsearch); a minimal service sketch follows this list.
· Fulfilled all data engineer duties for new Analytics platform.
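Minimal sketch of the validation checks described above (record counts, column-to-column comparison on a sample, and min/max/null aggregates); the function signature and report shape are assumptions, not the actual module:

from pyspark.sql import DataFrame, functions as F

def validate(src: DataFrame, tgt: DataFrame, key: str, cols: list) -> dict:
    """Compare a source and target table and return a small report dict."""
    report = {"count_match": src.count() == tgt.count()}
    # Column-to-column check on a small sample, joined on the key column.
    sample = src.sample(0.01).alias("s").join(tgt.alias("t"), key)
    for c in cols:
        bad = sample.filter(F.col(f"s.{c}") != F.col(f"t.{c}")).count()
        report[f"{c}_sample_mismatches"] = bad
    # Aggregate checks: min, max, and null count must match on both sides.
    for c in cols:
        aggs = [F.min(c), F.max(c), F.sum(F.col(c).isNull().cast("int"))]
        report[f"{c}_agg_match"] = (list(src.agg(*aggs).first())
                                    == list(tgt.agg(*aggs).first()))
    return report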
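A minimal sketch of the REST layer, assuming a small Python/Flask service in front of the data lake; the route, bucket, and key layout are hypothetical, not the deployed Golang/PySpark services:

import boto3
from flask import Flask, jsonify

app = Flask(__name__)
s3 = boto3.client("s3")

@app.route("/datasets/<name>/summary")
def dataset_summary(name):
    # Serve a pre-crunched JSON summary that the Spark jobs wrote to S3.
    obj = s3.get_object(Bucket="analytics-datalake",
                        Key=f"summaries/{name}.json")
    return jsonify(summary=obj["Body"].read().decode("utf-8"))

if __name__ == "__main__":
    app.run(port=8080)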
Python
Big Data
Spark
Databricks
Cloud
SQL
Hadoop
Hive
Airflow
EKS
Glue
Elasticsearch
Artificial Intelligence Implementation
Generative AI
AWS Certified Solutions Architect – Professional