
Develop Data Pipeline Using Spark Streaming and Kafka
Develop a framework to ingest streaming data from Kafka (Confluent Kafka) using Spark Streaming. Fetch data from Kafka, process it with Spark Streaming, and store it in a Delta table.
Technologies: Spark Streaming, Kafka, Databricks, Airflow
Design and Implement Data Masking Framework
Develop a Python framework to dynamically create authorized views in BigQuery, based on each column's access level and the current user's access level. With this framework, a user can see only the data (columns) they are authorized to access.
Technologies: Python, GCP Data Catalog, GCP BigQuery
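A minimal sketch of the column-filtering idea behind the masking framework, not the actual implementation. It assumes access levels are plain integers and that the per-column levels have already been looked up (for example from Data Catalog tags). The function name, table names, and level values are hypothetical. Running the statement in BigQuery and authorizing the view on the source dataset are left out.

```python
from typing import Dict


def build_view_sql(source_table: str, view_name: str,
                   column_levels: Dict[str, int], user_level: int) -> str:
    """Return a CREATE VIEW statement that exposes only permitted columns.

    A column is visible when its access level does not exceed the
    user's access level (assumed convention: higher = more restricted).
    """
    allowed = [col for col, level in column_levels.items() if level <= user_level]
    if not allowed:
        raise ValueError("no columns are visible at this access level")
    column_list = ",\n  ".join(f"`{col}`" for col in allowed)
    return (
        f"CREATE OR REPLACE VIEW `{view_name}` AS\n"
        f"SELECT\n  {column_list}\n"
        f"FROM `{source_table}`"
    )


# Hypothetical column levels: 1 = public, 2 = internal, 3 = restricted.
COLUMN_LEVELS = {"order_id": 1, "country": 1, "email": 2, "card_number": 3}

sql = build_view_sql(
    "proj.sales.orders",                # hypothetical source table
    "proj.sales_views.orders_level_2",  # hypothetical view name
    COLUMN_LEVELS,
    user_level=2,
)
print(sql)
```

In this sketch a level-2 user gets a view containing order_id, country, and email, while card_number is left out.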
Develop AWS Lambda Functions for Various Purposes
Develop code to analyze Facebook and YouTube data for a media company using AWS Lambda (serverless functions).
Technologies: AWS Lambda, DynamoDB, Python, AWS S3
Develop Framework Using Apache Beam and Google Cloud Dataflow
Migrate legacy Hadoop MapReduce code to a framework built on Apache Beam and Google Cloud Dataflow to process data for the healthcare domain. Implement auditing, data quality checks, and data transformations within the framework.
Write the orchestration code using Airflow (GCP Cloud Composer).
Technologies: Java, Apache Beam, GCP Dataflow, Airflow, Logging, BigQuery, Cassandra
Python
Open Source Data Processing Frameworks (Apache Beam, Apache Spark)
Apache Airflow
SQL
Cloud (Google Cloud and AWS)
Java
Scala
NoSQL (HBase and Cassandra)