● Hands-on experience with ADA, Sparkola, Collibra, and CDSW.
● Experience processing large amounts of structured data.
Project Migration in Banking Domain
Environment: PySpark, SQL, Python, SAS, Sparkola, Collibra
● Collecting data from the client's SAS location.
● ADA has three environments: UAT, QA, and PROD.
● A Sparkola pipeline has three layers: inputs, pipeline, and outputs.
● First, selecting the inputs from Collibra.
● Building logic and transformations in the pipeline layer, such as renaming columns and removing nulls (a PySpark sketch follows this list).
● Then selecting the output from Collibra.
● After commit and build, the result is available in the artifact and on the job server.
● After matching record counts, promoting the output to the QA environment.
● After checking and matching counts in QA, promoting it to PROD.
● In PROD, generating the RECON report.
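The transformations above map naturally to plain PySpark. The following is a minimal sketch of the rename/null-removal step and the count check used before promotion; all paths and column names are hypothetical, and the Sparkola-specific commit/build steps are omitted.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("banking-migration").getOrCreate()

    # Input layer: read the dataset selected in Collibra (path is illustrative).
    src = spark.read.parquet("/data/uat/input/customer_accounts")

    # Pipeline layer: rename columns and remove rows with null keys.
    out = (
        src.withColumnRenamed("cust_id", "customer_id")
           .withColumnRenamed("acct_no", "account_number")
           .dropna(subset=["customer_id", "account_number"])
    )

    # Output layer: write the result for the target environment.
    out.write.mode("overwrite").parquet("/data/uat/output/customer_accounts")

    # Count check before promoting to QA/PROD: rows written should match
    # rows remaining after the null filter.
    expected = src.dropna(subset=["customer_id", "account_number"]).count()
    actual = spark.read.parquet("/data/uat/output/customer_accounts").count()
    assert expected == actual, f"count mismatch: {expected} vs {actual}"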
● Overall 3+ years of professional experience across domains in the IT industry, mainly in Big
Data development, PySpark, and cloud services (Amazon Web Services).
● Hands-on experience with Amazon Web Services, mainly S3, EC2, EMR, RDS, Glue, IAM, CloudWatch,
Lambda, Redshift, and Athena.
● Experience processing large amounts of structured and unstructured data, including integrating data
from multiple sources.
● Hands-on experience with Big Data core components and the ecosystem, including data ingestion and
data processing (PySpark, Hive, HDFS, and MapReduce).
● Created an end-to-end pipeline using PySpark on AWS that reads data, transforms it, and saves the result (see the sketch after this list).
● Experience in data ingestion, transformation, and performance tuning.
● Good experience with Spark architecture, including Spark Core and Spark SQL.
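The sketch below illustrates such an end-to-end PySpark job on AWS, assuming it runs on EMR or Glue with S3 access already configured; the bucket names and columns are hypothetical placeholders.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("end-to-end-pipeline").getOrCreate()

    # Read: raw CSV files landed in S3.
    df = spark.read.option("header", "true").csv("s3://raw-bucket/orders/")

    # Transform: cast the amount column and drop rows where it is missing.
    clean = (
        df.withColumn("amount", F.col("amount").cast("double"))
          .filter(F.col("amount").isNotNull())
    )

    # Save: columnar output for downstream querying.
    clean.write.mode("overwrite").parquet("s3://cleansed-bucket/orders/")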
Project #1: ETL in Marketing Domain
Environment: AWS Glue, AWS S3, PySpark, Redshift, AWS IAM
DESCRIPTION:
● Collecting data from the client's source S3 bucket.
● Gathering requirements for the data models.
● S3 is organized into three layers: Raw, Cleansed, and Processed.
● Applying transformations according to the requirements.
● Writing ETL logic using AWS Glue and Lambda (a Glue job sketch follows this list).
● Building custom data pipelines based on business logic.
● Evaluating, transforming, and cleaning data sets.
● Writing queries that deliver accurate results into Redshift.
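A minimal AWS Glue job sketch of this flow (Raw layer in S3, cleansing, then loading to Redshift). The bucket paths, Glue connection name, and table/column names are hypothetical placeholders, not the project's actual configuration.

    import sys
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glueContext = GlueContext(SparkContext())
    job = Job(glueContext)
    job.init(args["JOB_NAME"], args)

    # Read the Raw layer from S3 as a DynamicFrame.
    raw = glueContext.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": ["s3://marketing-raw/campaigns/"]},
        format="csv",
        format_options={"withHeader": True},
    )

    # Cleanse: rename a column and drop records missing the key.
    cleansed = raw.rename_field("camp_id", "campaign_id").filter(
        lambda r: r["campaign_id"] is not None
    )

    # Load the cleansed data into Redshift via a Glue connection.
    glueContext.write_dynamic_frame.from_jdbc_conf(
        frame=cleansed,
        catalog_connection="redshift-conn",
        connection_options={"dbtable": "marketing.campaigns", "database": "dw"},
        redshift_tmp_dir="s3://marketing-temp/redshift/",
    )
    job.commit()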
Project #2: Data Analysis in E-Commerce Domain
Environment: AWS Database Migration Service (DMS), AWS Glue, AWS S3, AWS IAM, PySpark, and Redshift.
DESCRIPTION:
● This project deals with structured data in the e-commerce domain; data arrives in an S3 bucket in raw format.
● This raw data is processed and stored at the DMS target location with the help of Glue jobs.
● Creating different types of Glue jobs to deliver data per client requirements.
● Imported the required data from the source to the target (DMS) location using Glue jobs (ETL).
● Writing the logic required to load and extract the necessary data according to client requirements.
● Extracting CSV files to the desired S3 location per client requirements (see the sketch after this list).
● Worked with various file formats such as Parquet and CSV.
● Involved in handling and monitoring Glue jobs.
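A short PySpark sketch of the CSV extract step, assuming DMS has replicated the source tables to S3 as Parquet; the paths, filter condition, and column names are illustrative only.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("ecommerce-extract").getOrCreate()

    # Read a table replicated to the DMS target location in S3.
    orders = spark.read.parquet("s3://ecom-dms-target/public/orders/")

    # Client requirement (illustrative): completed orders, reduced columns.
    extract = (
        orders.filter(F.col("status") == "COMPLETED")
              .select("order_id", "customer_id", "order_date", "total_amount")
    )

    # Write a single CSV file with a header to the requested S3 location.
    extract.coalesce(1).write.mode("overwrite").option("header", "true").csv(
        "s3://ecom-exports/completed_orders/"
    )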