Certified Data Scientist with a PG Diploma in Data Science and two years of hands-on experience. Experienced in developing and deploying machine learning models, implementing robust data pipelines, and setting up effective data monitoring. Proficient in using analytical tools, statistical methods, and computing methodologies to derive actionable insights.
Achievement: Extracted transactional data from MySQL RDS to HDFS and transformed it with PySpark to match the given target schema. Loaded the transformed data into an Amazon S3 bucket, created Amazon Redshift tables according to the given schema, then loaded the data from S3 into the Redshift tables and ran analysis queries.
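The Redshift step of this pipeline can be illustrated with a minimal sketch that builds the table DDL and the S3 load statement from a schema definition. The table name, column list, bucket path, and IAM role below are hypothetical placeholders, not the project's actual values, and Parquet is assumed as the file format written to S3.

```python
def create_table_sql(table, columns):
    """Build a CREATE TABLE statement from (column_name, sql_type) pairs."""
    cols = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return f"CREATE TABLE IF NOT EXISTS {table} (\n    {cols}\n);"


def copy_from_s3_sql(table, s3_path, iam_role):
    """Build a Redshift COPY statement that loads Parquet files from S3."""
    return (
        f"COPY {table} FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' FORMAT AS PARQUET;"
    )


# Hypothetical target schema, used for illustration only.
TARGET_COLUMNS = [
    ("transaction_id", "BIGINT"),
    ("customer_id", "BIGINT"),
    ("amount", "DECIMAL(12,2)"),
    ("transaction_ts", "TIMESTAMP"),
]

ddl = create_table_sql("transactions", TARGET_COLUMNS)
load = copy_from_s3_sql(
    "transactions",
    "s3://example-bucket/transactions/",               # placeholder bucket
    "arn:aws:iam::123456789012:role/example-redshift",  # placeholder role
)
```

The two statements would then be executed against the cluster with any SQL client. Identifiers are interpolated directly here, so this sketch is only suitable for trusted, hard-coded schema definitions.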
Achievement: Read sales data from a Kafka server, preprocessed it to derive additional columns, and calculated time-based KPIs as well as country-and-time-based KPIs. Stored the KPIs at 10-minute intervals in JSON format on HDFS for further analysis.
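The 10-minute KPI aggregation can be sketched in plain Python, without Kafka or Spark, to show the windowing logic. The record fields (`timestamp`, `country`, `quantity`, `unit_price`) and the two KPIs (order count and total sales) are assumptions chosen for illustration. The actual project's columns and KPI definitions are not stated here.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 600  # 10-minute interval


def window_start(ts):
    """Floor a timezone-aware timestamp to the start of its 10-minute window."""
    epoch = int(ts.timestamp())
    floored = epoch - epoch % WINDOW_SECONDS
    return datetime.fromtimestamp(floored, tz=timezone.utc).isoformat()


def compute_kpis(records):
    """Aggregate assumed KPIs per window and per (window, country)."""
    new_bucket = lambda: {"orders": 0, "total_sales": 0.0}
    by_time = defaultdict(new_bucket)
    by_time_country = defaultdict(new_bucket)
    for rec in records:
        start = window_start(rec["timestamp"])
        # Derived column (assumed): line total for the sale.
        amount = rec["quantity"] * rec["unit_price"]
        for bucket in (by_time[start], by_time_country[(start, rec["country"])]):
            bucket["orders"] += 1
            bucket["total_sales"] += amount
    return by_time, by_time_country


def to_json_lines(by_time, by_time_country):
    """Serialize both KPI sets as one JSON object per line."""
    lines = [json.dumps({"window_start": w, **kpi})
             for w, kpi in sorted(by_time.items())]
    lines += [json.dumps({"window_start": w, "country": c, **kpi})
              for (w, c), kpi in sorted(by_time_country.items())]
    return lines
```

In a streaming job the same grouping would typically be expressed with the framework's own windowing support, and the JSON lines would be written to HDFS rather than kept in memory.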