Experienced data engineer skilled in building integrations for SaaS management systems, resolving bugs, and managing data, working with MongoDB, GitHub, Prefect Orion, AWS S3, and VS Code. Proficient in using Prefect flow runs for pipeline synchronization and in handling Jira tickets for user data processing. Eager to apply my technical expertise to deliver reliable and efficient data solutions.
Built a virtual file system to explore the fundamentals of file systems, gaining practical insight into file operations such as creation, reading, writing, and deletion. Developed a command-line interface for navigating and interacting with the virtual file system, with support for file statistics, listing, truncation, and closing files.
Simulated a Hadoop cluster in VirtualBox to store wearable data in HDFS, enabling distributed computation over large datasets. Developed a distributed machine learning model with PySpark (Apache Spark) to predict diseases, training a Random Forest classifier in parallel across the cluster on labeled datasets from the UCI Machine Learning Repository.
Implemented a Flask- and Docker-based data lake file-sharing system. Used PostgreSQL and MongoDB for secure data storage and user authentication to protect privacy and control access. Created a custom API for collaboration and file exchange among stakeholders.
GATE Exam score, 02/06/2022, 95.38 Percentile