Data Engineer Intern
Real-Time Analytics System for User Activity
- Built a real-time event pipeline using Kafka, AWS Lambda, and S3 to process and store user activity data with latency under 10 seconds.
- Created Power BI dashboards for tracking engagement metrics and detecting behavioral anomalies in real time.
- Automated log ingestion, transformation, and partitioning for scalable, fault-tolerant analytics.
End-to-End CI/CD Pipeline with Docker and EC2
- Developed a GitHub Actions pipeline for automated build, test, and deployment of a Flask app on AWS EC2.
- Containerized the application with Docker, ensuring consistent and reproducible deployments.
- Reduced pipeline runtime by 40% using caching and parallel workflows, and added rollback support for reliability.