SRE | DevOps Engineer | Wipro @ Apple Cloud Technologies. Responsible for driving reliability and monitoring operations within the CloudTech org. Collaborate closely with ops teams to address P1/P2 customer and infrastructure issues, ensuring swift and effective resolution. Handle on-call operations, perform root cause analysis (RCA), and implement continuous improvements to system performance and reliability. Skilled in Prometheus and Grafana, building dashboards that improve monitoring and observability.
Driving reliability and observability within Apple's Cloud Technologies organization. As part of the SRE team, I specialize in platform engineering, infrastructure automation, disaster-recovery management, and cloud reliability, ensuring high availability and seamless operations across mission-critical environments. With three years of hands-on experience in cloud computing, I have built expertise in managing AWS, GCP, Kubernetes, and Apple's internal cloud infrastructure (EKS-A), working at scale to optimize performance, reduce downtime, and improve system resilience. I manage on-call operations, working closely with ops teams to resolve incidents quickly and reduce customer impact. I strive to maintain SLOs and reduce MTTR through proactive monitoring, alerting, and automation. My automation work with Terraform and scripting streamlines deployments, while Grafana dashboards, canary tests, custom CRDs, and automated runbooks improve monitoring, observability, and operational efficiency.
Key Contributions & Expertise:
- On-Call & Incident Management:
- Monitoring & Observability:
- Automation: