Kaki Sai Kiran

Site Reliability Engineer

Bangalore

Summary

Site Reliability Engineer with 3 years of experience in release management for mobile and web frontend applications on AWS-based platforms, consistently maintaining ≥99.99% availability. Strong background in Linux, Kubernetes, CI/CD, and observability, with a record of zero critical post-release failures and crash rates below 0.05%, achieved through proactive monitoring and automation. Owns release readiness, incident response, and cross-team collaboration, building reliability, scalability, and performance into production systems.

Work History

Site Reliability Engineer

Aziro Technologies Pvt. Ltd.

02.2024 - Current

Operated and maintained production systems on Linux and AWS, consistently sustaining ≥99.99% availability across production environments.
Led end-to-end release management for mobile (Android and iOS) and backend services, achieving zero post-release production failures through controlled rollouts and validation.
Built and enhanced monitoring and alerting using Prometheus, Grafana, and Firebase Crashlytics, maintaining crash rates below 0.05% and enabling proactive incident detection.
Conducted root cause analysis for production and release-related incidents, implementing permanent fixes to prevent recurrence and improve MTTR.
Automated operational and release workflows using CI/CD pipelines (Jenkins, GitLab CI) and custom scripts, reducing manual effort and improving release reliability.

Product Solution Engineer – I

Zeta

02.2022 - 04.2023

Provided 24×7 Level-1 production support for high-availability, distributed payment systems, helping maintain ≥99.9% service availability.
Monitored alerts and dashboards, owning the initial incident lifecycle (detection, triage, ticketing, and escalation) to ensure timely resolution.
Performed basic log analysis and ran SQL queries to identify common application and data issues, reducing investigation time for L2/L3 teams.
Coordinated with engineering and business teams during incidents, sharing impact analysis, status updates, and handoffs.
Supported production releases and change management activities, following SOPs to minimize risk and downtime.
Created and maintained runbooks, dashboards, and knowledge base documentation, improving on-call efficiency and speeding up issue resolution.

Education

Civil Engineering

JNTUA

Anantapur, India

04.2001 -

Skills

Site Reliability Engineering: Incident Management, RCA, Monitoring & Alerting, SLA/SLO Adherence, Reliability Improvements

Cloud Platforms: AWS (EC2, ECS, IAM, VPC, S3, RDS, Lambda, CloudWatch, Route53)

Accomplishments

Shining Star Award (JFM 2025) – Recognized for Individual Excellence and consistent high performance.
Ultimate Team Nomination (AMJ 2025) – Awarded for Team Collaboration and cross-functional contribution.
Shining Star Award (OND 2025) – Recognized again for Individual Excellence, demonstrating sustained impact and ownership.

Certifications

DevOps Fundamentals – HeyDevOps

Timeline

GIT for Absolute Beginners – KodeKloud

09-2026

Linux for Absolute Beginners – KodeKloud

08-2025