

Site Reliability Engineer (SRE) with hands-on experience supporting and debugging large-scale distributed production systems in enterprise environments. Strong background in Linux, virtualization, and networking, with a focus on incident response, root cause analysis, and system reliability. Experienced in working closely with engineering teams to resolve complex issues, reduce operational risk, and make systems easier to operate at scale.
Revamped an internal incident monitoring and ticket assignment dashboard used to track incident queues and engineer availability. Migrated the application from PHP to Django, improving maintainability and extensibility. Deployed the service on Kubernetes for higher availability and better scalability. The system helps teams quickly identify available engineers and assign incident tickets based on real-time availability.