Results-driven Senior Site Reliability Engineer (SRE) with 7+ years of experience building and operating highly available, observable, and automated infrastructure across cloud and hybrid environments. Proven expertise in DevOps practices, Kubernetes, Linux, CI/CD pipelines, and monitoring tools such as Datadog and Grafana.
Skilled in leading cross-functional initiatives to drive SLO-based reliability, develop automation platforms, and deliver secure, resilient systems. Architected an internal SLO-as-a-Service platform and led the development of centralized automation tooling (Autotron) using Go, Ansible, and Selenium. Currently leading a vulnerability remediation automation system that integrates Qualys and LLMs.
Known for strong leadership, mentoring capabilities, and a hands-on approach to solving complex system challenges. Passionate about observability, automation, and continuous improvement, with a track record of scaling reliability practices across teams and platforms.
Cloud infrastructure (AWS/Azure)
Site Reliability Engineering
Monitoring tools administration
Infrastructure automation
DevOps practices
Continuous integration
Software development (Go)