
Results-driven Site Reliability Engineer with over 16 years of IT experience, including more than 7 years dedicated to leading DevOps, Cloud Infrastructure, and SRE initiatives. Expertise in architecting highly available, scalable, and secure systems utilizing AWS and Kubernetes. Proficient in designing SLO-driven operations, automating incident response processes, and spearheading large-scale reliability programs across diverse global teams. Committed to enhancing system performance and driving operational excellence through innovative solutions and collaborative teamwork.
• Spearheaded SRE transformation, introducing SLO-based operations and reducing incident MTTR by 35%.
• Architected a secure AWS multi-account setup using private VPCs, VPC Endpoints, and EKS clusters.
• Built Terraform-based automation pipelines, improving infrastructure provisioning time by 60%.
• Implemented the Prometheus and Grafana monitoring stack, integrated with CloudWatch and Kiali, for full observability.
• Led a 6-member SRE/DevOps team; established on-call rotations and automated runbook remediation.
• Implemented CI/CD pipelines using Jenkins and GitLab CI for microservices deployed on OpenShift and EKS.
• Developed IaC modules in Terraform Cloud, enabling consistent infrastructure deployments across environments.
• Deployed centralized logging and alerting systems using CloudWatch, Fluentd, and Grafana.
• Introduced automated backup, scaling, and health-check systems, improving uptime by 25%.
• Enhanced Jenkins build pipelines with parameterized deployments and Groovy DSL.
• Supported global-scale EKS and EC2 deployments, ensuring HA and DR readiness.
• Reduced infrastructure drift by introducing Terraform modules with CI-based validation.
• Partnered with SRE teams to implement postmortem reviews and reliability metrics tracking.
CI/CD implementation expertise
Proficient in AWS infrastructure components
Proficient in Kubernetes and Docker
Terraform and IaC experience
Continuous integration tools
Python programming
Azure cloud services
Container orchestration expertise
DataDog and Dynatrace experience
Effective team management
Best practices in security and compliance
Cross-functional teamwork