Lakhan Kumar Mittapally

Lead Site Reliability Engineer
Hyderabad

Summary

Lead Site Reliability Engineer with 9+ years of experience in cloud-native and containerized environments, specializing in Kubernetes and OpenShift platform operations. Strong expertise in ensuring high availability, reliability, and performance of production systems through proactive monitoring, incident management, and root cause analysis (RCA). Experienced in CI/CD automation using Jenkins and GitLab, Infrastructure as Code (Terraform, Ansible), and GitOps practices with ArgoCD to enhance deployment stability and reduce operational toil. Skilled in optimizing PostgreSQL databases, managing Linux-based environments, and maintaining SLA/SLO compliance across critical business applications.

Overview

10 years of professional experience
3 languages

Work History

Lead Site Reliability Engineer

PAAS Operations
03.2024 - Current
  • Managing 100+ Kubernetes and OpenShift clusters across production and non-production environments supporting telecom and enterprise workloads.
  • Provisioning and managing infrastructure using Terraform across OpenStack and AWS environments.
  • Automating Kubernetes deployments using AWX, Ansible, and shell scripting to reduce manual effort and configuration drift.
  • Implementing proactive monitoring and alerting frameworks using Prometheus and Grafana to detect issues early.
  • Leading Sev1/Sev2 incident management and structured problem management processes, reducing MTTR through improved alert tuning and automated remediation scripts.
  • Performing cluster upgrades, patching, and security hardening to maintain platform stability and compliance.
  • Supporting sales POCs, including hardware estimation, solution validation, and delivery architecture planning.
  • Supporting internal product development (OCS – Online Charging System) as a DevOps/SRE engineer, ensuring platform reliability.
  • Improving CI/CD reliability through GitLab and Jenkins pipeline optimization.

Senior Site Reliability Engineer

Telus Project
07.2020 - 02.2024
  • Designed and maintained a highly available Kubernetes infrastructure supporting telecom workloads.
  • Led incident response for production environments, performing RCA and implementing preventive measures to minimize recurring outages.
  • Reduced MTTR by enhancing monitoring coverage, optimizing alert thresholds, and introducing structured incident response workflows.
  • Automated infrastructure provisioning using Terraform and Ansible to ensure consistent and repeatable deployments.
  • Managed Ceph-backed persistent storage for stateful applications, ensuring data durability and high availability.
  • Conducted performance tuning and capacity planning to maintain SLA/SLO compliance.
  • Participated in a 24/7 on-call rotation, ensuring rapid issue resolution in critical environments.

Lead DevOps / Site Reliability Engineer

SHAW Project
05.2018 - 06.2020
  • Managed OSS/BSS telecom applications deployed as microservices on Kubernetes clusters.
  • Implemented CI/CD pipelines using Jenkins and GitLab, integrated with Helm-based deployments.
  • Performed Kubernetes upgrades, patch management, and security hardening.
  • Implemented RBAC and network policies to strengthen the cluster security posture.
  • Designed disaster recovery strategies and validated backup/restore mechanisms for critical services.
  • Contributed to improving platform reliability by implementing proactive monitoring and operational best practices.

DevOps Engineer

NTT DoCoMo Project
03.2016 - 02.2018
  • Deployed and supported OpenShift clusters (HA and DR setups) for production environments.
  • Automated infrastructure setup using Ansible and GitLab CI/CD pipelines.
  • Implemented logging and monitoring stacks for platform observability.
  • Troubleshot production issues related to OpenShift, PostgreSQL, MongoDB, and storage.
  • Configured HAProxy and NGINX for load balancing and API gateway functionality.
  • Integrated GlusterFS dynamic storage provisioner for OpenShift workloads.

Education

Bachelor of Engineering Technology - Electronics and Communications Engineering

Jawaharlal Nehru Technological University
Hyderabad

Skills

Container & Orchestration: Kubernetes, OpenShift, Docker, Helm

Cloud Platforms: AWS (EC2, S3, VPC, EKS), Azure (AKS), OpenStack

DevOps: Terraform, Ansible, Bash, Python

CI/CD & GitOps: Jenkins, GitLab, GitHub, ArgoCD

Monitoring & Observability: Prometheus, Grafana

Storage: Ceph, LVM, GlusterFS, NFS, EMC storage

Networking: HAProxy, Keepalived, DNS, NFS

Operating Systems: Linux (RHEL, CentOS, Ubuntu) and Windows
