

Lead Site Reliability Engineer with 9+ years of experience in cloud-native and containerized environments, specializing in Kubernetes and OpenShift platform operations. Strong expertise in ensuring high availability, reliability, and performance of production systems through proactive monitoring, incident management, and root cause analysis (RCA). Experienced in CI/CD automation using Jenkins and GitLab, Infrastructure as Code (Terraform, Ansible), and GitOps practices with ArgoCD to enhance deployment stability and reduce operational toil. Skilled in optimizing PostgreSQL databases, managing Linux-based environments, and maintaining SLA/SLO compliance across critical business applications.
Container & Orchestration: Kubernetes, OpenShift, Docker, Helm
Cloud Platforms: AWS (EC2, S3, VPC, EKS), Azure (AKS), OpenStack
Infrastructure as Code & Scripting: Terraform, Ansible, Bash, Python
CI/CD & GitOps: Jenkins, GitLab, GitHub, ArgoCD
Monitoring & Observability: Prometheus, Grafana
Storage: Ceph, LVM, GlusterFS, NFS, EMC storage
Networking: HAProxy, Keepalived, DNS, NFS
Operating Systems: Linux (RHEL, CentOS, Ubuntu), Windows