
Senior DevOps Engineer with 14+ years of experience specializing in Kubernetes-based infrastructure, automation, and cloud-native platforms across hybrid environments.
Strong expertise in managing containerized workloads, CI/CD pipelines, and large-scale infrastructure with a focus on reliability, security, and cost optimization. Experienced in Kubernetes (AKS, on-prem), Docker, Terraform, and observability tools like Prometheus and Grafana.
Proven track record in managing production systems, enabling automation-first practices, and collaborating with cross-functional teams to deliver scalable and resilient platforms.
• Architected and managed large-scale Puppet infrastructure supporting 10,000+ nodes across 8 global sites, ensuring high availability and consistent configuration management
• Acted as Puppet Subject Matter Expert, resolving complex catalog issues and validating fixes using RSpec, improving deployment reliability
• Managed 75+ repositories (internal & external) using r10k, ensuring standardized and scalable configuration delivery
• Built CI/CD validation pipelines using Jenkins (PDK, puppet-lint, automated testing), reducing production defects and improving code quality
• Led production change management activities including code reviews, staging-to-production promotions, and post-deployment impact analysis
• Implemented CIS hardening and compliance controls, contributing to successful ISO audit readiness and security posture improvements
• Managed Kubernetes workloads using Helm and Terraform, enabling scalable and repeatable deployments
• Designed and implemented Azure DevOps CI/CD pipelines with automated build, deploy, rollback, and health-check mechanisms
• Provisioned Azure infrastructure using Terraform with secure secret management practices
• Developed Azure-based solutions (Functions, Event Grid) to modernize legacy workflows and improve automation efficiency
• Led and mentored a team of 6 engineers, owning production approvals and improving team delivery efficiency
• Standardized deployment pipelines with security controls, monitoring, and cost governance
• Led onboarding of applications (Exceed project) into the Puppet ecosystem by defining automation and configuration management strategies
• Developed and managed Puppet manifests for multiple environments in a private cloud infrastructure, enabling consistent application configuration and testing
• Automated application and OS upgrade workflows using Terraform and Puppet, improving deployment efficiency and reducing manual intervention
• Built CI/CD pipelines for Kafka and Packrat applications, ensuring reliable deployments with post-deployment health checks via LB/GSLB
• Containerized applications using Docker and orchestrated deployments on Kubernetes clusters across multiple environments
• Managed Kubernetes clusters (on-prem), including upgrades, automation runbooks, and operational troubleshooting
• Implemented monitoring and observability using Prometheus, Grafana, ELK, and Sumo Logic, improving system visibility and incident response
• Managed secrets using HashiCorp Vault and handled SSL certificate lifecycle across clusters and applications
• Administered artifact repositories (Artifactory, Azure Artifacts) including upgrades and maintenance
• Collaborated with Dev, QA, and Release teams to ensure smooth deployments across Dev, QA, Preprod, and Production environments
• Troubleshot build and deployment issues, reducing downtime and improving release stability
• Implemented Azure migration strategy using Azure Migrate, optimizing infrastructure cost and improving scalability
• Automated role-based access and authentication workflows, improving team productivity and access management
• Used Nagios, Check_MK, Opsgenie, and Catchpoint for monitoring and alerting
• Managed version control using Git and Bitbucket, resolving merge conflicts and maintaining code integrity
• Executed change management via JIRA aligned with ITIL processes, ensuring controlled and auditable deployments
• Managed Kubernetes clusters and containerized applications across multiple environments, focusing on performance tuning and operational stability
• Implemented monitoring and logging solutions using Prometheus, Grafana, and ELK stack for improved observability
• Automated deployment workflows and infrastructure provisioning using Terraform and CI/CD pipelines
• Supported production systems with troubleshooting, incident response, and performance optimization
• Managed patching schedules for Solaris, RHEL, and Ubuntu environments, ensuring system stability, security compliance, and minimal downtime
• Administered user authentication and profile management using Likewise, supporting centralized access control across Linux systems
• Deployed and managed applications in Azure environments using PowerShell automation, improving operational efficiency and consistency
• Developed and maintained Puppet manifests for configuration management, with infrastructure monitoring via the Foreman dashboard to ensure system health and compliance
• Built and managed Docker containers and Dockerfiles, enabling application portability and streamlined deployment processes
• Strong administration experience in Linux environments (RHEL, Ubuntu, SUSE)
• Experience supporting Windows environments (basic administration and troubleshooting)
• Managed patching schedules and system maintenance ensuring security and compliance
• Executed change management processes through JIRA aligned with ITIL standards, ensuring controlled, auditable, and low-risk production deployments
• Automated application testing and result validation workflows using Jenkins, improving release readiness and reducing manual effort for deployment teams
• Managed patching and maintenance schedules for Solaris, Linux, and KVM environments, ensuring system stability and compliance with organizational SLAs
• Installed, configured, and maintained blade servers and legacy Sun Solaris SPARC systems, supporting critical datacenter operations
• Executed change management processes via JIRA aligned with ITIL standards, ensuring controlled and auditable production deployments
• Provided end-to-end datacenter support including infrastructure setup, monitoring, and operational troubleshooting, contributing to high system availability
🔗 LinkedIn: linkedin.com/in/sunmuppala
• Experience working with monitoring and logging platforms similar to OpenSearch, such as the ELK stack
• Exposure to managing application data and infrastructure components in distributed environments
• Familiar with handling secrets, certificates, and secure communication (SSL/TLS, Vault)
• Experience supporting backend applications and collaborating with engineering teams for performance tuning