Dynamic and results-driven DevOps Engineer with over 3 years of hands-on experience at ADP, specializing in Kubernetes technologies. Proven expertise in orchestrating zero-downtime upgrades, driving continuous improvements in system reliability, and ensuring high availability across complex infrastructures. Good understanding of CI/CD and advanced deployment strategies (Blue/Green, Canary) for delivering software releases with minimal disruption. Skilled in Terraform for Infrastructure as Code (IaC), automating deployments, and managing scalable infrastructure. Strong collaborator who works cross-functionally to resolve complex issues, enhance operational workflows, and support Agile development teams. Passionate about leveraging cutting-edge technologies to drive business growth and system innovation.
• Provisioned and managed 40+ EKS clusters and 30+ OpenShift clusters, ensuring high availability and compatibility with AWS/Red Hat updates through version upgrades (e.g., 1.25 to 1.31).
• Orchestrated deployments on Kubernetes clusters using Helm charts, Kustomize, and kubectl, ensuring high availability, scalability, and resource optimization.
• Enabled GitOps-driven deployments by integrating ArgoCD with Bitbucket repositories, ensuring Kubernetes cluster configurations were automatically synced, securely audited, and managed through version control.
• Led cluster upgrades with zero-downtime strategies by aligning with AWS deprecations and ensuring compatibility of add-on services.
• Applied a good understanding of CI/CD processes to support microservice deployments, troubleshooting and resolving issues to ensure smooth and timely releases.
• Improved system reliability and reduced MTTR by implementing proactive (Dynatrace) and reactive (Splunk) alerting for enhanced observability and early issue detection, along with backup solutions (Velero) for disaster recovery.
• Implemented external-dns to automate the registration and management of DNS records for dynamically provisioned Kubernetes resources (e.g., Services, Ingresses), ensuring seamless external accessibility and reducing manual configuration overhead and potential errors.
• Configured NGINX Ingress Controllers for path-based routing and load balancing of Kubernetes services, and automated SSL/TLS certificate management with cert-manager.
• Demonstrated a solid understanding of Terraform principles and practices, including declarative infrastructure definition, state management best practices, and the use of modules for scalable infrastructure deployments.
• Integrated HashiCorp Vault within the cluster to securely manage and inject secrets into deployments, enhancing the overall security posture and streamlining secret management workflows.
• Enforced fine-grained access control using IAM roles (cloud) and Kubernetes RBAC (cluster), aligning with least-privilege and zero-trust principles.
• Practiced Site Reliability Engineering (SRE) by implementing error budgets, conducting root cause analysis, and automating recovery processes for improved system resilience.
• Collaborated with application teams to troubleshoot and resolve issues related to ingress, scheduling, and cluster resources, strengthening problem-solving skills across various Kubernetes components.
• Managed change processes through ServiceNow to streamline change requests, approvals, and documentation, and used JIRA for sprint planning, progress tracking, and aligning team goals with Agile practices.
Cloud platforms: AWS (EKS, EC2, S3, VPC, IAM, SNS, ASG, ELB)
Containerization: Docker and Kubernetes
Infrastructure as code: Terraform
CI/CD tools: ArgoCD and Jenkins
Monitoring solutions: Dynatrace, Splunk, CloudWatch
Scripting languages: Python, Shell scripting
Troubleshooting expertise