- Architected and implemented multi-tenant EKS clusters for 50+ applications by developing reusable Terraform modules automating the provisioning of all required AWS resources.
- Designed comprehensive CI/CD pipelines for EKS deployments using Bitbucket, TeamCity, ArgoCD, and Helm, reducing deployment time from 4 hours to 45 minutes.
- Built an automated ECS Fargate deployment pipeline using Bitbucket, AWS CodePipeline, and ECR as the container registry, with ECS handling container orchestration and automated scaling.
- Automated Docker image lifecycle management with secure promotion workflows and ECR registry policies.
- Established GitOps workflows with ArgoCD for automated Kubernetes application deployment across development, staging, and production environments.
- Created standardized Helm charts for 30+ microservices with automated manifest templating and versioning.
- Integrated Wiz security scanning across both EKS and ECS pipelines for automated vulnerability assessment and security compliance.
- Implemented secure secrets management on EKS by integrating AWS Secrets Manager with Kubernetes via External Secrets Operator, enabling seamless sync to Kubernetes Secrets and environment variables with zero security incidents.
- Enhanced EKS cluster security and networking by implementing Cilium for fine-grained network policies and pod-level IP management, ensuring controlled traffic flow and improved observability.
- Implemented NGINX Ingress Controller on EKS to manage HTTP/HTTPS routing via Kubernetes Ingress resources, enabling path-based and host-based routing with a single load balancer for efficient service exposure.
- Configured Kubernetes Metrics Server on EKS to collect real-time CPU and memory usage metrics, enabling Horizontal Pod Autoscaler (HPA) to scale workloads dynamically based on resource utilization thresholds.
- Replaced Cluster Autoscaler with Karpenter on EKS, enabling faster, more cost-efficient node provisioning based on real-time pod requirements; improved scaling performance for dynamic workloads, optimized resource utilization, and reduced costs by 25%.
- Optimized EKS storage and networking stack by configuring AWS EFS and EBS CSI Drivers for shared and dedicated persistent storage, while managing CoreDNS and Kube-Proxy to ensure reliable service discovery, internal routing, and load balancing across the cluster.
- Implemented Prometheus and Grafana on EKS to enable comprehensive cluster and application monitoring, providing real-time metrics visualization, alerting, and capacity planning dashboards for improved operational insights and reliability.
- Achieved 99.9% system uptime through comprehensive monitoring using Prometheus and Grafana for EKS clusters, and Splunk for centralized logging across both platforms.
- Configured CloudWatch monitoring and alerting for ECS infrastructure with Splunk integration for application-level observability.
- Troubleshot and resolved complex Kubernetes issues, including pod failures and networking problems, and optimized resource usage using kubectl and log analysis.
- Reduced mean time to resolution (MTTR) by 60% through automated monitoring, alerting, and incident response procedures.
- Implemented automated security scanning workflows with policy-based deployment gates and compliance reporting.
- Configured AWS CloudTrail and VPC Flow Logs for comprehensive audit trails and security monitoring.
Technologies: AWS (EKS, ECS Fargate, EC2, ECR, CodePipeline, S3, IAM, VPC, ALB, Route 53), Terraform, Kubernetes, Docker, Helm, Bitbucket, TeamCity, ArgoCD, Python/Boto3, Wiz Security, Prometheus, Grafana, Splunk.