
Site Reliability Engineer / DevOps Engineer with 6+ years of experience building, operating, and scaling reliable systems on AWS and Kubernetes. Expert in observability (Datadog, Prometheus, Grafana), incident response, performance tuning, and Infrastructure as Code with Terraform. Delivered multiple zero-downtime migrations (Redis.io, CloudAMQP) and engineered disaster recovery (DR) for EKS, upholding strict SLAs and improving MTTA/MTTR.
Cloud: AWS (EC2, EBS, EFS, ALB/ELB, S3, Route 53, VPC, CloudWatch, API Gateway, ElastiCache), CloudAMQP
IaC & CI/CD: Terraform, Ansible, Jenkins, Git, GitHub, Bitbucket
Containers & Orchestration: Docker, Kubernetes (EKS)
Observability: Datadog, Prometheus, Grafana, Site24x7, New Relic, Kibana, VictorOps (Splunk On-Call)