Akash Bhatia

New Delhi

Summary

Site Reliability Engineer / DevOps Engineer with 6+ years of experience building, operating, and scaling reliable systems on AWS and Kubernetes. Expert in observability (Datadog, Prometheus, Grafana), incident response, performance tuning, and Infrastructure-as-Code with Terraform. Delivered multiple zero-downtime migrations (Redis.io, CloudAMQP) and engineered DR for EKS to uphold strict SLAs and improve MTTA/MTTR.

Overview

6 years of professional experience
1 Certification

Work History

Site Reliability Engineer / DevOps Engineer

Monotype Imaging
10.2021 - Current
  • Owned reliability for multiple high-traffic websites; defined SLAs/SLOs and implemented proactive, signal-rich alerting for 24×7 availability.
  • Designed DR architecture for EKS and surrounding services; authored runbooks and led periodic recovery drills.
  • Built and maintained Terraform modules to provision and scale AWS resources (EC2, EBS, EFS, ALB/ELB, API Gateway, CloudWatch, VPC).
  • Established end-to-end observability with Datadog, Site24x7, Prometheus, Grafana, and VictorOps; reduced noise and improved time-to-detect.
  • Led and mentored an 8-member L1 team; guided RCA creation and drove actions that improved MTTA and MTTR.
  • Investigated high-latency issues using APM; collaborated with developers on code-level fixes across microservices to improve throughput and p95/p99 latency.
  • Contributed to building an Internal Developer Portal to improve operational excellence, streamline workflows, and enhance developer productivity.
  • Built a centralized Grafana and Prometheus stack for pre-prod monitoring, reducing SaaS costs by replacing Datadog in non-prod environments.
  • Regularly contributed to architectural reviews from project inception, ensuring scalability, cost-efficiency, reliability, and high performance.
  • Leveraged AI-powered anomaly detection in observability platforms (Datadog, Prometheus) to proactively identify performance degradation before it impacted users.
  • Implemented AI-assisted log analysis for OpenSearch to accelerate root cause identification and reduce MTTR.
  • Partnered with QA, leveraging RUM insights to reproduce and resolve user-impacting issues; improved customer experience and stability.

Infrastructure Engineer

Publicis Sapient (Sapient Consulting Pvt. Ltd.)
05.2019 - 10.2021
  • Managed infrastructure on AWS and VMware; automated provisioning and patching with Ansible.
  • Worked hands-on with AWS and CI/CD tooling: EC2, EBS, ALB/ELB, CloudWatch, GitHub, and Jenkins.
  • Set up monitoring/alerting for infra and apps; ensured accurate signals and timely escalations.
  • Worked with Apache, Nginx, and Tomcat; supported application teams and reduced manual toil via scripting.
  • Performed DR testing with Veeam; executed repository and server migrations (Bitbucket, vCenter).

Education

B.Tech - Electronics & Communication

MDU

CBSE

VV DAV Public School

Skills

Cloud: AWS (EC2, EBS, EFS, ALB/ELB, S3, Route 53, VPC, CloudWatch, API Gateway, ElastiCache), CloudAMQP

Certification

CCNA – Cisco (CSC013287370)

Accomplishments

  • Migrated AWS ElastiCache to Redis.io for production with zero downtime via phased cutover, connection draining, and compatibility validation.
  • Re-platformed RabbitMQ to managed CloudAMQP with zero downtime using Shovel for traffic bridging and data sync.
  • Engineered and tested Disaster Recovery for EKS and critical dependencies; documented runbooks and executed failover drills.
  • Drove AWS cost optimization initiatives with cross-functional teams, achieving approximately $1.2 million in annual savings.
  • Built comprehensive Node.js health checks covering DB, cache, queue, and downstream APIs to improve early fault detection and reliability.
