
Sunil Aditya Elipina

Hyderabad

Summary

Dynamic DevOps and Site Reliability Engineer with 4 years of experience in enhancing automation, reliability, and scalability within cloud-native environments. Proficient in AWS, Kubernetes, Docker, Terraform, and Ansible, with a strong emphasis on CI/CD practices, observability, and infrastructure security. Successfully implemented robust monitoring solutions using Prometheus and Grafana, automated pipelines through Jenkins and GitHub, and applied SRE principles such as SLIs, SLOs, and error budgets to maintain high availability and performance. Skilled in fostering collaboration among cross-functional teams to streamline deployments and optimize system reliability for maximum operational efficiency.

Overview

4 years of professional experience
1 certification

Work History

DevOps Engineer

TCS
01.2023 - Current
  • Designed and implemented CI/CD pipelines using Jenkins for microservices-based deployments, reducing release time by 20%.
  • Automated AWS infrastructure provisioning using Terraform, deploying across multiple environments (dev, staging, prod).
  • Containerized applications using Docker and deployed them on Kubernetes (EKS), enabling scalable and resilient microservice delivery.
  • Conducted blue-green and canary deployments on Kubernetes, ensuring zero-downtime application upgrades.
  • Integrated code quality and security tools including SonarQube, OWASP ZAP, and Trivy for early vulnerability detection.
  • Configured VPC, subnets, security groups, and Route 53 for high availability and fault-tolerant architecture.
  • Set up Prometheus and Grafana for real-time monitoring and alerting, improving observability and incident response.
  • Enabled Slack notifications for build status, improving team communication and deployment visibility.
  • Wrote shell scripts to automate routine system administration tasks and enhance CI/CD workflows.
  • Integrated DevOps practices into the SDLC, supporting continuous integration and delivery while collaborating with cross-functional teams.

Site Reliability Engineer

TCS
11.2021 - 12.2022
  • Defined SLIs/SLOs for APIs and batch systems, delivering 99.9% availability in alignment with business SLAs.
  • Implemented error budget policies to balance reliability with feature delivery, applying deployment freezes when thresholds were exceeded.
  • Reduced MTTR by 60% through automated incident detection with Prometheus alert rules, Slack integrations, and runbooks.
  • Enhanced resilience by configuring Kubernetes liveness/readiness probes and enabling HPA for self-healing and dynamic scaling.
  • Led root cause analysis (RCA) for P1/P2 incidents and facilitated blameless postmortems to drive long-term corrective actions.
  • Eliminated 40% of toil through automation, using Bash scripts and AWS Lambda for health checks, log collection, and resource cleanup.
  • Conducted chaos engineering tests (node failures, pod kills) to validate Kubernetes and AWS failover strategies.
  • Developed Grafana and CloudWatch dashboards to track SLIs such as error rates, latency, and throughput for proactive reliability monitoring.

Education

B-Tech - Mechanical Engineering

SRKREC
Bhimavaram, India
09.2020

Skills

  • Amazon Web Services (AWS)
  • DevOps
  • Infrastructure as Code using Terraform
  • CI/CD Pipelines with Jenkins
  • Build automation with Maven
  • Git & GitHub Actions
  • Ansible
  • Docker & Kubernetes, Helm
  • Grafana, Prometheus & Splunk
  • AWS CloudWatch
  • Shell Scripting & Python
  • Cost Optimization
  • ELK Stack & Datadog
  • ServiceNow & Slack
  • Secrets Management
  • SLA, SLO & SLI
  • Static Code Analysis
  • Microservices Architecture
  • Cloud Security

Certification

AWS Certified Cloud Practitioner (CLF-C02)

HIGHLIGHTS

Received two on-the-spot awards.