Gopi chand Vighrahala

Site Reliability Engineer
Hyderabad

Work Preference

Desired Job Title

Site Reliability Engineer, DevOps Engineer, Cloud Engineer, System Administrator

Work Type

Full Time

Location Preference

Remote, Hybrid, On-Site

Important To Me

Work-life balance, Career advancement, Flexible work hours, Healthcare benefits, Team Building / Company Retreats

Summary

  • 9+ years of IT industry experience, including 5+ years as a DevOps Engineer with strong expertise in Cloud, CI/CD automation, and infrastructure management.
  • Skilled in Software Configuration Management (SCM), Build & Release Management, and end-to-end automation of CI/CD pipelines.
  • Hands-on experience with DevOps tools: GitHub, Bitbucket, Maven, Jenkins, Nexus, SonarQube, Artifactory, Docker, and Kubernetes.
  • Proficient in AWS cloud services (EC2, VPC, S3, EBS, Security Groups, Auto Scaling, Load Balancers, Route 53) for building secure and scalable infrastructures.
  • Strong expertise in Jenkins administration: job creation, plugin management, role assignments, scheduling builds, and master/slave configurations.
  • Experienced in containerization with Docker: creating custom images, managing containers, and orchestrating with Kubernetes.
  • Proficient in automation with Shell scripting and Ansible playbooks, reducing manual tasks and streamlining deployments.
  • Expertise in deployment strategies including rolling updates, blue-green deployments, and canary releases.
  • Collaborated with cross-functional development teams to enable daily CI/CD pipelines and support agile delivery cycles.
  • Experienced in monitoring and troubleshooting build and deployment failures, ensuring production stability and fast resolution.
  • Experienced in migrating workloads to AWS, optimizing cost, and ensuring multi-AZ availability.
  • Implemented automated backup strategies and environment provisioning to support business continuity.
  • Well versed in DevOps best practices.
  • Recognized for problem-solving ability, process automation mindset, and driving reliability improvements across large-scale environments.
  • Continuously improving DevOps ecosystems by adopting new tools, best practices, and cloud-native solutions to enhance operational efficiency.

Overview

10 years of professional experience
2 Certifications

Work History

Site Reliability Engineer

LivePerson
08.2024 - Current
  • Administered and optimized multi-AZ Kubernetes clusters, ensuring high availability, performance, and security.
  • Automated large-scale deployments with Helm and ArgoCD, improving scalability and reducing manual effort by 40%.
  • Defined multi-tenant namespace strategies, RBAC policies, and Pod Security Standards for secure operations.
  • Managed Kubernetes clusters using Rancher, simplifying administration and enhancing cluster governance.
  • Troubleshot application deployment issues in Kubernetes by analyzing logs, fixing misconfigurations, and ensuring smooth rollouts.
  • Managed Jenkins patching, plugin upgrades, and job configurations, ensuring pipeline stability and minimizing downtime.
  • Automated deployments via ArgoCD with blue-green and canary strategies, reducing release downtime and deployment risks.
  • Optimized AWS storage costs with S3 lifecycle policies, Glacier archival, and cross-region replication, reducing expenses by 25%.
  • Instituted cloud cost governance policies (right-sizing, lifecycle policies, reserved instances), cutting overall infrastructure spending while maintaining scalability.
  • Set up AWS CloudWatch dashboards and alarms, improving monitoring of EC2, RDS, and application-level metrics.
  • Automated incident detection and response using Prometheus, Grafana, and Alertmanager, reducing MTTR by 35%.
  • Built observability pipelines with CloudWatch Exporter for SNS-based autoscaling.
  • Troubleshot application failures by analyzing logs in ELK (Elasticsearch, Logstash, Kibana), enabling faster root-cause analysis.
  • Enforced secrets management policies using Argo Vault and Kubernetes Secrets, ensuring secure handling of sensitive data.
  • Partnered with development teams to embed “observability-first” design practices, ensuring all new services shipped with proper monitoring, logging, and tracing in place.


Site Reliability Engineer

Phenom people Pvt Ltd
03.2021 - 07.2024
  • Managed multi-node Kubernetes clusters, creating isolated namespaces for dev, QA, and production environments.
  • Migrated Java and Python applications from Docker to Kubernetes using Helm charts, improving deployment consistency and reducing manual intervention by 30%.
  • Designed and integrated end-to-end CI/CD pipelines with Jenkins, Bitbucket, Docker, and Kubernetes, accelerating release cycles and ensuring faster feature delivery.
  • Configured Jenkins servers and nodes for build & release teams, enabling automated daily deployments and reducing build failures.
  • Implemented blue-green, canary, and rolling update strategies in Kubernetes, minimizing downtime during releases and ensuring smooth production rollouts.
  • Standardized Helm values and deployment pipelines with ArgoCD, achieving zero-downtime deployments.
  • Set up monitoring and alerting for Kubernetes clusters using Prometheus, Grafana, and Loki, reducing mean time to resolution (MTTR) by 35%.
  • Built Grafana dashboards for AWS services (EMR, ALB, EC2, billing), improving visibility and cost tracking for stakeholders.
  • Configured CloudWatch alarms integrated with Slack via SNS + Lambda, enabling real-time incident notifications and faster responses.
  • Configured and managed Redis clusters to improve service response times, reducing latency by 20% for high-traffic applications.
  • Tuned Kubernetes resource allocation and autoscaling, improving system utilization efficiency.
  • Designed and managed IAM roles and groups for fine-grained access control across AWS environments, improving compliance and security posture.
  • Partnered closely with development and build/release teams to troubleshoot deployment issues, improving release stability by 25%.
  • Reduced operational toil by automating repetitive tasks (infrastructure provisioning, monitoring alerts, and routine deployments), freeing up 30% of engineering time for innovation and reliability improvements.
  • Championed SRE best practices such as automation-first culture, post-incident reviews, and error budgeting, driving platform reliability improvements.

DevOps Engineer

E-Pay Solutions Ind Pvt Ltd
12.2018 - 03.2021
  • Implemented branching, tagging, and merging strategies in Git to streamline release management across multiple environments.
  • Designed and managed Jenkins CI/CD pipelines for automated build, test, and deployment workflows.
  • Configured Jenkins with plugins, master/slave nodes, and build notifications for scalable CI/CD operations.
  • Coordinated with developers to enforce branching, labeling, and naming conventions, ensuring consistency across teams.
  • Installed, configured, and managed a centralized Ansible control server, and created playbooks to support various middleware application servers.
  • Authored and maintained Ansible playbooks and roles for configuration management and deployment automation.
  • Automated AWS deployments with Ansible and Shell scripts, reducing provisioning time significantly.
  • Managed server configuration and patching with Ansible, standardizing environments across dev, QA, and prod.
  • Created custom Docker images using Dockerfiles to simplify replication of dev, QA, UAT, and production environments.
  • Set up dev and test environments on Docker, enabling faster developer onboarding.
  • Provisioned and managed AWS S3 buckets with lifecycle policies, optimizing storage costs and access management.
  • Designed and implemented IAM roles and policies to enforce least-privilege access.


Cloud Engineer

Centaurus Technology Partners
06.2016 - 09.2018
  • Collaborated with multiple clients to design and scale AWS infrastructure and application deployments, ensuring elasticity, scalability, and cost optimization.
  • Recommended cloud architecture best practices to balance performance, availability, and cost-efficiency.
  • Provisioned and managed EC2 instances, AMIs, snapshots, and EBS volumes, enabling cross-region availability.
  • Created and managed S3 buckets with access policies, supporting both storage needs and static content hosting.
  • Implemented VPC architectures with public/private subnets, NAT gateways, and bastion hosts for secure communication.
  • Configured Route 53 DNS and load balancing with ELB/ALB for high-traffic applications.
  • Applied Auto Scaling policies to ensure fault tolerance and high availability.
  • Automated provisioning, backups, and routine tasks using Shell scripts and AWS CLI.
  • Managed AWS AMI lifecycle, updating base images and distributing them across environments.
  • Enforced network security using IAM roles, Security Groups, Network ACLs, and Internet Gateways.
  • Migrated legacy applications and data from VMware to AWS using AWS Import/Export, improving scalability and reducing on-premises dependencies.
  • Produced weekly/monthly usage and billing reports to provide cost visibility and optimization insights to clients.
  • Supported applications on Linux servers, performing system health checks, server automation, and web/app server installations.
  • Provided direct client support for infrastructure operations, improving issue resolution and customer satisfaction.
  • Optimized AWS costs by identifying underutilized resources and applying Reserved Instances, lifecycle policies, and right-sizing, reducing spend by 20%.

System Administrator

Velocity Infotech
08.2015 - 05.2016
  • Installed, configured, and maintained Red Hat Linux servers on VMware and physical hardware, ensuring stable and secure operations.
  • Collaborated with the storage team to provision and integrate SAN LUNs into ESX servers, improving storage availability.
  • Administered file systems using LVM and VxVM, enabling efficient disk management and scalability.
  • Automated recurring tasks (backups, health checks, cleanup jobs) using the cron and at schedulers, reducing manual workload.
  • Diagnosed and resolved Linux server issues related to boot time, system crashes, and performance degradation, improving uptime.
  • Monitored infrastructure health and mission-critical applications using Splunk, enabling proactive issue detection.
  • Automated patch management and package updates with YUM and RPM, ensuring servers remained compliant and secure.
  • Maintained server compliance by applying regular patch updates, strengthening system security and stability.
  • Conducted performance tuning by identifying CPU, memory, and I/O bottlenecks with tools such as top, vmstat, iostat, and sar.

Education

Master of Science - Computer Science

Texas A&M University
Kingsville, TX
12.2015

B.Tech

ACE Engineering College
Hyderabad, India
05.2013

Skills

Maven


Certification

AWS Certified Solutions Architect – Associate

Work Availability

monday
tuesday
wednesday
thursday
friday
saturday
sunday
morning
afternoon
evening
