Suman Gudikandula

Hyderabad

Summary

Results-driven Site Reliability & Production Operations Leader with 17 years managing mission-critical systems and global support for large-scale platforms. Led 24x7 reliability operations and incident response, enhancing service reliability and optimizing observability strategies. Focused on operational automation and maintaining high platform availability during peak traffic events.

Overview

18
years of professional experience

Work History

SRE Lead (Site Reliability Operations)

Fanatics ECommerce
06.2022 - Current
  • Lead a 15-member global SRE / Production Support team responsible for 24x7 reliability operations.
  • Act as Incident Commander for Sev1 / Sev2 incidents, ensuring rapid triage, stakeholder communication, and service restoration.
  • Coordinate cross-functional teams during critical outages, reducing resolution time and minimizing revenue impact.
  • Drive post-incident RCA reviews and work with engineering teams to implement permanent fixes.
  • Enhanced monitoring coverage and alert quality to enable faster anomaly detection.
  • Supported major peak events such as Black Friday and Cyber Monday to ensure stable platform performance during high-volume traffic.

Production Support Lead

Fanatics ECommerce
11.2020 - 05.2022
  • Designed operational workflows, monitoring processes, and incident response playbooks to streamline incident resolution.
  • Led the transition of Data Reliability Operations to an internal support model, enhancing response times and operational control.
  • Established runbooks and documentation to facilitate faster onboarding and effective knowledge transfer.

Service Delivery Lead

Cognizant Technology Solutions
01.2014 - 01.2020
  • Managed application support and incident operations for global media platforms (ViacomCBS, NBCUniversal), ensuring timely resolution of incidents.
  • Led teams in incident response and service requests, facilitating smooth production deployments.
  • Conducted incident bridge calls and coordinated engineering teams during major outages.
  • Delivered operational metrics reports and incident trend analysis, identifying key areas for enhancing platform stability.

Service Delivery Lead

Cognizant Technology Solutions
01.2008 - 01.2014
  • Led a team of 8 in supporting global pharmaceutical applications, ensuring seamless service delivery.
  • Managed incident, problem, and change management processes to maintain service continuity and client satisfaction.
  • Delivered SLA reporting and trend analysis for service improvements.
  • Coordinated patching impact analysis and emergency changes.
  • Handled client escalations and facilitated service review meetings to address concerns and improve service quality.

Education

B.Tech - Computer Science & Information Technology

Jawaharlal Nehru Technological University
Hyderabad
01.2004

Skills

  • MuleSoft Anypoint
  • Airflow
  • RabbitMQ
  • Grafana
  • New Relic
  • Splunk
  • BigPanda
  • SolarWinds
  • Tanium
  • OpsGenie
  • ServiceNow
  • Jira
  • Remedy
  • Zendesk
  • Confluence
  • vSphere
  • Windows
  • Linux
  • PuTTY
  • Stonebranch
  • GoAnywhere
  • Argos
  • Qubole

Accomplishments

  • Received the BOLD Award for leading a successful peak event at Fanatics for 2024-2025.
  • Received multiple Spot Awards.
  • Recognized by clients and Cognizant leadership for service excellence.
  • Suggested automation initiatives to eliminate repetitive operational tasks.
  • Contributed to process improvement initiatives to enhance operational efficiency.

Tools

Windows, Linux, Grafana, BigPanda, New Relic, SolarWinds, ServiceNow, Jira, Zendesk, Remedy, Splunk, Argos, Stonebranch, GoAnywhere, Airflow, RabbitMQ, MuleSoft Anypoint, Tanium, OpsGenie, Confluence, vSphere, PuTTY, Qubole

Key Reliability Operations Impact

  • Led 24x7 production operations for large-scale ecommerce platforms supporting millions of users during peak events.
  • Handled Sev1 / Sev2 major incidents and coordinated war rooms to restore services quickly and minimize customer impact.
  • Increased operational response efficiency and reduced MTTR by ~30% through better monitoring, runbooks, and incident workflows.
  • Supported multiple environments, maintaining 99.9%+ service availability through proactive monitoring and rapid incident response.
  • Managed and mentored 15+ global support engineers across multiple shifts, ensuring SLA compliance and operational stability.
  • Analyzed incident trends and implemented automation initiatives, reducing repetitive operational work by 20%+.

Timeline

SRE Lead (Site Reliability Operations)

Fanatics ECommerce
06.2022 - Current

Production Support Lead

Fanatics ECommerce
11.2020 - 05.2022

Service Delivery Lead

Cognizant Technology Solutions
01.2014 - 01.2020

Service Delivery Lead

Cognizant Technology Solutions
01.2008 - 01.2014

B.Tech - Computer Science & Information Technology

Jawaharlal Nehru Technological University