G Praveen Kumar

Bengaluru

Summary

Senior SRE and Global Incident Manager with 10+ years of experience leading high-severity incidents, crisis management, and SaaS operations in 24x7 production environments. Proven ability to manage end-to-end major incident lifecycles, drive escalations, and restore critical services within SLA timelines.

Experienced in coordinating global cross-functional teams across Engineering, Product, and Operations, while serving as the primary point of contact for executive leadership during business-critical outages. Strong expertise in Post Incident Reviews (PIR), Root Cause Analysis (RCA), and continuous service improvement.

Demonstrated success in designing and implementing Incident and Change Management frameworks aligned with ITIL practices. Adept at stakeholder communication, conflict resolution, and driving operational excellence in high-pressure, time-critical environments.

Overview

15 years of professional experience
1 Certification

Work History

Business Process Manager

Vonage Business Communications
01.2025 - Current
  • Own and govern end-to-end Incident, Change, and Problem Management for large-scale SaaS, CPaaS, and CCaaS platforms, ensuring adherence to SLA/SLO commitments
  • Act as the primary escalation point and crisis coordinator during high-severity incidents, driving cross-functional resolution across Engineering, Product, and Support teams
  • Lead Major Incident Management (MIM) calls, providing real-time communication and impact updates to C-suite and executive stakeholders
  • Serve as the central incident commander, ensuring timely service restoration and minimizing business and customer impact
  • Designed and implemented a unified Incident and Change Management framework, improving operational efficiency and strengthening governance
  • Drive Post Incident Reviews (PIRs) and Root Cause Analysis (RCA), ensuring closure of corrective and preventive actions to reduce incident recurrence
  • Partner closely with Contact Center operations (Vonage Contact Center), ensuring platform stability and zero disruption to agent call handling and customer experience
  • Participate in Change Advisory Board (CAB) to assess risk and approve critical production changes, ensuring safe and controlled releases
  • Collaborate with Engineering, Product, and Service Delivery teams to drive continuous service improvement and incident reduction initiatives
  • Champion ITSM best practices aligned with the ITIL framework (ITIL Foundation certified)

SRE - Lead Incident Commander

247.ai
04.2020 - 01.2025
  • Led a team of 5 SRE engineers, driving 24x7 production support, incident management, and service reliability for mission-critical SaaS data platforms
  • Owned end-to-end management of high-severity incidents, including triage, escalation, and resolution within SLA/SLO commitments
  • Drove real-time crisis management, ensuring clear role assignment, structured communication, and timely decision-making during major outages
  • Performed deep-dive RCA/PIRs, identifying system, infrastructure, and network-related failure patterns to reduce incident recurrence
  • Monitored and supported distributed infrastructure, including databases, application stacks, and network devices
  • Troubleshot network-related incidents (BGP alerts, circuit issues, latency, packet loss) in Juniper-based environments, providing initial diagnosis and impact assessment before escalation to network engineering teams
  • Analyzed system performance using latency, throughput, and availability metrics to identify bottlenecks across distributed systems
  • Orchestrated zero-downtime cloud migration to GCP for critical data pipelines, ensuring uninterrupted business operations
  • Architected multi-region infrastructure using Terraform (IaC), improving resilience and disaster recovery capabilities
  • Partnered with Engineering and Product teams to enhance monitoring, observability, and proactive incident detection
  • Mentored team members, conducted performance reviews, and built a high-performing SRE team focused on operational excellence
  • Owned stability of distributed systems built on Hadoop, Kafka, and large-scale data processing frameworks

Systems Engineer/SRE Engineer – Technical Support

247.ai
01.2015 - 03.2020
  • Provided L1/L2 production support for large-scale SaaS and distributed systems, starting in Network Operations Center (NOC) monitoring and later transitioning into SRE responsibilities
  • Monitored infrastructure across applications, databases, and network devices, ensuring high availability and timely incident detection
  • Performed initial triage and troubleshooting of network alerts, including BGP-related issues, circuit outages, and connectivity degradation, collaborating with network engineering teams for resolution
  • Investigated and resolved complex incidents across application, infrastructure, and network layers, driving effective root cause identification
  • Actively participated in major incident war rooms, supporting incident commanders in restoring services during critical outages
  • Managed incident lifecycle aligned with ITIL practices, including detection, escalation, tracking, and resolution of high-priority issues
  • Supported Hadoop ecosystem components, improving cluster stability and high availability and eliminating single points of failure
  • Conducted performance analysis (latency, throughput, system behavior) to identify bottlenecks and improve reliability
  • Executed production deployments, rollbacks, and release validations with minimal impact to live environments
  • Developed and maintained runbooks, SOPs, and knowledge base documentation, improving incident response efficiency
  • Automated repetitive operational tasks using scripting, enhancing scalability and operational efficiency

SIM Specialist

24/7 Customer Private LTD
09.2011 - 12.2014
  • Served on the front-line helpdesk team supporting in-house needs, including user access management, vendor maintenance, support for system-related issues, and incident creation for users reporting problems with internal applications

Education

Bachelor of Science - Computer Science

Madanapalle Institute of Technology & Science
Madanapalli, India
04-2011

Skills

    Incident Leadership & Crisis Management
  • P0/P1 Incident Command (Incident Commander)
  • War Room / Bridge Call Leadership (24x7 Operations)
  • Real-time Crisis Decision-Making & Escalation Management
  • Executive & Business Stakeholder Communication
  • Customer Impact Management & Service Restoration

    Incident Governance & ITSM
  • Incident, Change & Problem Management (ITIL Aligned)
  • SLA / SLO Definition, Tracking & Compliance
  • Root Cause Analysis (RCA) & Post Incident Reviews (PIR)
  • Change Advisory Board (CAB) Participation
  • Business Continuity & Service Resilience

    Networking & Infrastructure Troubleshooting
  • BGP Alert Triage & Circuit Issue Analysis
  • TCP/IP, DNS, Subnetting, QoS Fundamentals
  • Latency, Packet Loss & Throughput Analysis
  • Load Balancing (F5, Netscaler)
  • Juniper Network Device Monitoring & Incident Triage
  • Distributed Systems Behavior in Networked Environments

    Cloud, Systems & Observability
  • Cloud Platforms: AWS, GCP (Multi-region Architectures)
  • Observability: ELK Stack, Prometheus, Grafana
  • Distributed Systems: Kafka, Hadoop
  • Containerization & Orchestration: Kubernetes, Docker
  • Infrastructure as Code: Terraform

    Operations, Collaboration & Automation
  • Cross-functional Leadership (Engineering, Product, Network, Support)
  • ITSM & Collaboration Tools: Jira, Opsgenie, Confluence, Slack
  • Monitoring & Alerting Strategy
  • Process Optimization & Operational Excellence
  • Automation: Bash, Python

Certification

ITIL Foundation V5
