N Bharani Srivatsa

SRE Manager
Chennai

Summary

An accomplished People Leader and SRE Manager with 17.8 years of experience, strong technical program ownership, and a record of managing high-performance teams across diverse domains including DevOps, Cloud Management, Site Reliability Engineering (SRE), and Scrum Management. Highly skilled in driving reliability, observability, and efficiency through strategic team leadership, process optimization, and proactive incident management. Adept at creating high-functioning teams, resolving conflicts, motivating team members, and managing key stakeholders.

  • People Leadership: Experienced in managing cross-functional teams, including cloud engineers, release engineers, and observability specialists. Focused on mentorship, performance management, and team development. Delivered training and career development programs, fostering a culture of continuous learning. Managed team capacity, on-call schedules, and load balancing across projects.
  • Stakeholder Management & Conflict Resolution: Managed senior stakeholders in high-stakes environments. Resolved conflicts between development and operations teams to align priorities, ensuring efficient project execution.
  • SRE Manager: Expert in managing cloud infrastructure, ensuring service uptime, and optimizing cloud costs.
  • Partner with Dev teams to improve CI/CD pipelines, rollbacks, feature flagging, etc.
  • Standardize and improve observability practices (monitoring, logging, tracing).
  • Own and optimize the on-call experience (e.g., alert fatigue, response playbooks).
  • Ensure robust incident response, postmortems, and blameless RCA processes are in place.
  • Real-Time Incident Management: Managed several high-priority incidents, ensuring swift resolution and implementing long-term fixes that increased system stability.
  • Managing Scrum Events: Facilitated Scrum ceremonies and coached teams to improve Agile practices. Ensured timely and efficient project deliveries while maintaining high collaboration and team engagement.
  • Define, coordinate, and execute Kubernetes and EKS version upgrades across environments with minimal downtime.
  • Manage and review Terraform modules used for provisioning infrastructure.
  • Plan and execute Splunk version upgrades, integration updates, and agent rollouts across systems.
  • Automate updates and rollbacks of GitHub runner versions; ensure isolation and performance tuning.
  • Experienced in managing and supporting mission-critical applications in high-availability production environments.

Overview

18 years of professional experience
6 years of post-secondary education

Work History

SRE Manager

Comcast
Chennai
02.2024 - Current
  • Service Reliability: Ensure uptime, performance, and capacity objectives (SLOs/SLIs/SLAs) are defined and met.
  • Monitoring & Observability: Oversee system observability practices—metrics, logging, tracing—to detect and debug issues proactively.
  • On-call Practices: Manage fair and sustainable on-call rotations and reduce operational toil.
  • Cost Awareness: Collaborate with finance/ops to ensure infrastructure cost efficiency and budgeting.
  • Oversee CI/CD pipelines to ensure fast, safe, and reliable deployments.
  • Incident Management & RCA Culture: Own incident response frameworks; lead blameless postmortems and ensure learning is embedded.
  • Manage and prioritize inbound tickets across cloud, release, and observability domains.
  • Work with engineering and support teams to ensure timely ticket resolution.
  • Analyze recurring tickets to reduce operational toil through automation or process changes.
  • Scrum Leadership: Facilitated Scrum events to ensure that the team adhered to Agile methodologies and effectively delivered on sprint goals, improving overall project visibility.
  • Team Management: Lead and mentor engineers across all three domains; manage performance reviews, career growth, hiring.
  • People Leadership: Experienced in managing cross-functional teams, including cloud engineers, release engineers, and observability specialists. Focused on mentorship, performance management, and team development.
  • Stakeholder Management & Conflict Resolution: Managed senior stakeholders in high-stakes environments. Resolved conflicts between development and operations teams to align priorities, ensuring efficient project execution.

Technical Manager

HCL Technologies | Royal Bank of Scotland
Chennai
05.2021 - 02.2024
  • Deploy, configure, and manage cloud resources (compute, storage, networking, databases) on AWS.
  • Proactively monitor system health via Prometheus, Grafana, and Splunk.
  • Create, configure, and manage EKS clusters across multiple environments (dev, stage, prod).
  • Manage IAM policies, roles, and permissions to enforce security and least-privilege access.
  • Respond to incidents, perform root cause analysis, and implement preventive measures.
  • Maintain clear and up-to-date documentation of infrastructure, configurations, and processes.
  • Team Leadership: Managed a cross-functional team of 10+ engineers, driving team performance and professional growth through regular feedback and targeted development programs.
  • Scrum Mastery: Facilitated daily stand-ups, sprint planning, retrospectives, and backlog grooming sessions, ensuring Agile best practices were followed.

Senior Infrastructure System Engineer

DTCC Solutions Private Limited
Chennai
08.2020 - 05.2021
  • Deploy, configure, and manage cloud resources (compute, storage, networking, databases) on AWS.
  • Proactively monitor system health via Prometheus, Grafana, and Splunk.
  • Create, configure, and manage EKS clusters across multiple environments (dev, stage, prod).
  • Manage IAM policies, roles, and permissions to enforce security and least-privilege access.
  • Respond to incidents, perform root cause analysis, and implement preventive measures.
  • Maintain clear and up-to-date documentation of infrastructure, configurations, and processes.
  • Scrum Mastery: Facilitated daily stand-ups, sprint planning, retrospectives, and backlog grooming sessions, ensuring Agile best practices were followed.

Assistant Consultant

Tata Consultancy Services Limited | Vodafone India
Chennai
07.2015 - 07.2020
  • WebSphere Administration: Administered WebSphere Application Server (WAS) for multiple applications, ensuring consistent performance and availability.
  • Team Leadership: Managed a team of 8+ WebSphere administrators, ensuring high-quality service delivery and continuous improvements.
  • Deployment Optimization: Automated manual processes, reducing deployment errors and improving efficiency.
  • Incident Management: Led incident management for WebSphere-related issues.

Senior Administrator

Wipro Technologies Limited | Capital One
06.2014 - 06.2015
  • Administered WebSphere Application Server (WAS) for multiple applications, ensuring consistent performance and availability.
  • Led incident management for WebSphere-related issues.
  • Installed and configured WebSphere Application Servers and managed related environments to ensure application performance.
  • Worked on various troubleshooting tasks related to WebSphere server configurations and application deployments.

Senior Specialist

HCL Technologies Limited | USAA
Chennai
05.2012 - 06.2014
  • Administered WebSphere Application Server (WAS) for multiple applications, ensuring consistent performance and availability.
  • Led incident management for WebSphere-related issues.
  • Installed and configured WebSphere Application Servers and managed related environments to ensure application performance.
  • Worked on various troubleshooting tasks related to WebSphere server configurations and application deployments.
  • Administered WebSphere application environments, ensuring 99% availability for critical systems.
  • Managed and supported IBM WebSphere Application Server (WAS) in a 24x7 production environment, ensuring high availability and performance of enterprise applications.

Software Engineer

Mahindra Satyam | State Bank of India
Chennai
08.2007 - 05.2012
  • Administered WebSphere Application Server (WAS) for multiple applications, ensuring consistent performance and availability.
  • Worked on incident management for WebSphere-related issues.
  • Installed and configured WebSphere Application Servers and managed related environments to ensure application performance.
  • Worked on various troubleshooting tasks related to WebSphere server configurations and application deployments.

Education

Bachelor of Engineering - ECE

M N M Jain Engineering College | Anna University
Chennai, India
04.2001 - 05.2007

Technical Skills

  • DevOps & Cloud Technologies: AWS, Kubernetes, Terraform, GitHub, GitHub Copilot.
  • SRE & Observability: Incident Management, SLOs/SLIs, Prometheus, Grafana, Splunk, Datadog, CloudWatch.
  • Incident Management: JIRA, AutoNow, ServiceNow, Spark Callout.
  • Agile & Scrum: Agile Project Management, Scrum, JIRA, Confluence.
  • AI Tools: ChatGPT, Perplexity AI, GitHub Copilot, Prompt Engineering.

Timeline

SRE Manager

Comcast
02.2024 - Current

Technical Manager

HCL Technologies | Royal Bank of Scotland
05.2021 - 02.2024

Senior Infrastructure System Engineer

DTCC Solutions Private Limited
08.2020 - 05.2021

Assistant Consultant

Tata Consultancy Services Limited | Vodafone India
07.2015 - 07.2020

Senior Administrator

Wipro Technologies Limited | Capital One
06.2014 - 06.2015

Senior Specialist

HCL Technologies Limited | USAA
05.2012 - 06.2014

Software Engineer

Mahindra Satyam | State Bank of India
08.2007 - 05.2012

Bachelor of Engineering - ECE

M N M Jain Engineering College | Anna University
04.2001 - 05.2007