Florance T. H

Hyderabad

Summary

Service Reliability Engineer with 8+ years of experience supporting large-scale enterprise production systems. Proven expertise in incident response, Site Reliability Engineering (SRE) principles, service monitoring, alerting, root cause analysis (RCA), and operational excellence. Strong background in SLI, SLO, and SLA governance, automation to reduce operational toil, and cross-functional collaboration with Engineering, Cloud, Network, and Database teams. Known for calm execution during high-severity incidents and for driving long-term reliability improvements.

Work History

Sr. SRE/Incident and Problem Management Lead

Zelis Healthcare

Hyderabad

04.2024 - Current

Own end-to-end incident and problem management for business-critical production services, leading P1/P2 incidents across cloud, network, database, and application layers.
Act as Incident Commander during high-impact outages, ensuring rapid mitigation, clear stakeholder communication, and strict SLA adherence.
Lead Problem Management initiatives, identifying recurring and chronic issues through trend analysis, RCA correlation, and incident pattern review.
Apply SRE principles to improve service reliability, availability, operational readiness, and long-term stability.
Own and optimize monitoring and alerting systems (PagerDuty, New Relic, LogicMonitor), improving signal quality and reducing MTTA/MTTR.
Conduct 100+ structured RCAs, driving corrective and preventive actions (CAPA) to eliminate repeat failures and systemic reliability gaps.
Define, track, and report SLIs, SLOs, and reliability metrics, delivering regular service health and performance insights to leadership.
Automate incident and problem workflows, reporting, and follow-ups to reduce operational toil and improve response efficiency.
Partner with the Engineering, Cloud, Network, Database, and Product teams to improve runbooks, escalation paths, service operability, and resilience.
Lead and mentor a global 24/7 incident and problem response team, supporting on-call operations and reliability standards.

Major Incident Manager (Escalations Lead – Microsoft OneStore)

Milestone Technologies

Hyderabad

09.2023 - 03.2024

Led incident response for large-scale, distributed cloud services supporting Microsoft OneStore.
Coordinated global engineering, operations, and product teams across Azure and Microsoft 365 platforms.
Drove high-severity incident mitigation, RCA execution, and closure of action items.
Delivered concise, real-time updates to senior stakeholders during critical production incidents.
Translated incident learnings into problem management initiatives to improve long-term service reliability.

Release Manager – Operations & DevOps

JD Sports Fashion India PLC

Hyderabad

10.2022 - 09.2023

Managed production releases with a focus on service stability, reliability, and downtime reduction.
Strengthened observability and monitoring using Grafana, LogicMonitor, and New Relic.
Collaborated with DevOps and Engineering teams to resolve infrastructure and application performance issues.
Supported incident response and post-release stability improvements for enterprise systems.

Technical Support Engineer

Amazon Development Center

Hyderabad

08.2021 - 09.2022

Provided Tier-2 production support for enterprise-scale services.
Investigated hardware, software, and network issues impacting customer-facing systems.
Escalated defects to engineering and improved operational documentation.
Collaborated with engineering teams to resolve complex technical problems effectively.
Assisted in training new team members on company systems and procedures.
Resolved complex technical problems through root cause analysis techniques.

Technical Support Analyst

Induct Solutions Pvt Ltd

Hyderabad

03.2017 - 08.2021

Delivered enterprise technical support and incident triage.
Improved operational efficiency through documentation and user training.
Analyzed client feedback to improve support services and user experience.
Managed ticketing system and prioritized support requests effectively.

Education

MTech -

Skills

Site reliability engineering
Service availability engineering
Incident response management
Root cause analysis
SLA and SLO management
Monitoring and alerting
Performance observability
On-call operations
Automation strategies
ITIL management
Distributed systems support
Cross-functional collaboration
Operational documentation

Accomplishments

- Built and scaled Major Incident Management and Problem Management program from scratch.
- Resolved 250+ P1/P2/P3 incidents with strong SLA adherence.
- Conducted 100+ RCAs driving measurable reliability improvements.
- Reduced MTTA/MTTR through automation and alert optimization.
- Led global on-call and escalation operations for mission-critical services.
