Shrey Mehrotra

Gurugram, Haryana

Summary

Incident, Change, Problem, and Service Reliability Engineer with 6+ years of experience improving service uptime, operational stability, and reliability in enterprise environments. Proven track record of reducing MTTR by driving structured incident resolution, optimizing monitoring/alerting pipelines, and ensuring adherence to SLOs and error budgets. Adept at leading cross-functional bridges, performing RCA/PIRs, and implementing preventive measures that cut repeat incidents by 25%+. Strong communicator skilled in stakeholder management, change governance, and post-incident reviews.

Overview

6+ years of professional experience

Work History

INCIDENT RESPONSE ANALYST (Incident, Problem and Change Manager)

Cosm Technologies India Pvt Ltd

04.2024 - Current

Led end-to-end incident lifecycle for P1/P2 outages across AWS, GCP and on-prem infrastructure, coordinating resolver teams and ensuring SLA compliance.
Implemented and optimized real-time telemetry and alerting pipelines for SaaS platforms using Prometheus, CloudWatch, Grafana, and related tools, achieving a 40% reduction in MTTD.
Acted as the primary point of contact during critical incidents, performed first-level troubleshooting, and led cross-functional bridges, maintaining >99.95% service uptime by swiftly resolving P1/P2 incidents through ServiceNow workflows.
Delivered accurate, timely communication to stakeholders including customers and vendors.
Conducted root cause analysis (RCA) and post-incident reviews (PIR) with stakeholders from Product, Engineering, and Support, leading to 25% fewer repeat incidents.
Ensured strict adherence to escalation protocols while maintaining high-quality incident documentation, SOPs, and technical guides.
Reduced resolution time by 30% through streamlined processes and effective leadership during outages.
Integrated SLO and error budget tracking into monitoring systems to guide operational priorities.

Senior Incident & Problem Management

British Telecom

09.2021 - 04.2024

Led end-to-end lifecycle of P1/P2 incidents, achieving >95% SLA adherence and ensuring rapid mitigation and resolution.
Participated in on-call rotations for high-severity events, performing first-line fixes and escalating with complete technical context.
Managed 24×7 monitoring of enterprise LAN/WAN, VPNs, firewalls, and cloud workloads.
Delivered real-time updates and periodic reports to key stakeholders during major outages.
Conducted impact and risk assessments to prioritize incidents based on urgency, business impact, and SLAs.
Facilitated Post-Incident Reviews (PIRs), collaborated with cross-functional teams on preventive measures, and ensured resolution steps were captured in knowledge bases.
Created Major Incident Reports (MIRs), RCA documentation, and maintained detailed records in ServiceNow.
Developed proactive alerts in LogicMonitor, ThousandEyes, and GCP Monitoring to catch degradations before impact.
Assisted in service failover testing for DR readiness.
Delivered training and knowledge-sharing sessions for existing teams and new joiners to reduce recurrence of major incidents.
Drove trend analysis and proactive strategies to enhance service reliability and reduce incident volume.

Service Reliability Engineer (SRE)

British Telecom

10.2020 - 09.2021

Troubleshot open incidents related to network issues, including firewalls, VPNs, routers, and switches.
Optimized router settings for customer-specific and failover requirements, while creating RCAs, SOPs, and runbooks to reduce incident recurrence.
Troubleshot network performance issues involving QoS, bandwidth policing, traffic shaping, latency, jitter, and bandwidth utilisation.
Implemented end-to-end service monitoring for cloud applications using Prometheus, CloudWatch, and Grafana, improving MTTD by 35%.
Defined and tracked SLOs, SLIs, and error budgets for critical services, providing actionable insights to development and product teams.
Tracked transmission alarms in NMS, diagnosed affected circuits, and executed remediation plans in coordination with MPLS core and transmission teams.
Supported customer products including MPLS, Internet, ISDN, ADSL/DSL, and VoIP; handled escalated cases and ensured timely resolution through SLA management and incident quality management.
Performed technical troubleshooting with circuit providers and hardware (router and switch) vendors, and on data circuit connectivity such as serial and Ethernet.

Technical Content Developer

PrepInsta Technologies Pvt Ltd

04.2019 - 09.2020

Prepared SEO-based content for the website related to the company's product.
Prepared content and questions on logical reasoning, aptitude, and English.
Mentored summer interns on the company's products.
Awarded for the best SEO page.

Education

B.Tech - Electronics and Communications

Dr. A. P. J. Abdul Kalam Technical University

Raj Kumar Goel Institute Of Technology

06.2019

Skills

Major Incident Management (MIM)
Problem Management (RCA & PIR)
ITIL Framework (Incident, Change, Problem)
Stakeholder Communication
Team Collaboration and Leadership
ServiceNow, Jira
Experienced in managing AWS cloud solutions (EC2, IAM, RDS, ELB, etc.)
Tools: SolarWinds, Grafana, Prometheus, SevOne, ThousandEyes, CloudWatch, NPMD, PuTTY, Cisco Meraki, Palo Alto
SLA Management & Escalation Handling

CAB & Change Coordination
Automation Escalation Workflows
On-Call Coordination / 24x7 Support
Protocols: TCP/IP, OSPF, BGP, EIGRP, STP, DTP, VTP, VLAN, VPN, MPLS, NAT, TACACS
Network and Security Concepts: CCNA, CCNP, PCNSE
Operating Systems: Linux, Windows
Languages: Java, Python (basic)
Good understanding of REST APIs
Good knowledge of Microsoft Excel, PowerPoint, and Data Management Systems (MySQL, PostgreSQL, Oracle DB)

Hobbies and Interests

Traveling, Gym, Gaming, Watching movies and series, Learning new skills
