Experienced IT Operations and Monitoring Engineer with 5+ years in infrastructure monitoring, major incident management, and SRE principles across telecom and enterprise environments. Proven ability to design proactive monitoring dashboards, lead P1/P2 incident response, automate repetitive tasks, and collaborate with infrastructure and DevOps teams. Skilled with Splunk, Nagios, SolarWinds, and Netcool, and in supporting hybrid infrastructure. Currently leading an L1 monitoring team and driving AIOps automation initiatives.
Ensured 99.99% service uptime for telecom applications through comprehensive end-to-end monitoring and reliability engineering.
Spearheaded major incident response for P1/P2 issues via bridge calls, ensuring rapid resolution and timely stakeholder communications.
Executed proactive problem management by analyzing alert patterns, conducting root cause analyses, and maintaining a knowledge database for recurring incidents.
Developed consolidated monitoring dashboards in Splunk by integrating various infrastructure monitoring tools, enhancing team visibility.
Aligned alert configurations, thresholds, and health checks between monitoring and infrastructure teams across network and vCenter environments.
Managed 24x7 application and infrastructure support, overseeing event correlation and automated remediation workflows.
Led and mentored a team of 11 L1 engineers, focusing on shift management, training, escalation handling, and SLA compliance.
Coordinated with the AI team to implement AIOps for enhanced monitoring automation, reducing manual intervention.
ITIL Foundation – Pursuing (Expected Completion: September 2025)
Nithya K
Project Manager
nithya.k@prodapt.com, Ph: +91 7305867033
Prodapt Solutions Pvt Ltd