

Observability/SRE Manager with 18 years of IT experience, including 7+ years designing and scaling observability strategies across enterprise systems. Proven track record of implementing end-to-end visibility solutions using Splunk Observability Cloud, Datadog, and Grafana. Strong leadership in driving SRE and DevOps teams toward improved system health, reduced MTTR, and operational excellence. Skilled in KPI-based monitoring, AIOps, self-healing automation, and cost optimization of monitoring platforms.
AI in Observability: AIOps, anomaly detection, predictive monitoring, AI-driven root cause analysis
Data & ML Tools: Python (scikit-learn, pandas), Splunk Machine Learning Toolkit (MLTK), AWS AI Services
Cloud & Infrastructure: AWS, Kubernetes, Docker
Automation & CI/CD: Jenkins, Rundeck, Ansible, Puppet
Scripting & Tools: Python, Bash, Shell, Tcl
Incident & Problem Management: ServiceNow, Jira, Confluence
Leadership & Strategy: Observability Roadmaps, SRE Team Management, Agile methodology