Infrastructure Stability Taskforce: Spearheaded the initiative to reduce major outages and technical incidents by over 42% through robust processes and service monitoring. Collaborated with cross-functional teams, including engineering, CPI, and change management, to reduce change-induced incidents and improve overall infrastructure reliability.
Observability Implementation: Played a pivotal role in designing an in-house observability tool built on Dynatrace, Splunk, ServiceNow, and Qlik Sense, enabling proactive issue detection and resolution. Led the integration efforts that improved visibility across the entire infrastructure and application landscape.
Splunk Re-architecture: Architected and led the migration of a Splunk platform ingesting 22 TB of data per day from on-premises to a Splunk SaaS model. Key challenges included ensuring secure data transmission, optimizing costs, and aligning with business needs while maintaining operational continuity.