
Product Support and Site Reliability Engineering Manager with 13+ years of experience leading 24/7 production operations, incident management, and reliability engineering for large-scale cloud platforms. Reduced MTTR, led Sev-1/Sev-2 incident response, and built global follow-the-sun support teams for cloud-based distributed systems. Strong experience across AWS infrastructure, observability, incident tooling, and cross-functional execution, delivering highly available and resilient digital platforms. Managed production incident operations for a fleet of 95,000 servers across hybrid cloud infrastructure.
Lead 24/7 production operations and incident management for a global SaaS integration platform.
Award:
Received Star Certificate of Excellence for automating the handling of 700 tickets per month using PowerShell, saving 20,000 man-hours annually.