Site Reliability Engineer (SRE) with 12+ years of experience designing and optimizing highly available, scalable, and resilient cloud-native systems.
Proficient in OpenStack, VMware, DevOps, CI/CD, and Infrastructure as Code (IaC), using them to drive efficiency and streamline operations.
Committed to automating operational processes to enhance system reliability and minimize manual effort. Skilled in incident management, observability, proactive monitoring, and root cause analysis.
Well versed in incident response and resilience engineering, including postmortems and chaos engineering.
Overview
12 years of professional experience
2012
years of post-secondary education
Work History
Capgemini
R&D Manager - NetAct
10.2022 - Current
Job overview
Optimized system reliability & availability by designing self-healing automation for network management system (NMS) deployments
Led the implementation of SRE best practices for NetAct, reducing MTTR by 40% through automated root cause analysis (RCA) tools
Developed a zero-touch installation automation tool, eliminating manual installation steps and reducing deployment time by 50%
Acted as single point of contact (SPOC) for high-priority customer escalations, driving RCA and resolution of production outages
HCL Technologies
Senior Technical Lead - CiscoVIM
03.2021 - 10.2022
Job overview
Served as Escalation Lead & SRE, ensuring 99.99% uptime for OpenStack-based cloud infrastructure
Implemented automated fault detection & recovery for the NFV platform, cutting manual intervention by 60%
Designed troubleshooting playbooks & automation, reducing ticket resolution time from 7 days to 2 days
Created JIRA EPICs & technical documentation to align R&D efforts with customer reliability needs
Nokia Networks
Technical Lead - Shared Data Layer
04.2019 - 02.2021
Job overview
Led test strategy and fault coordination, reducing defect turnaround time from one week to 1-2 days through proactive first-level analysis
Developed an automation framework using Shell & Python, significantly reducing manual effort in the VNF deployment lifecycle and QL5 qualification
Automated VNF workflow execution using Robot Framework, saving several hours of manual effort
Designed an integration framework using Ansible to automate the integration of the SDL VNF with NetAct (NMS), moving integration qualification from once per release to weekly
Developed test plans and wrote test cases based on Functional Design (FD) specifications, managing them in HP Quality Center (QC)
Implemented CI/CD automation pipelines, reducing deployment time for VNFs from 3 hours to 30 minutes
Automated the VNF deployment lifecycle using Python & Ansible, increasing efficiency by 40%
Implemented performance monitoring & log aggregation using ELK & Prometheus
Designed chaos engineering tests to improve resilience & failure tolerance in microservices
Nokia Networks
R&D Engineer - NetAct
12.2014 - 03.2019
Job overview
Served as installation lead for NetAct and SPOC for test automation
Created a one-click, menu-driven installation automation tool for NetAct, eliminating the need to follow the installation document manually
Reduced manual effort by around 1 hour and removed the risk of human error when executing the installation instructions
Introduced and improved error reporting in scripts that lacked adequate logging, which had made issues difficult to debug
This reduced debugging time
Automated test cases using Robot Framework
Actively participated in code reviews, code walkthroughs, technical discussions, and team mentoring
Resolved internal issues and customer queries/tickets during NetAct deployments, upgrades, and migrations
Visited the customer site as an R&D representative to assist the technical support team during an outage that occurred during an upgrade
Mindtree
Senior Test Engineer - NetBackup
11.2012 - 11.2014
Job overview
Automated all sanity test cases using Python and manually executed test cases from Cycle 0 and the pre-flight checklist through the regression cycle, uncovering defects mainly in backup and recovery of Linux machines to and from a virtual tape device
Defined qualification criteria for PARTIAL GREEN builds and predicted failures at a very early stage
This improved build qualification turnaround time by 3 hours, from an original 8 hours
Tested deduplication and optimization in the PureDisk Deduplication Engine (PDDE), including fingerprint calculation
Automated most test cases using Bash shell scripts