Thorough quality assurance professional highly skilled in manual and automated application testing. Partners effectively with project managers and developers to deliver high-quality software to customers. Decisive in identifying problems at any stage of production.
Currently working as a Site Reliability Engineer, focusing on maintaining and improving the reliability, availability, and performance of software systems.
· Monitor application logs thoroughly in PreProd and Prod environments using Splunk, following appropriate monitoring strategies.
· Monitor application performance through AppDynamics and report issues to the relevant stakeholders.
· Create strategies and automation to detect issues and report them to the appropriate stakeholders.
· Write and review post-mortems, and conduct review meetings on the root cause analysis of incidents.
· Identify risks in collaboration with development teams and other key stakeholders.
· Once risks are identified, analyse and evaluate their potential impact and likelihood of occurrence.
· Implement risk mitigation strategies to reduce operational risks, then continuously monitor and review their effectiveness.
· Maintain system reliability and ensure the best possible user experience.
· Study historical performance trends by preparing operations metrics.
· Share lessons learned from incidents across the entire team so that they are not repeated.
· Automate processes and create dashboards.
· Enhance efficiency and reliability across the infrastructure.
· Identify recurring issue patterns and automation opportunities, and automate them using various tools and techniques.
· Review customer experience through Glassbox to identify issues, perform root cause analysis, and improve system stability and reliability.
· Participate in MIM calls to troubleshoot, share impact assessments, and identify root causes.
· Plan and manage release schedules.