Engineering Manager - Site Reliability Engineering
Bengaluru
Summary
Technically inclined professional with over 14 years of experience, predominantly in Site Reliability Engineering, combining customer focus, creativity, passion, industry experience, and leadership skills.
Overview
14 years of professional experience
3 years of post-secondary education
3 Certifications
Work History
Engineering Manager, SRE
Sophos
Bangalore
03.2021 - Current
Designed and implemented a comprehensive technical roadmap for the SRE team, aligning with the organization's objectives and business needs. This roadmap included setting short-term and long-term goals, identifying key projects, and establishing a clear vision for enhancing system reliability and performance.
Successfully established Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets for critical applications and services. Set up a process to monitor, measure, and report on these metrics to ensure that systems meet their reliability targets and that the team takes the necessary actions when deviations occur.
Successfully designed and implemented an in-house DORA (DevOps Research and Assessment) solution, enabling comprehensive measurement and analysis of key DevOps metrics, including lead time, deployment frequency, change failure rate, and MTTR, across diverse platforms.
As an SRE Manager, spearhead the team's swift resolution of critical incidents (P1/P2) and orchestrate effective post-incident reviews, proactively identifying and addressing root causes to improve incident response procedures.
Foster collaboration across teams while maintaining transparent, timely communication on system reliability, SLOs, and ongoing initiatives.
Establish and lead a new SRE team, ensuring that the team is structured for success with the right mix of skills and experience.
Provide direction, mentorship, and support to team members to foster a culture of reliability and continuous improvement.
Develop job descriptions, conduct interviews, and assess candidates to ensure the team is staffed with skilled individuals who can contribute to the team's goals.
Identify and implement automation opportunities in the SRE practice, encompassing everything from incident response to routine maintenance tasks, leveraging suitable tools and frameworks to enhance operational efficiency and minimize manual efforts.
Leading a team of 15+ SRE and DevOps engineers responsible for the end-to-end management of highly scalable healthcare platforms, focused on ensuring enterprise-grade availability, reliability, performance, CI/CD, security, and other important aspects.
Leading communications for the SRE, DevOps & Migrations program globally, including town halls and executive presentations, to increase adoption of and interest in SRE, DevOps, and migrations.
Designing, planning, and migrating applications/workloads from on-premises to AWS Cloud.
Adoption of best AWS Governance practices to ensure cost control, security, and compliance.
Review the current CI/CD process and refresh it in line with DevOps best practices.
In the SRE space, transformed two critical platforms to a self-service and SRE model with end-to-end monitoring, self-healing, and defined SLIs and SLOs; onboarded 10+ applications to an SRE governance platform.
Leverage tooling and automation to improve cloud infrastructure and resiliency, helping the company save costs and reduce OpEx.
Reduce MTTD, MTTR, and MTTBF by 60% using end-to-end monitoring coverage and AIOps-based event management.
Work continuously with partners and third-party vendors to evaluate and implement the right technology within the SRE/DevOps/Cloud organization.
Solely responsible for the design, implementation, and maintenance of all AWS infrastructure and services within a managed service environment.
Part of the team responsible for building, configuring, and maintaining cloud infrastructure for new and existing clients.
Monitoring and updating our infrastructure to ensure it stays secure and scales with the company.
Deploying and managing configuration management infrastructure on AWS, using CloudFormation, Ansible, and Terraform.
Lead project teams and work with all stakeholders responsible for all stages of design and development for complex products and platforms, including solution design, analysis, coding, testing, and integration.
Review and evaluate designs and project activities for compliance with systems design and development guidelines and standards; provide tangible feedback to improve product quality and mitigate failure risk.
Planning and implementation of monitoring tools across the client data center, including Nagios, NetFlow Analyzer, SolarWinds, and AppNeta.
Sr. System Engineer
Ubona Technology Pvt. Ltd.
Bangalore
12.2015 - 10.2016
Managing core Linux infrastructure, including DHCP, DNS, MySQL, Apache web server, and HTTP proxy services for enterprise applications.
Provide technology leadership within the larger organization – propose technology roadmaps, prototype ideas and map them to projects.
Manage all production systems, recommend ways to optimize performance, provide solutions to problems, and prepare reports on all issues.
Engage in and improve the whole lifecycle of services, such as configuration management, deployment and monitoring, performance, automation, continuous integration, and version control.
Systems Engineer
Amazon.com
Bangalore
06.2014 - 11.2015
Led a team of 5 junior technicians to provide IT support to end users and meet the SLA.
Managing core Linux infrastructure for hosting all the development, deployment, and infrastructure applications.
Collaborating with the business team to maintain a deployment and management strategy for Linux and network infrastructure.
Managing lab network infrastructure for the business team with 100% uptime, including designing the network for high availability.
IT Engineer II
Stoke Networks Pvt. Ltd.
Bangalore
08.2010 - 06.2014
Planning, configuration, and administration of Stoke India Linux servers, Juniper firewall, and lab network.