Avkash Kumar

Engineering Manager - Site Reliability Engineering
Bengaluru

Summary

Technically inclined professional with over 14 years of experience, predominantly in Site Reliability Engineering, combining customer focus, creativity, passion, industry experience, and leadership skills.

Overview

14 years of professional experience
3 years of post-secondary education
3 Certifications

Work History

Engineering Manager, SRE

Sophos
Bangalore
03.2021 - Current
  • Designed and implemented a comprehensive technical roadmap for the SRE team, aligning with the organization's objectives and business needs. This roadmap included setting short-term and long-term goals, identifying key projects, and establishing a clear vision for enhancing system reliability and performance.
  • Successfully established Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets for critical applications and services. Set up a process to monitor, measure, and report on these metrics so that systems meet their reliability targets and the team takes the necessary action when deviations occur.
  • Successfully designed and implemented an in-house DORA (DevOps Research and Assessment) solution, enabling comprehensive measurement and analysis of key DevOps metrics, including lead time, deployment frequency, change failure rate, and MTTR, across diverse platforms.
  • As an SRE Manager, spearhead the team's swift resolution of critical incidents (P1/P2) and orchestrate effective post-incident reviews, proactively identifying and addressing root causes to continuously improve incident response procedures.
  • Foster collaboration across teams while maintaining transparent, timely communication on system reliability, SLOs, and ongoing initiatives.
  • Establish and lead a new SRE team, ensuring that the team is structured for success with the right mix of skills and experience.
  • Provide direction, mentorship, and support to team members to foster a culture of reliability and continuous improvement.
  • Develop job descriptions, conduct interviews, and assess candidates to ensure the team is staffed with skilled individuals who can contribute to the team's goals.
  • Identify and implement automation opportunities in the SRE practice, encompassing everything from incident response to routine maintenance tasks, leveraging suitable tools and frameworks to enhance operational efficiency and minimize manual efforts.
  • Technology Stack: Azure | Terraform | Datadog | Prometheus | GitHub Actions | Python | Docker | Azure App Services | Kubernetes

Manager - Site Reliability Engineering

Nextgen Healthcare Pvt. Ltd
Bangalore
05.2019 - 03.2021
  • Led a team of 15+ SRE and DevOps engineers responsible for the end-to-end management of highly scalable healthcare platforms, focused on ensuring enterprise-grade availability, reliability, performance, CI/CD, and security.
  • Led communications for the SRE, DevOps, and Migrations program globally, including town halls and executive presentations, to increase adoption of and interest in SRE, DevOps, and migrations.
  • Designed, planned, and migrated applications and workloads from on-premises to AWS Cloud.
  • Adopted AWS governance best practices to ensure cost control, security, and compliance.
  • Reviewed the existing CI/CD process and refreshed it in line with DevOps best practices.
  • In the SRE space, transformed two critical platforms to a self-service SRE model with end-to-end monitoring, self-healing, and defined SLIs and SLOs; onboarded 10+ applications to an SRE governance platform.
  • Leveraged tooling and automation to improve cloud infrastructure and resiliency, lowering costs and reducing OpEx.
  • Improved MTTD, MTTR, and MTBF by 60% using end-to-end monitoring coverage and AIOps-based event management.
  • Worked continuously with partners and third-party vendors to evaluate and implement the right technology within the SRE/DevOps/Cloud organization.
  • Technology Stack: AWS | Terraform | Datadog | Sumo Logic | Bitbucket | Jenkins | Ansible | SonarQube | Tenable

Lead SRE Engineer

Harman Connected Services Pvt. Ltd.
Bangalore
12.2016 - 05.2019
  • Solely responsible for the design, implementation, and maintenance of all AWS infrastructure and services within a managed service environment.
  • Part of the team responsible for building, configuring, and maintaining cloud infrastructure for new and existing clients.
  • Monitoring and updating our infrastructure to ensure it stays secure and scales with the company.
  • Deploying and managing configuration management infrastructure on AWS, using CloudFormation, Ansible, and Terraform.
  • Led project teams and worked with all stakeholders across every stage of design and development for complex products and platforms, including solution design, analysis, coding, testing, and integration.
  • Reviewed and evaluated designs and project activities for compliance with systems design and development guidelines and standards; provided tangible feedback to improve product quality and mitigate failure risk.
  • Planned and implemented monitoring tools across the client data center, including Nagios, NetFlow Analyzer, SolarWinds, and AppNeta.

Sr. System Engineer

Ubona Technology Pvt. Ltd.
Bangalore
12.2015 - 10.2016
  • Managed core Linux infrastructure, including DHCP, DNS, MySQL, Apache web server, and HTTP proxy services for enterprise applications.
  • Provide technology leadership within the larger organization – propose technology roadmaps, prototype ideas and map them to projects.
  • Managed all production systems, recommended ways to optimize performance, resolved problems, and prepared reports on all issues.
  • Engaged in and improved the whole lifecycle of services, including configuration management, deployment, monitoring, performance, automation, continuous integration, and version control.

Systems Engineer

Amazon.com
Bangalore
06.2014 - 11.2015
  • Led a team of 5 junior technicians providing IT support to end users and meeting the SLA.
  • Managed core Linux infrastructure hosting all development, deployment, and infrastructure applications.
  • Collaborated with the business team to maintain a deployment and management strategy for Linux and network infrastructure.
  • Managed the lab network infrastructure for the business team with 100% uptime, including designing the network for high availability.

IT Engineer II

Stoke Networks Pvt. Ltd.
Bangalore
08.2010 - 06.2014
  • Planning, configuration and administration of Stoke India Linux servers, Juniper Firewall and Lab Network.
  • Managed the corporate network and infrastructure services, including DHCP, DNS, FTP, NFS, and NTP servers.
  • Worked on the free VMware ESXi server. Virtualized around 50 servers for testing, each running 5 instances.
  • Designed the corporate wireless network using WPA2 authentication.

Education

Bachelor of Technology - Information Technology

Jaipur Engineering College
Jaipur
06.2007 - 07.2010

Skills

Cloud - AWS Cloud

Certification

Red Hat Certified System Administrator

Timeline

Manager - Site Reliability Engineering

Nextgen Healthcare Pvt. Ltd
05.2019 - 03.2021

Lead SRE Engineer

Harman Connected Services Pvt. Ltd.
12.2016 - 05.2019

Sr. System Engineer

Ubona Technology Pvt. Ltd.
12.2015 - 10.2016

Systems Engineer

Amazon.com
06.2014 - 11.2015

IT Engineer II

Stoke Networks Pvt. Ltd.
08.2010 - 06.2014

Bachelor of Technology - Information Technology

Jaipur Engineering College
06.2007 - 07.2010

Engineering Manager, SRE

Sophos
03.2021 - Current