KP Manjunath

Devops/SRE Senior Leader
Bengaluru

Summary

Versatile Devops/SRE leader with 14+ years of experience in building, managing, architecting and supporting large-scale, diverse infrastructure as part of senior leadership. Built and managed a highly efficient, geographically distributed Devops/SRE team from scratch. Extensive knowledge of automation, backed by hands-on skills. Architected and delivered multiple highly scalable, resilient infrastructures from scratch, with auto-remediation flows and self-healing of incidents. Exposure to end-to-end automation in Azure, AWS and GCP. Open-source aficionado and contributor to many open-source projects.

Overview

19 years of professional experience

Work History

Sr Solution Architect Devops and SRE

Verse Systems
03.2021 - Current
  • Managing and mentoring a team of 30+ Devops/SREs (direct and indirect reportees)
  • Built an advanced ML-integrated setup for anomaly detection and auto-remediation using open-source LLMs
  • Participate in budgeting and cost-saving opportunities
  • Responsible for Capex and Opex approvals and initiatives
  • Responsible for Josh application delivery, end-to-end production uptime, and Devops and SRE team management
  • Incident handling and platform building for CI/CD, Infosec and other areas of operational excellence
  • Worked on a cost-saving initiative that cut costs by close to 75% per year
  • Automated and onboarded 370 applications onto Kubernetes with fault-tolerant auto-scaling systems on AWS
  • Handled peak traffic of 90K RPS
  • Built and architected a state-of-the-art AIOps platform that automatically handles 60 percent of repetitive production incidents
  • Integrated the platform with the Maglev load-balancing algorithm in Envoy for gRPC load balancing on a high-performance Kubernetes environment (brought mean response time down from 12 ms to 4 ms)
  • Responsible for the K8s platform, including the end-to-end AWS to Azure migration, monitoring and HA setup for different environments
  • Built an advanced integrated monitoring and alerting platform with service discovery and a self-service portal

Head of Network and Operations

Mi Sports Ltd
04.2019 - 03.2021
  • Saved 60% of cloud costs month on month through microservice enablement of all core projects (Kubernetes) and on-demand auto-scaling using spot instances
  • Responsible for mentoring and managing a Devops and SRE team, as well as incident handling and escalations
  • Designed and built cloud automation tools and applications to deploy the next-generation platform (Terraform, Ansible, Chef)
  • Wrote code and supported architecture for high-throughput systems on AWS cloud
  • Worked closely with software development and testing team members to design and develop robust solutions that met client requirements for functionality, scalability and performance
  • Interfaced with a cross-functional team of business analysts, developers and technical support professionals to determine a comprehensive list of requirement specifications for new applications
  • Monitored the automated build and continuous integration process to drive build/release failure resolution (Graphite, Loki, Prometheus)
  • Configured, installed and tuned systems for performance in AWS cloud
  • Worked closely with other business analysts, development teams and infrastructure specialists to

Devops/SRE Manager

Tesco Technology
07.2017 - 04.2019
  • Managed a geographically distributed team of 15 Devops/SREs and owned the entire operations
  • Containerized, architected and cloud-enabled around 23 applications for Tesco on AWS with Mesos/Marathon and Kubernetes clusters, reducing operational cost by 48%
  • Designed and coded rolling deployments, CI/CD and Infrastructure as Code using Terraform, Ansible and Jenkins build pipelines, which reduced overall delivery time from a 6-week release cycle to 3 days
  • Integrated SonarQube and JUnit-driven automated builds for better code quality
  • Designed and executed a disaster recovery plan for large-scale production applications on AWS (Java + MongoDB, with AWS Frankfurt and London as failover)
  • Monitored and processed information for regulatory compliance, working closely with environmental engineers, advisors and other business units
  • Measured team performance and reported metrics to leadership team members
  • Designed and built Ansible, Terraform and other automation tools and applications to deploy next generation platform
  • Monitored automated build and continuous software integration process to drive build/release failure resolution
  • Automated and implemented backup and recovery procedures for cloud-based systems, which resulted in 99.9999% uptime of environments and helped the development team release on demand by reducing deployment life cycle overhead
  • Worked closely with software development and testing team members to design and develop robust integrated solutions to meet client requirements for functionality, scalability and performance
  • Wrote code for an automated monitoring and self-remediation platform that auto-remediates issues and creates incident tickets and alerts automatically (integrating Sensu, Python, Graphite and the ELK stack)
  • Created and architected PCI compliance
  • Worked closely with other business analysts, development teams and infrastructure specialists to deliver high availability solutions for mission-critical applications

Engineering Manager PortalOps

Akamai Technologies
03.2015 - 07.2017
  • Managed a Devops team responsible for supporting 65,000 high-traffic servers across the globe around the clock (Bangalore, Poland and US)
  • Owned the entire production environment of the Luna portal; responsible for incident handling, uptime, troubleshooting, architecture and the end-to-end SDLC
  • Architected, coded and delivered a high-performance Cassandra cluster setup distributed across 3 data centers around the globe, using Ansible playbooks
  • Developed a real-time monitoring solution for Cassandra using JMX and jmxtrans
  • Developed a monitoring and alerting platform (SMART) that integrates Sensu, Graphite, Grafana, Jira and Slack to auto-remediate recurring issues and eliminate noise through flap detection
  • Initiated and executed the migration of 174 legacy applications to microservices and developed end-to-end CI/CD for them with service discovery (Mesos/Marathon, Zookeeper)

Devops Manager

Praxeva India services Pvt ltd
07.2009 - 03.2015
  • Worked directly with OpsCode on Knife plugin management system for GE in US
  • Cloud enabled ( AWS ) one of the largest Telecom CAF application setup for most major telcos in India
  • Designed automated rollout and updates and CI/CD using Jenkins, Chef and Python
  • Coded a self-service product demo SaaS setup: a user fills in a form and, after successful email verification, the system provisions a multi-tenant server cluster and deploys the application automatically so the customer can get a feel for it. Backend: Python + Chef + LXC containers
  • Reduced cloud costs by 52% through automation, on-demand usage and fine-tuning

Senior Linux Administrator

Tejas Networks
01.2006 - 01.2009
  • Managed a team of 5 for Tejas networks
  • Set up DRBD and HA for cross-office communications
  • Implemented SVN and CI/CD solutions
  • Automated critical systems using Perl
  • Wrote an SLA monitoring dashboard for Bugzilla using Python
  • Implemented HA and DR solutions for MySQL and Java applications
  • Implemented Nagios monitoring and alerting
  • Implemented LXC container setup
  • Configured NIS and NFS service for Linux development environment

Education

BCA - Computers

IMTS
Bangalore
01.2000 - 01.2003

Skills

Automation

Accomplishments

  • Built a SMART monitoring system which does auto-remediation on production failures (60 percent of BAU operations reduced and incidents averted)
  • Integrated Sensu, Graphite, Grafana, ELK stack, Splunk and StackStorm + Ansible for incident management and reporting as well as remediation
  • Introduced/architected an in-memory file system and encryption for major PCI compliance applications across the globe
  • End-to-end deployment and CI/CD pipelines using Ansible with roles and discovery
  • Architected and successfully introduced microservice architecture for Akamai LUNA services (Mesos, Marathon, Zookeeper and Docker)
  • Reporting to: Krishna Geenie
  • End-to-end automation of Docker + Mesos, Marathon and Zookeeper setup for microservices: https://github.com/linuxmanju/azure-terraform-mesos-cluster
  • Auto service discovery using Ansible and Terraform on Azure: https://github.com/linuxmanju/terraform-azure-ansible
  • Chef LWP example code: https://github.com/linuxmanju/Chef-Dev
  • Integration of Sensu + Flapjack + Docker orchestration: https://github.com/linuxmanju/opstools
  • Patched Marathon-lb to fix a nasty bug in service discovery: https://github.com/linuxmanju/marathon-lb

Interests

Travelling

Reading

Learning new things

Additional Information

OpenSource/Community Highlights:-

  • Wrote end-to-end deployment automation code for provisioning a large-scale SAP HANA cluster on Azure cloud using Terraform (768 resources in one go and 15 minutes lead time for end-to-end deployment)
  • Wrote a Logstash filter plugin in Ruby to do key-value mapping using Redis as the key-value store
  • Automated an end-to-end ELK (Elasticsearch, Logstash and Kibana) cluster setup using Ansible playbooks and Terraform, plus a Python script to pull data from Salesforce events
  • Wrote provisioning code and a CI/CD pipeline for touchless infrastructure with state and drift detection using Terraform and Ansible
  • Was part of the Gentoo Linux netfilter testing team
  • Contributed to Getmail
  • Active on Freenode IRC and open-source forums
