SHASHI KUMAR VADAKAPURAM

Hyderabad

Summary

Experienced Site Reliability Engineer - SWE with 4+ years of expertise in designing and developing Python- and Go-based tools and services. Skilled in managing production infrastructure, maintaining scalable distributed systems, and ensuring platform health.

I'm fascinated by the way technology can transform industries and improve lives, and that's what keeps me motivated in this field. I've been fortunate enough to apply my passion and skills to building platforms at Arcesium and Salesforce.

Overview

5 years of professional experience

Work History

Senior Member of Technical Staff - DevOps

Salesforce
08.2024 - Current
  • Kafka Autoscaler (K8s operator) - Reduce cost to service (CTS): Led a team of 3 engineers in designing and implementing a Kubernetes operator in Golang for autoscaling Kafka. The autoscaler scales clusters up and down based on disk usage while monitoring metrics such as under-replicated and offline partitions, so that scaling causes no downtime.
  • Managed a team of 10 to ensure continuous 24x7 production availability by implementing Agile practices and optimizing roster management. Reduced the weekly average Time to Resolution (TTR) by 50% (from 1 hour to 30 minutes) and monthly incident volume by 66% (from 1500 to 500).
  • Created a new process for upgrading EKS clusters to the latest version every 3 months.

Member of Technical Staff - DevOps

Salesforce
06.2022 - 07.2024
  • Designed and developed a Go-based sidecar container for autoscaling Mirus (an open-source service that replicates data from one Kafka cluster to another or to S3), reducing cost by 42%.
  • Experienced in incident management, root cause analysis, and post-mortem reviews.
  • Managed Big Data pipelines (Kafka) across thousands of data centers in distributed environments (On-prem and AWS) on Kubernetes clusters.
  • Built monitoring dashboards using Prometheus and Grafana.

Senior Reliability Engineer

Arcesium
01.2022 - 06.2022
  • Developed a Go-based self-service web portal for Production and Container Readiness Reviews, reducing wastage costs by 52%.
  • Implemented external monitoring using synthetic tests (Terraform).
  • Established CI/CD pipelines with GitLab and Python-based workflows for change management.
  • Orchestrated cross-team SWAT meetings for incident review.

Reliability Engineer

Arcesium
06.2020 - 12.2021
  • Designed and developed an asynchronous, Flask-based Python self-service microservice for automated Breakglass access, backed by PostgreSQL, reducing manual effort.
  • Automated disaster recovery processes, enabling single-click recovery of affected services in different Availability Zones.
  • Developed a tool for stabilizing the platform after maintenance or disaster scenarios.
  • Managed monitoring systems including Datadog.

Reliability Engineer Intern

Arcesium
02.2020 - 06.2020

Education

Bachelor of Technology - Computer Science And Engineering

Mahatma Gandhi Institute of Technology
Hyderabad, India
05.2020

Skills

    Golang

    Python

    Bash/Linux

    Docker/Kubernetes

    AWS/Alicloud

    Ansible/Terraform

    Datadog/Grafana

    CI/CD/Jenkins

    On-call Management

    Incident Management
