
Thrinadh Kumpatla

Site Reliability Engineer - II
Hyderabad

Summary

Accomplished Site Reliability Engineer with a proven track record at Rudderstack, skilled in Kubernetes and complex problem-solving. Led migrations, optimized AWS costs, and strengthened security and CI/CD workflows. Improved system reliability and efficiency while significantly reducing incident resolution times.

Overview

7 years of professional experience
4 years of post-secondary education

Work History

Site Reliability Engineer - II

Rudderstack
05.2021 - Current
  • Led a project to centralize monitoring data (around 50 million time series) and alerting across logs, applications, and cluster-level metrics using Loki, Thanos, and Grafana Mimir. Enabled customers to add alerts through the web application to monitor the health of their data pipelines.
  • Migrated load balancing from a dedicated ALB per customer to Traefik behind a single NLB to improve observability and reduce cost; implemented custom domain support for customers, enhancing product value.
  • Significantly reduced AWS costs by optimizing node utilization through spot instances, Karpenter, the Nginx Ingress Controller, Traefik, and other techniques such as scaling on custom metrics with KEDA, improving margins from 57% to 80%.
  • Played a key role in migrating from the TICK stack to Prometheus, improving monitoring capabilities.
  • Enhanced security posture by moving secrets to Vault, onboarding Snyk for vulnerability management, and provisioning developer access through code.
  • Migrated from CodeBuild to GitHub Actions, achieving a high degree of customization and leveraging plugins for enhanced CI/CD workflows.
  • Proactively responded to on-call requests, resolved infrastructure-related issues, led outage resolution efforts, and conducted root cause analyses to prevent future incidents. Introduced an incident management protocol to streamline team collaboration and speed up resolution.
  • Introduced ArgoCD for continuous delivery of cluster-level applications, operators, and other components.
  • Implemented SLOs and a Status Page to keep customers informed about ingestion availability, API status, and service latencies/outages.

DevOps Engineer

Techolution
06.2019 - 05.2021
  • Project: PoC for Migration to GCP
  • Gained hands-on experience with cloud technologies and Kubernetes through various PoCs.
  • Product: Faceopen (faceopen.com)
  • Managed the deployment of a 15-microservice admin portal on Docker, with a stack of Node.js, Angular, Python, MongoDB, and Redis.
  • Designed a SaaS architecture to support multi-tenancy on Kubernetes (GKE) with NFS storage for image management and Redis for efficient data handling.
  • Optimized facial recognition performance, reducing latency from 100ms to 15ms using GCP GPU instances.
  • Developed the marketing website and voice application with CI/CD integration on GCP, implementing pipelines using GitHub Actions and Jenkins for rapid and reliable deployments.
  • Earned the Google Cloud Professional Cloud Architect certification while at Techolution.

Intern

Verzeo
10.2017 - 01.2018

Education

B.E. (Hons.) - Electrical, Electronics and Communications Engineering

Birla Institute of Technology and Science
Hyderabad
08.2015 - 05.2019
