Rajalakshmi V

Bengaluru

Summary

Site Reliability Engineer with extensive experience in end-to-end Incident Response & Management (IR&M) and in building resilient, planet-scale infrastructure. Skilled in delivering on-premises and cloud-native solutions within AWS environments, with strong expertise in Python development. Adept at collaborating with cross-functional, multicultural teams to drive reliability, scalability, and operational excellence.

Overview

6 years of professional experience
2 certifications

Work History

Site Reliability Engineer II

Toast Inc.
Bengaluru
08.2023 - 08.2025
  • Led the first-of-its-kind migration of 600+ AWS microservices from Fluentd to Fluent Bit, designing and executing the process from scratch. Achieved at least a 50% improvement in CPU and memory efficiency, addressed critical security concerns, and enabled the use of more performant plugins, significantly enhancing overall logging efficiency.
  • Led an organization-wide SLO implementation program, fostering cross-functional collaboration to elevate service reliability.
  • Spearheaded Splunk cost optimization, developing cross-LOB/team attribution models aligned with FinOps principles.
  • Implemented AI-based incident management tooling and automation, streamlining incident response using FireHydrant.
  • Developed Observability as Code (OaC) solutions with Datadog and Terraform. This led to a reduction in MTTD by at least 75%.
  • Bootstrapped internal tuning advisor tool with a TypeScript front end.
  • Proficient in Python scripting for automation and tooling.
  • Provided comprehensive observability support, including on-call, developer query resolution, and feature enhancements.

Site Reliability Engineer

LinkedIn
Bengaluru
02.2021 - 08.2023
  • Managed and operated one of the largest Kafka infrastructures on the planet, with a message in-rate close to 20 trillion messages per day.
  • Improved and maintained the Kafka monitoring infrastructure to automatically map clusters to alert definitions, using Flask and Vue.js. This reduced the toil and engineer hours needed to onboard new Kafka clusters into the monitoring system.
  • Migrated stateless applications from old to new clusters across multiple data centers. This removed the team's toil of updating hosts by offloading the maintenance of new clusters to a central team.
  • Served on-call, handled customer queries and requests, and drove incident management.
  • Mentored new joiners and conducted boot camp sessions for them.

DevOps Engineer

OkCredit, Psi Phi Global Solutions Ltd
Bengaluru
05.2020 - 02.2021
  • Bootstrapped Kubernetes cluster components with Ansible, using Jinja2 templating and roles to orchestrate the creation of cluster components in internal environments. Also wrote supporting Python and shell scripts.
  • Implemented a Kubernetes-native monitoring, tracing, and alerting solution for the organization with Prometheus, Alertmanager, and Slack. Used Helm, operators, and CRDs.
  • Managed RBAC in Kubernetes to restrict developer access, using CloudFunction and operators.
  • Managed CI/CD for microservices and onboarded new services to it. Contributed to cost optimization of infrastructure resources.

Software Development Engineer

Innovaccer Inc
Noida
05.2019 - 05.2020
  • Implemented an automated, cost-effective alternative to several JIRA and Confluence plugins using AWS Lambda and API Gateway.
  • Created and managed JIRA workflows, SLAs, and automation rules.
  • Created a unified, dynamic dashboard with Python for viewing AWS resources.
  • Automated deployments and set up CI/CD pipelines with Ansible and Jenkins.

Education

B.Tech (with Hons.) - Computer Science

IIIT Sricity
06.2019

Skills

  • Kafka Administration
  • Terraform
  • Alertmanager
  • Linux (Debian and RHEL)
  • Git
  • Grafana
  • Python
  • AWS
  • Datadog
  • FireHydrant
  • Kubernetes
  • Prometheus
  • Docker
  • Splunk

Certification

  • Certified Kubernetes Administrator (CKA), LF-Shngo9cbm
  • Certified Kubernetes Application Developer (CKAD), LF-ddzj4h6de

Accomplishments

  • Peer- and executive-level appreciation (Toast): received multiple tokens of appreciation from within and outside the team for outstanding collaboration, project leadership, and successful triaging of issues
  • Published an end-to-end module on containerization and orchestration for the School of SRE, LinkedIn: https://linkedin.github.io/school-of-sre/level102/containerization-and-orchestration/intro
  • Recognition of excellence (LinkedIn): received multiple 'Bravos' (tokens of appreciation) from LinkedIn for exceptional contributions to multiple projects
  • Recognition of excellence (Innovaccer): received the 'Spot Award' and a prize voucher from Innovaccer for excellent performance
  • Awarded the Institute Gold Medal for the highest CGPA in the batch of 2019
  • Dean's List of Merit: placed in the top 10% of scorers in a semester (2015-2016)

Additional Information

Passionate about marathons (running 21K), endurance sports, recreational badminton, and gym training. Recognized as a Cult Champion for passion for fitness. Experienced in organizing badminton tournaments and community runs.

Languages

English
First Language

References

References available upon request.
