Leegin Bernads

Bangalore

Summary

I'm a DevOps Engineer with 9 years of hands-on experience architecting, automating, and optimizing mission-critical deployments across large-scale infrastructure. I enjoy connecting people with technology solutions that are easy to use, affordable, and sustainable over time.

Overview

9 years of professional experience

Work History

DevOps Engineer

Groww
Bangalore
05.2021 - Current
  • Architected a scalable, fault-tolerant observability stack using OpenTelemetry, Prometheus, and Grafana Mimir to handle over 120 million active time series. Implemented robust guardrails on metrics and labels to prevent cardinality explosion, ensuring high performance, availability, and operational efficiency.
  • Led the design and management of a distributed tracing system built on the OpenTelemetry Agent and Collector, Refinery, and Grafana Tempo. Implemented head-based and tail-based sampling strategies to capture high-value traces, significantly improving issue detection and debugging efficiency. Enabled identification of 'unknown unknowns' and reduced Mean Time to Recovery (MTTR) by approximately 40%.
  • Architected and deployed a production-grade centralized logging system using Promtail and Grafana Loki, ingesting over 6 TB of logs daily. Implemented log-level-based segregation using regex matching and Promtail pipelines to enrich logs with Kubernetes metadata for efficient querying. Introduced log sampling based on log levels to optimize storage and enhance the signal-to-noise ratio, contributing to faster incident resolution and a measurable reduction in MTTR.
  • Developed a high-cardinality metric analysis service to track metric usage across multi-tenant environments. The service evaluates whether metrics are referenced in Grafana dashboards, alerting/recording rules, or queried within the last 30 days. This enabled the automatic classification of metrics as used or unused, improving observability hygiene, reducing noise, and optimizing resource utilization.
  • Architected and deployed Heimdall, a proprietary event-driven tool that enforces labeling compliance policies for GCP resources across the organization, which resolved the company's billing segregation challenge. Following its success, plans are underway to release Heimdall as an open-source tool for the wider GCP user community.
  • Implemented event-driven autoscaling for applications using KEDA, improving scalability while reducing infrastructure costs.

DevOps Engineer

OLA, ANI Technologies Pvt. Ltd
Bangalore
10.2018 - 04.2021
  • Saved $45.5K/month by optimizing underutilized EC2, ELB, and ElastiCache resources.
  • Automated vendor IP whitelisting in Azure NSGs, saving 20 minutes per request.
  • Improved HA for Flink jobs by automating alert-based recovery, saving ~90 minutes/week.
  • Led Kafka and ZooKeeper operations at scale, including mirroring and tuning petabyte-scale data pipelines.

Senior Hosting Product Specialist

Endurance International Group
Bangalore
12.2017 - 09.2018

Linux Administrator

Poornam Info Vision
Kochi
05.2016 - 12.2017

Skills

  • CI/CD pipelines
  • Infrastructure automation
  • Root cause analysis
  • Container orchestration
  • Technical documentation
  • Incident management
  • Distributed tracing
  • Log management
  • Cost optimization
  • Problem-solving abilities
  • Apache Kafka
  • Maintenance and troubleshooting

Open Source Contributions

  • Identified and fixed a critical memory inefficiency in Prometheus where the /service-discovery endpoint caused a 2× memory spike due to discovered labels being stored as key-value maps. Optimized the label serialization by compacting them into strings, reducing memory allocation by ~50% and improving endpoint latency, validated via benchmarks. Officially became a Prometheus contributor; fix is included in the upcoming release. PRs: #13469, #13484
  • Worked with Grafana Labs and the open-source community to spearhead a detailed migration plan from Thanos to Grafana Mimir. Led the migration to successful completion, making our company the first to accomplish it, and contributed personally to every aspect of the project. https://grafana.com/docs/mimir/latest/set-up/migrate/migrate-from-thanos-or-prometheus/
  • Explored Argo Rollouts' analysis feature to enable progressive delivery based on real-time metrics retrieved from a TSDB. Identified discrepancies in the official documentation regarding custom header usage for Prometheus long-term storage queries. After reviewing the source code, corrected the documentation by submitting a PR to Argo Rollouts, ensuring accurate implementation guidance for the community. PR: #3306
  • Following a Prometheus memory issue investigation, implemented safeguards by enforcing a limit on dropped targets post-relabeling. While Prometheus natively supported this, the configuration option was missing in the Prometheus Operator’s Helm chart. After identifying the correct parameter in the codebase, contributed a Helm chart enhancement to expose this setting, enabling better memory control for operator-managed clusters. PR: #4178
  • Contributed to the open-source community by developing modules for the GCP labels mod feature in the Steampipe library.

Timeline

DevOps Engineer

Groww
05.2021 - Current

DevOps Engineer

OLA, ANI Technologies Pvt. Ltd
10.2018 - 04.2021

Senior Hosting Product Specialist

Endurance International Group
12.2017 - 09.2018

Linux Administrator

Poornam Info Vision
05.2016 - 12.2017