Leegin Bernads

Summary

I'm a DevOps Engineer with 9 years of hands-on experience architecting, automating, and optimizing mission-critical deployments across large-scale infrastructure. I enjoy connecting people with technology solutions that are easy to use, affordable, and sustainable over time.

Overview

9 years of professional experience

Work History

DevOps Engineer

Groww

Bangalore

05.2021 - Current

Designed and architected a scalable, fault-tolerant observability stack using OpenTelemetry, Prometheus, and Grafana Mimir that handles more than 120 million active time series. Implemented guardrails on metrics and labels to prevent cardinality explosion, keeping the stack performant, highly available, and operationally efficient.
Led the design and management of a distributed tracing system built on the OpenTelemetry Agent and Collector, Refinery, and Grafana Tempo. Implemented head-based and tail-based sampling strategies to capture high-value traces, significantly improving issue detection and debugging efficiency. This enabled identification of 'unknown unknowns' and reduced Mean Time to Recovery (MTTR) by approximately 40%.
Architected and deployed a production-grade centralized logging system using Promtail and Grafana Loki, ingesting more than 6 TB of logs per day. Implemented log-level-based segregation using regex matching in Promtail pipelines, which also enrich logs with Kubernetes metadata for efficient querying. Introduced level-based log sampling to optimize storage and improve the signal-to-noise ratio, contributing to faster incident resolution and a measurable reduction in MTTR.
Developed a high-cardinality metric analysis service that tracks metric usage across multi-tenant environments. The service checks whether each metric is referenced in Grafana dashboards or alerting/recording rules, or has been queried within the last 30 days, and automatically classifies it as used or unused. This improved observability hygiene, reduced noise, and optimized resource utilization.
Architected and deployed Heimdall, a proprietary event-driven tool that enforces comprehensive labeling compliance policies for GCP resources across the organization, solving the company's billing segregation challenge. Following its success, plans are underway to release Heimdall as an open-source tool for the wider GCP user community.
Implemented event-driven autoscaling for applications using KEDA, significantly improving scalability while reducing infrastructure costs.
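The used/unused rule in the metric analysis service above can be illustrated with a minimal Python sketch. It assumes the metric names referenced by dashboards and rules, and each metric's last-query timestamp, have already been collected; the function and parameter names are hypothetical, and the sketch does not model multi-tenancy or parse PromQL.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Set, Tuple

# Lookback window taken from the "queried within the last 30 days" rule.
LOOKBACK = timedelta(days=30)


def classify_metrics(
    all_metrics: Set[str],
    dashboard_refs: Set[str],
    rule_refs: Set[str],
    last_queried: Dict[str, datetime],
    now: Optional[datetime] = None,
) -> Tuple[Set[str], Set[str]]:
    """Hypothetical helper: split metric names into (used, unused).

    A metric counts as used if a Grafana dashboard references it, an alerting
    or recording rule references it, or it was last queried within LOOKBACK.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    used: Set[str] = set()
    unused: Set[str] = set()
    for name in all_metrics:
        queried_at = last_queried.get(name)
        recently_queried = queried_at is not None and now - queried_at <= LOOKBACK
        if name in dashboard_refs or name in rule_refs or recently_queried:
            used.add(name)
        else:
            unused.add(name)
    return used, unused


if __name__ == "__main__":
    # Illustrative data only; the metric names are made up.
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    used, unused = classify_metrics(
        all_metrics={"http_requests_total", "node_cpu_seconds_total",
                     "checkout_latency_seconds", "debug_cache_hits",
                     "legacy_queue_depth"},
        dashboard_refs={"http_requests_total"},
        rule_refs={"node_cpu_seconds_total"},
        last_queried={
            "checkout_latency_seconds": now - timedelta(days=10),
            "debug_cache_hits": now - timedelta(days=45),
        },
        now=now,
    )
    print("used:", sorted(used))
    print("unused:", sorted(unused))
```

Passing `now` explicitly keeps the classification deterministic and testable. A production service would instead fill the three reference sources from the Grafana API, rule files, and query logs.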

DevOps Engineer

OLA, ANI Technologies Pvt. Ltd

Bangalore

10.2018 - 04.2021

Saved $45.5K/month by optimizing underutilized EC2 instances, ELBs, and ElastiCache resources.
Automated vendor IP whitelisting in Azure NSGs, saving 20 minutes per request.
Improved high availability (HA) of Flink jobs by automating alert-based recovery, saving ~90 minutes per week.
Led Kafka and ZooKeeper operations at scale, including mirroring and tuning of petabyte-scale data pipelines.

Senior Hosting Product Specialist

Endurance International Group

Bangalore

12.2017 - 09.2018

Linux Administrator

Poornam Info Vision

Kochi

05.2016 - 12.2017

Skills

CI/CD pipelines
Infrastructure automation
Root cause analysis
Container orchestration
Technical documentation
Incident management

Distributed tracing
Log management
Cost optimization
Problem-solving abilities
Apache Kafka
Maintenance and troubleshooting

Open Source Contributions

Identified and fixed a critical memory inefficiency in Prometheus: the /service-discovery endpoint caused a 2× memory spike because discovered labels were stored as key-value maps. Optimized label serialization by compacting the labels into strings, reducing memory allocation by ~50% and improving endpoint latency, as validated by benchmarks. Became an official Prometheus contributor; the fix is included in the upcoming release. PRs: #13469, #13484
Played a pivotal role in the open-source community, specifically with Grafana Labs, by spearheading a detailed plan for migrating from Thanos to Grafana Mimir. Led the migration to successful completion, making our company the first to accomplish it, and contributed personally to every aspect of the project. https://grafana.com/docs/mimir/latest/set-up/migrate/migrate-from-thanos-or-prometheus/
Explored the Argo Rollouts analysis feature to enable progressive delivery driven by real-time metrics retrieved from a TSDB. Identified discrepancies in the official documentation on using custom headers for queries against Prometheus long-term storage. After reviewing the source code, submitted a PR to Argo Rollouts correcting the documentation, ensuring accurate implementation guidance for the community. PR: #3306
After investigating a Prometheus memory issue, implemented safeguards by limiting the number of dropped targets retained after relabeling. Prometheus natively supported this limit, but the configuration option was missing from the Prometheus Operator's Helm chart. After identifying the correct parameter in the codebase, contributed a Helm chart enhancement to expose the setting, enabling better memory control for operator-managed clusters. PR: #4178
Contributed to the open-source community by developing modules for the GCP Labels mod in the Steampipe library.
