Gaurav Singh

Gurugram

Summary

Experienced Production Support Engineer / SRE / DevOps Engineer with 5 years of expertise in managing and optimizing Kubernetes and AWS cloud environments. Proficient in monitoring, incident management, root cause analysis (RCA), and troubleshooting complex production issues to ensure 99% system uptime. Strong background in applying ITIL frameworks for efficient incident and problem resolution, consistently meeting SLA targets and reducing downtime by 30%.

Overview

5 years of professional experience

Work History

Sr. Site Reliability Engineer (SRE)

Myndtree Business Services Pvt. Ltd. (Client: Airtel International LLP)
Gurugram
11.2024 - 04.2025

Mobile E-Wallet Platform – Airtel Money (500M+ users)
Offering services like money transfers, bill payments, and digital banking.

  • Diagnosed P1/P2 issues via system monitoring, reducing downtime by 30%.
  • Designed infra solutions with Prometheus, Grafana, and Loki for full-stack observability.
  • Deployed 50+ monthly hotfixes/releases using CI/CD tools (Jenkins, Git, Bitbucket).
  • Automated manual tasks using Ansible and managed Kubernetes clusters and Docker containers.
  • Automated deployment and maintenance tasks for on-premises environments using Ansible, Shell scripting, and Jenkins, reducing manual intervention and improving reliability.
  • Led migration of workloads from on-prem to cloud, resulting in cost optimization and improved performance.
  • Handled Kubernetes YAML manifests (pods, services, deployments).
  • Built Jenkins pipelines (declarative/scripted) for zero-downtime deployments and SIT environments.
  • Integrated Jenkins with Docker and Kubernetes; used the AWS CodePipeline suite.
  • Resolved Kubernetes pod/service issues, boosting app stability and uptime.
  • Maintained 98% SLA compliance on 100+ tickets/month, improving customer satisfaction by 20%.
  • Documented release, config, and monitoring processes, reducing recurring issues by 30%.
  • Automated deployments, enhancing operational efficiency and reducing release time.

Engineer

Ericsson India Global Services Pvt. Ltd.
Noida
03.2023 - 08.2024

EWP – Financial Services Platform
Provided financial services like money transfers, bill payments, and banking.

  • Ensured 24/7 availability of 50+ critical applications with 99%+ uptime.
  • Troubleshot major incidents (network/system failures), reducing downtime by 30%.
  • Managed and supported on-premises infrastructure, including servers, storage, and networking components for high availability and performance.
  • Configured and maintained on-prem monitoring and alerting tools (e.g., Grafana, Prometheus) to ensure system health and uptime.
  • Implemented patching and security hardening on on-prem environments.
  • Troubleshot performance issues and system failures on on-premises applications, reducing downtime by 40%.
  • Collaborated with Dev, QA, and DevOps teams, improving incident resolution time by 40%.
  • Conducted RCA and documented fixes, cutting repeat issues by 15%.
  • Optimized Prometheus, Grafana, and Loki for faster failure detection (35% improvement).
  • Handled 150+ tickets/week via ServiceNow, maintaining 98% SLA compliance.
  • Applied config changes, menu updates, and patches on EWP4 & Kubernetes.
  • Executed Oracle deployments and RHEL patching with 100% success rate.

Product Support Engineer

Prosoft Technology
Noida
01.2020 - 03.2023
  • Managed system configurations, application restarts, and performance validation.
  • Handled ticket lifecycle and client communication.
  • Monitored server health and performance using monitoring tools.
  • Configured and managed network settings in Unix and Linux systems.
  • Resolved customers' technical queries through log analysis.
  • Tracked and owned all incident and problem management work through a ticketing system.
  • Monitored memory, CPU, and disk utilization.
  • Managed incident tickets in ServiceNow on a weekly basis, resolving them within SLA targets and achieving a 98% adherence rate.
  • Performed root cause analysis on network failures, contributing to a 20% improvement in long-term network reliability and performance.
  • Project: Telecom Banking (CDR-based transaction tracking system)

Education

B.Tech

Babu Banarasi Das Institute of Technology
Ghaziabad
01.2020

12th

BPS School
01.2016

10th

BPS School
01.2014

Skills

  • Version Control: Git, GitHub
  • CI/CD Tools: Jenkins, GitHub Actions, Bitbucket
  • Containerization & Orchestration: Docker, Kubernetes
  • Configuration Management: Ansible
  • Monitoring & Logging: Prometheus, Grafana, Loki
  • Databases: MySQL, Oracle
  • Scripting: Shell (Bash)
  • Operating Systems: Linux (Ubuntu, CentOS), Windows
  • Issue Tracking: Jira
  • Cloud: AWS
  • Infrastructure as Code (IaC): Terraform
