
VIVEK KODAGALI

Summary

Dynamic Site Reliability Engineer with extensive experience at Walmart, excelling in incident management and VM migration. Proficient in AWS, Azure, and Terraform, with a record of enhancing system uptime and streamlining alerting systems. Adept at leveraging machine learning for proactive issue resolution and at working in high-pressure environments while ensuring compliance and operational excellence.

Overview

7 years of professional experience

Work History

Site Reliability Operations / NOC Engineer

Walmart
Bangalore
02.2022 - Current
  • Developed and managed alerting systems using YAML configuration files.
  • Created and monitored Grafana dashboards using MMS Query for real-time data visualization.
  • Implemented machine learning-based alerts to enhance anomaly detection and proactive issue resolution.
  • Performed in-depth log analysis using tools such as Splunk, Dynatrace, and OpenObserve.
  • Migrated alerting systems from Splunk to OpenObserve, ensuring consistent monitoring during platform transition.
  • Initiated incident response and pageouts via XMatters, reducing MTTR (Mean Time to Recovery).
  • Handled P1/P2 incidents, including call scheduling and coordination during critical outages affecting payment gateways and core banking systems.
  • Supported DevOps operations for core banking and financial applications, ensuring high system uptime, transaction integrity, and regulatory compliance.
  • Conducted VM migrations between OneOps environments and from OneOps to WCNP (Walmart Cloud Native Platform).
  • Gained hands-on experience with Kubernetes, contributing to containerized application management and deployment strategies.
  • Performed pod scaling operations using DX Console, supporting dynamic application load requirements.
  • Used Helm to manage Kubernetes applications, streamlining deployment, versioning, and rollback of services in various environments.
  • Customized Helm templates to enable parameterized deployments and environment-specific configurations.

Site Reliability Engineer (SRE)

Quotient Technology Inc.
Bangalore
09.2020 - 02.2022
  • Migrated virtual machines from vCenter 5.5 to 6.7, ensuring minimal downtime.
  • Created and managed VMs in vCenter; performed disk cleanup and archiving (zipping) operations on servers.
  • Configured Dell ESX servers via iDRAC and installed services on Windows servers via RDP.
  • Reset user passwords via Active Directory and restarted services using UrbanCode Deploy and Rundeck.
  • Migrated on-premise infrastructure to AWS, enhancing scalability and disaster recovery capabilities.
  • Utilized Terraform to scale infrastructure architecture using modular AWS components.
  • Applied least-privilege access and IAM controls to maintain strict security posture across cloud and on-premise resources, supporting compliance with SOX and PCI-DSS standards.
  • Managed asset uploads and purges on Microsoft Azure, supporting cloud storage efficiency.
  • Familiarity with NetApp storage solutions for enterprise data management.
  • Used OpManager Plus to identify and select available IP addresses.
  • Conducted traffic shifts using Clippy to manage routing and minimize service disruptions.
  • Monitored alerts via PagerDuty, enabling timely incident response.
  • Managed and configured alert notifications using Nagios, including checking critical DNS records.

L1 System Administrator

Atos
Pune
11.2018 - 09.2020
  • Conducted regular process and disk space monitoring on both Linux and Windows servers.
  • Performed database monitoring to track performance, connectivity, and query load.
  • Monitored and created incidents using ServiceNow, ensuring timely escalation and resolution.
  • Utilized ServiceNow dashboards to monitor various server metrics and health statuses.
  • Created Pre-BE Alerts (PBAs) during scheduled server maintenance to inform stakeholders and prevent false alarms.
  • Scheduled and facilitated bridge calls for high-priority incidents, coordinating with cross-functional teams for rapid issue resolution.
  • Rescheduled failed jobs to ensure business continuity and SLA adherence.
  • Reprocessed and excluded IDocs (invoices) to correct data flow and resolve integration errors.

Education

Bachelor of Engineering - Information Science

AMC Engineering College
Bangalore, Karnataka
07.2018

Skills

  • Linux Administration
  • Microsoft Azure
  • AWS
  • Terraform
  • Grafana
  • Docker
  • Machine Learning
  • DevOps
  • Helm
  • Networking
  • Database
  • Kubernetes
  • VM Migration
  • Incident Management

Tools & Technologies

  • Cloud Platforms: AWS, Microsoft Azure, Walmart Cloud Native Platform (WCNP), OneOps
  • Infrastructure as Code & Automation: Terraform, Helm, YAML, Rundeck, UrbanCode Deploy, DX Console
  • Containerization & Orchestration: Kubernetes (basic), Helm, pod scaling
  • Monitoring & Logging: Grafana (MMS Query), Splunk, Dynatrace, OpenObserve, Nagios, ServiceNow, PagerDuty, XMatters
  • Virtualization & Server Management: vCenter (5.5 & 6.7), Dell ESX Servers (iDRAC), VM Migration, VM Creation
  • Networking & IP Management: OpManager Plus, Clippy, DNS management
  • Storage: NetApp Storage
  • Scripting & Job Management: Java Console (message queues), Active Directory (AD), job rescheduling
  • Operating Systems: Linux, Windows Server

Training

  • Completed CLAP-IT IMS L0 and L1 training at Network Labs (India) Pvt. Ltd.
  • L0 training: Basics of hardware, networking, server roles, storage, and backup.
  • L1 training: 'Wintel' specialization (Windows and Linux administration).
