Shivika Nagpal

Chandigarh

Summary

Senior Site Reliability Engineer with 6+ years of experience supporting and scaling high-availability, mission-critical systems in fintech and enterprise environments. Strong background in incident management, automation, CI/CD, monitoring, and cloud-native platforms. Proven ability to improve system reliability through SLI/SLO-driven observability, automation, and blameless postmortems while operating under peak traffic and regulatory constraints. Passionate about building resilient systems that balance reliability, velocity, and operational excellence.

Overview

7 years of professional experience

Work History

Senior Site Reliability Engineer

Visa Inc.
Bengaluru
02.2022 - Current

•⁠ ⁠Owned reliability and availability of mission-critical, high-throughput payment systems, consistently maintaining 99.999%+ availability in a regulated fintech environment.
•⁠ ⁠Defined, monitored, and continuously improved SLIs, SLOs, and SLAs using Splunk and Grafana, enabling proactive detection of anomalies and faster incident response.
•⁠ ⁠Led incident management for Sev-1 and Sev-2 production incidents, performing blameless postmortems and deep root cause analysis (RCA) to prevent recurrence and improve system resilience.
•⁠ ⁠Leveraged GenAI to summarize incidents, logs, and RCA findings, accelerating post-incident reviews and stakeholder communication.
•⁠ ⁠Designed and maintained CI/CD pipelines using Jenkins, enabling automated deployments, security remediations, and configuration changes, reducing manual intervention and human error by ~30%.
•⁠ ⁠Implemented zero-touch patching for approximately 80% of application servers by automating patch validation, dry runs, and rollout strategies, significantly improving security posture and operational efficiency.
•⁠ ⁠Executed and supervised high-risk production changes during peak transaction periods, ensuring system stability under extreme load while staying within error budgets.
•⁠ ⁠Collaborated closely with development, security, and platform teams to implement scalable and fault-tolerant architectures aligned with reliability best practices.
•⁠ ⁠Supported capacity planning and performance monitoring by tracking transactions-per-second (TPS) metrics and system behavior under peak traffic conditions.
•⁠ ⁠Worked extensively with Kubernetes and Linux-based environments to improve application availability, scalability, and deployment reliability.
•⁠ ⁠Leveraged AWS cloud infrastructure to support production workloads, ensuring secure and reliable operations across environments.
•⁠ ⁠Integrated operational data into Power BI dashboards for monthly and quarterly reliability, availability, and performance reporting for internal stakeholders and clients.
•⁠ ⁠Utilized ServiceNow for incident, change, and problem management, ensuring compliance with organizational change management and audit requirements.
•⁠ ⁠Applied GenAI to analyze historical incident data and patterns, identifying recurring failure modes and opportunities for automation.
•⁠ ⁠Strengthened system security through secure configurations, controlled access, and collaboration with security teams on remediation efforts.

System Engineer

Tata Consultancy Services
Indore
11.2018 - 02.2022

•⁠ ⁠Supported reliability and operational stability of multiple enterprise applications, transitioning from traditional application support to platform and reliability engineering responsibilities.
•⁠ ⁠Administered and migrated large-scale SharePoint environments, automating data migration workflows using Python scripts and ShareGate, reducing manual effort and migration risk.
•⁠ ⁠Built and managed 70+ SharePoint application sites, ensuring high availability, access control, and operational consistency.
•⁠ ⁠Deployed and configured Red Hat Linux servers on Microsoft Azure, including OS hardening, package management, and environment setup for production workloads.
•⁠ ⁠Installed, upgraded, and supported Java-based applications on Linux servers, including dependency management, SSL certificate configuration, and secure zone deployments.
•⁠ ⁠Implemented secure access controls using CyberArk, improving credential management and reducing security exposure.
•⁠ ⁠Acted as primary point of contact (PoC) for multiple production applications, handling incidents, troubleshooting, and coordination with development teams.
•⁠ ⁠Supported incident resolution and change implementations, ensuring minimal downtime and adherence to operational best practices.
•⁠ ⁠Collaborated with clients during requirements analysis, deployment, and production support, gaining early exposure to reliability, scalability, and system design considerations.
•⁠ ⁠Assisted in onboarding new projects and setting up operations at new locations, contributing to process standardization and operational readiness.

Education

Bachelor of Technology - Information Technology

Banasthali Vidyapith
07.2018

Class XII

The Scholars Home School
03.2014

Class X

The Scholars Home School
03.2012

Skills

  • Kubernetes
  • AWS
  • Grafana
  • PowerShell
  • Python
  • CI/CD
  • Monitoring and Alerting
  • Linux
  • Incident Management
  • Root Cause Analysis (RCA)
  • Automation
  • Jenkins
