Shivika Nagpal

Chandigarh

Summary

Senior Site Reliability Engineer with 6+ years of experience supporting and scaling high-availability, mission-critical systems in FinTech and enterprise environments. Strong background in incident management, automation, CI/CD, monitoring, and cloud-native platforms. Proven ability to improve system reliability through SLI/SLO-driven observability, automation, and blameless postmortems while operating under peak traffic and regulatory constraints. Passionate about building resilient systems that balance reliability, delivery velocity, and operational excellence.

Overview

6+ years of professional experience

Work History

Senior Site Reliability Engineer

Visa Inc.

Bengaluru

02.2022 - Current

• Owned reliability and availability of mission-critical, high-throughput payment systems, consistently maintaining 99.999%+ availability in a regulated FinTech environment.
• Defined, monitored, and continuously improved SLIs, SLOs, and SLAs using Splunk and Grafana, enabling proactive anomaly detection and faster incident response.
• Led incident management for Sev-1 and Sev-2 production incidents, running blameless postmortems and in-depth root cause analysis (RCA) to prevent recurrence and improve system resilience.
• Leveraged GenAI to summarize incidents, logs, and RCA findings, accelerating post-incident reviews and stakeholder communication.
• Designed and maintained CI/CD pipelines using Jenkins, enabling automated deployments, security remediations, and configuration changes, and reducing manual intervention and human error by ~30%.
• Implemented zero-touch patching for approximately 80% of application servers by automating patch validation, dry runs, and rollout strategies, significantly improving security posture and operational efficiency.
• Executed and supervised high-risk production changes during peak transaction periods, keeping systems stable under extreme load and within error budgets.
• Collaborated closely with development, security, and platform teams to implement scalable, fault-tolerant architectures aligned with reliability best practices.
• Supported capacity planning and performance monitoring by tracking transactions-per-second (TPS) metrics and system behavior under peak traffic.
• Worked extensively with Kubernetes and Linux-based environments to improve application availability, scalability, and deployment reliability.
• Used AWS cloud infrastructure to support production workloads, ensuring secure and reliable operations across environments.
• Integrated operational data into Power BI dashboards for monthly and quarterly reliability, availability, and performance reporting to internal stakeholders and clients.
• Used ServiceNow for incident, change, and problem management, ensuring compliance with organizational change management and audit requirements.
• Applied GenAI to analyze historical incident data and patterns, identifying recurring failure modes and opportunities for automation.
• Strengthened system security through secure configurations, controlled access, and collaboration with security teams on remediation efforts.

System Engineer

Tata Consultancy Services

Indore

11.2018 - 02.2022

• Supported reliability and operational stability of multiple enterprise applications, transitioning from traditional application support to platform and reliability engineering responsibilities.
• Administered and migrated large-scale SharePoint environments, automating data migration workflows using Python scripts and ShareGate, reducing manual effort and migration risk.
• Built and managed 70+ SharePoint application sites, ensuring high availability, access control, and operational consistency.
• Deployed and configured Red Hat Linux servers on Microsoft Azure, including OS hardening, package management, and environment setup for production workloads.
• Installed, upgraded, and supported Java-based applications on Linux servers, including dependency management, SSL certificate configuration, and secure zone deployments.
• Implemented secure access controls using CyberArk, improving credential management and reducing security exposure.
• Acted as primary point of contact (PoC) for multiple production applications, handling incidents, troubleshooting, and coordination with development teams.
• Supported incident resolution and change implementations, ensuring minimal downtime and adherence to operational best practices.
• Collaborated with clients during requirements analysis, deployment, and production support, gaining early exposure to reliability, scalability, and system design considerations.
• Assisted in onboarding new projects and setting up operations at new locations, contributing to process standardization and operational readiness.

Education

Bachelor of Technology - Information Technology

Banasthali Vidyapith

07.2018

Class XII

The Scholars Home School

03.2014

Class X

The Scholars Home School

03.2012

Skills

Kubernetes
AWS
Grafana
PowerShell
Python
CI/CD
Monitoring and Alerting
Linux
Incident Management
RCA
Automation
Jenkins
