Accomplished Site Reliability Engineer with a proven track record at IDFC FIRST BANK, specializing in observability dashboards and incident response. Expert in shell scripting and system monitoring, with a focus on improving application reliability and fostering team collaboration to drive operational excellence. Committed to delivering impactful solutions that improve user experience.
Overview
9 years of professional experience
Work History
Site Reliability Engineer
IDFC FIRST BANK
Bangalore
12.2022 - Current
Developed observability dashboards for banking applications, calculating SLA, SLO, and SLI metrics according to industry standards.
Maintained application reliability through process automation, minimizing manual intervention.
Analyzed infrastructure and performance issues to identify root causes systematically.
Troubleshot incidents, implemented alerts, and provided solutions to prevent recurrence.
Mapped product flows into user journeys to improve user experience.
Created observability metrics that improved system monitoring capabilities.
Provided L2-L3 product support to effectively address user complaints and concerns.
Managed system monitoring and incident response for critical banking services.
Site Reliability Engineer II
Akamai Technologies
08.2021 - 12.2022
Worked in the Security Engineering team on customer-facing applications, responsible for deploying and managing various components.
Member of the Enterprise Threat Protector team.
Good understanding of Software Development Life Cycle (SDLC) processes and agile methodology.
Hands-on experience with source code management (version control) tools such as Git.
Hands-on experience with containerization platforms such as Docker.
Experience with continuous integration tools such as Jenkins.
Hands-on experience with configuration management tools such as Ansible.
Hands-on experience creating and managing Kubernetes clusters across different types of environments.
Hands-on experience with continuous monitoring tools such as Grafana.
Automated OS-level tasks using shell scripting.
Platform Operations Engineer
Akamai Technologies
07.2019 - 08.2021
Dedicated Platform Operations Engineer who enjoys cultivating long-term partnerships with vendors and clients. Expertise in installing, configuring, and monitoring complex systems and infrastructures.
Wrote SQL queries to extract required details and organize them in a presentable format.
Implemented application and system monitoring with Grafana.
Handled system, application, and network-related alerts for deployed machines.
Worked on the UMP configuration management system.
Handled emails directed to Akamai from various ISPs and customers.
Platform Operations Engineer in the NOCC (Network Operations Command Center) at Akamai Technologies.
Managed a complex network of around 4 lakh (400,000) Linux and Windows servers.
System Engineer
Tata Consultancy Services
07.2016 - 07.2019
Extensive experience with the SailPoint IIQ PeopleSoft infrastructure and Multi-Factor Authentication infrastructure.
Experience with data encryption infrastructure using Micro Focus Voltage and Thales Vormetric.
Worked with the ServiceNow ticketing system.
Provided 24/7 production support for Multi-Factor Authentication users, SailPoint IIQ, Java-based data protection applications, and other cloud-hosted applications.
Hands-on experience with Jira for bug tracking, issue tracking, and project management.
Linux system administration.
Knowledge of package management (YUM, RPM), user management, file systems, storage components (LVM and RAID), and disk management.
Installed and configured Apache Tomcat servers on Linux VMs.
Unix shell scripting.
Worked on infrastructure capacity planning for various applications.
Developed shell scripts to calculate key performance indicators (KPIs) for applications based on production support standards and policies.
Hands-on experience with monitoring tools (Splunk, AppDynamics, Dynatrace, Kibana, Gomez, Wily, SiteScope, Nimsoft).
Knowledge of change management, gained from a year of working in release management.
Experienced in running incident management bridges and coordinating with different teams.
Education
Bachelor of Engineering - Electronics and Communication
05.2016
Higher Secondary Certificate - CBSE Board
01.2012
Senior Secondary Certificate - CBSE Board
01.2010
Skills
Shell Scripting
Perl
Python
SQL
Siebel
Splunk
AppDynamics
Dynatrace
Kibana
Gomez
Wily
SiteScope
Nimsoft
Grafana
Opsgenie
Ansible
Git
Jenkins
Anaconda
Spyder
Sublime Text Editor
JIRA
PuTTY
Banking and Finance
Apache Tomcat
Windows XP
Windows 7
Windows 10
Linux
RedHat EcP
CentOS 7
Akamai proprietary Linux
Observability dashboards
Incident response
System monitoring
Application support
Root cause analysis
Team collaboration
Linux administration
Hobbies and Interests
Football
Cricket
Cooking
Languages
English
Hindi
Punjabi
Affiliations
Monitoring and observability
Developed and maintained Grafana dashboards (DIY Bible, Error Dashboards, Funding Statistics) to improve visibility across business flows.
Integrated Prometheus metrics for enhanced system monitoring and real-time alerting.
Implemented API performance monitoring and initiated SLI, SLO, and SLA tracking for NTB SA journeys.
Recovered lost dashboards and set up code-based backup strategies for observability assets.
Incident Management & Automation
Reduced mean time to recovery (MTTR) by streamlining incident response through Grafana-triggered alerts and knowledge base (KB) articles.
Proposed and initiated scheduler-based solutions for recurring 5xx errors in Aadhaar-Karza integrations.
Enabled unified incident reporting across BTO and Snow teams, standardizing release and post-mortem workflows.
Investigated and mitigated issues such as AOF data loss, OTP toggle bypass, and CIBIL API ingress misconfigurations.
Compliance, Risk & Governance
Ensured 100% adherence to compliance protocols, mandatory training, and timesheet submissions.
SRE transformation and collaboration
Spearheaded the SRE transformation for the NTB SA application, transitioning from L2 support to proactive reliability engineering.
Collaborated with DevOps, observability, and business teams to improve uptime, reduce alert fatigue, and automate escalations.
Participated in code reviews and architecture discussions to deepen understanding of Java and Kubernetes internals.
Recognition and impact
Rated “Excellent” across all 2024 performance goals.
Recognized by management for dashboard innovation, independent problem-solving, and team enablement.
Disclaimer
I hereby declare that the information furnished above is true to the best of my knowledge.
Manager-Finance at IDFC First Bank Ltd. (erstwhile Capital First Limited, merged with IDFC Bank w.e.f. Dec-18)