
Archit Malhotra

Faridabad

Summary

Accomplished Site Reliability Engineer with a proven track record at IDFC FIRST BANK, specializing in observability dashboards and incident response. Skilled in shell scripting and system monitoring, with a focus on improving application reliability and fostering team collaboration. Committed to delivering solutions that improve user experience.

Overview

9 years of professional experience

Work History

Site Reliability Engineer

IDFC FIRST BANK
Bangalore
12.2022 - Current
  • Developed observability dashboards for banking applications, calculating SLA, SLO, and SLI metrics according to industry standards.
  • Maintained application reliability through process automation, minimizing manual intervention.
  • Analyzed infrastructure and performance issues to identify root causes systematically.
  • Troubleshot incidents, implemented alerts, and provided solutions to prevent recurrence.
  • Mapped product flows into user journeys to enhance user experience significantly.
  • Created observability metrics that improved system monitoring capabilities.
  • Provided L2-L3 product support to effectively address user complaints and concerns.
  • Managed system monitoring and incident response for critical banking services.

Site Reliability Engineer II

Akamai Technologies
08.2021 - 12.2022
  • Worked as Site Reliability Engineer II in the Security Engineering team, deploying and managing components of customer-facing applications.
  • Worked in the Enterprise Threat Protector team.
  • Good understanding of Software Development Life Cycle (SDLC) processes and agile methodology.
  • Hands-on knowledge of source code management (version control) tools such as Git.
  • Hands-on knowledge of software containerization platforms like Docker.
  • Experience with integration tools like Jenkins.
  • Hands-on knowledge of configuration management tools like Ansible.
  • Hands-on experience creating and managing Kubernetes clusters in different types of environments.
  • Hands-on knowledge of continuous monitoring tools like Grafana.
  • Skilled at automating OS-level tasks using shell scripting.

Platform Operations Engineer

Akamai Technologies
07.2019 - 08.2021
  • Dedicated Platform Operations Engineer who enjoys cultivating long-term partnerships with vendors and clients. Experienced in installing, configuring, and monitoring complex systems and infrastructures.
  • Wrote SQL queries to extract required details and present them clearly.
  • Implemented application and system monitoring with Grafana.
  • Handled system, application, and network alerts for deployed machines.
  • Worked on the configuration management system (UMP).
  • Handled emails directed to Akamai from ISPs and customers.
  • Worked as Platform Operations Engineer in the NOCC (Network Operations Command Center) at Akamai Technologies.
  • Handled complex networks of around 4 lakh (400,000) servers (Linux and Windows).

System Engineer

Tata Consultancy Services
07.2016 - 07.2019
  • Extensive experience with the SailPoint IIQ PeopleSoft infrastructure and the Multi-Factor Authentication infrastructure.
  • Experience with data encryption infrastructure in Micro Focus Voltage and Thales Vormetric applications.
  • Worked with the ServiceNow ticketing system.
  • Provided 24/7 production support for Multi-Factor Authentication users, SailPoint IIQ, Data Protection applications (Java-based), and other cloud-hosted applications.
  • Hands-on experience with Jira for bug tracking, issue tracking, and project management.
  • Linux system administration.
  • Knowledge of package management (YUM, RPM), user management, file systems, storage components (LVM and RAID), and disk management.
  • Installed and configured Apache Tomcat servers on Linux VMs.
  • Unix shell scripting.
  • Worked on infrastructure capacity planning for various applications.
  • Developed shell scripts to calculate Key Performance Indicators (KPIs) for applications based on production support standards and policies.
  • Hands-on experience with monitoring tools (Splunk, AppDynamics, Dynatrace, Kibana, Gomez, Wily, SiteScope, Nimsoft).
  • Knowledge of change management from one year of work in release management.
  • Experienced in handling incident management bridges and coordinating with different teams.

Education

Bachelor of Engineering - Electronics and Communication

05.2016

Higher secondary certificate

C.B.S.E Board
01.2012

Senior secondary certificate

C.B.S.E Board
01.2010

Skills

  • Shell Scripting
  • Perl
  • Python
  • SQL
  • Siebel
  • Splunk
  • AppDynamics
  • Dynatrace
  • Kibana
  • Gomez
  • Wily
  • SiteScope
  • Nimsoft
  • Grafana
  • Opsgenie
  • Ansible
  • Git
  • Jenkins
  • Anaconda
  • Spyder
  • Sublime Text Editor
  • Jira
  • PuTTY
  • Banking and Finance
  • Apache Tomcat
  • Windows XP
  • Windows 7
  • Windows 10
  • Linux
  • RedHat EcP
  • CentOS 7
  • Akamai proprietary Linux
  • Observability dashboards
  • Incident response
  • System monitoring
  • Application support
  • Root cause analysis
  • Team collaboration
  • Linux administration

Hobbies and Interests

  • Football
  • Cricket
  • Cooking

Languages

  • English
  • Hindi
  • Punjabi

Affiliations

Monitoring and observability; incident management and automation; compliance, risk, and governance; SRE transformation and collaboration; recognition and impact

  • Developed and maintained Grafana dashboards (DIY Bible, Error Dashboards, Funding Statistics) to improve visibility across business flows
  • Integrated Prometheus metrics for enhanced system monitoring and real-time alerting
  • Implemented API performance monitoring and initiated SLI, SLO, SLA tracking for NTB SA journeys
  • Recovered lost dashboards and set up code-based observability backup strategies
  • Reduced mean time to recovery (MTTR) by streamlining incident response through Grafana-triggered alerts and knowledge base (KB) articles
  • Proposed and initiated scheduler-based solutions for recurring 5xx errors in Aadhaar-Karza integrations
  • Enabled unified incident reporting across BTO and Snow teams, standardizing release and post-mortem workflows.
  • Investigated and mitigated issues such as AOF data loss, OTP toggle bypass, and CIBIL API ingress misconfigurations
  • Ensured 100% adherence to compliance protocols, mandatory training, and timesheet submissions
  • Spearheaded SRE transformation for NTB SA application, transitioning from L2 support to proactive reliability engineering
  • Actively collaborated with DevOps, observability, and business teams to enhance uptime, reduce alert fatigue, and automate escalations
  • Participated in code reviews and architecture discussions to deepen understanding of Java and Kubernetes internals.
  • Rated “Excellent” across all 2024 performance goals.
  • Recognized by management for dashboard innovation, independent problem-solving, and team enablement.

Disclaimer

I hereby declare that the information furnished above is true to the best of my knowledge.
