
Aman Kumar

Bengaluru

Summary

Senior Site Reliability Engineer with 7 years of experience managing software applications and diagnosing complex technical issues. Expertise in Java, SQL, AWS, and microservices, with domain experience in the cybersecurity, e-commerce, banking, and insurance sectors. Proven track record of delivering effective solutions and improving application performance and reliability through observability and monitoring.

Overview

7 years of professional experience

Work History

Senior Site Reliability Engineer

Lookout
Bengaluru
03.2024 - Current
  • Supported four major product services: SASE (Cloud Security), MES (Endpoint Security), ZTNA (Zero Trust), and CASB.
  • Designed, developed, and maintained SRE platforms and tools.
  • Helped teams onboard onto SRE-owned platforms and implement best practices.
  • Created automation solutions and enhanced monitoring using Datadog and Opsgenie/PagerDuty.
  • Served on the Incident Response Team, managing alerts, incidents, and resolutions, and sharing root cause analyses (RCAs).
  • Analyzed application log files to facilitate troubleshooting.
  • Conducted root cause analysis for application issues and developed preventive solutions.

Site Reliability Engineer III

Unison Consulting
Singapore
07.2023 - 03.2024
  • Automated report visualization and dashboard creation through scripting.
  • Established observability dashboards and alerts using VMware Tanzu and third-party integrations.
  • Oversaw end-to-end Level 2 and Level 3 support, ensuring effective production and operations management.
  • Leveraged monitoring tools such as ELK, Grafana, and Splunk to enhance operational insights.
  • Built proficiency in Oracle databases, writing scripts for efficient data management.
  • Participated in on-call rotation to maintain continuous system availability.

Site Reliability Engineer II

ACL Digital
Bengaluru
03.2023 - 07.2023
  • Monitored application alerts and performed first-line triage for effective issue resolution.
  • Executed manual operational tasks and managed alert suppression to maintain application performance.
  • Conducted application health checks and produced comprehensive reports on system status.
  • Diagnosed code issues by analyzing web service and API responses, events, and logs.
  • Documented customer support processes, achieving a 35% reduction in support errors.
  • Developed PL/SQL and SQL queries for data analysis and client report generation.

Site Reliability Engineer I

Groupon
Bengaluru
10.2021 - 03.2023
  • Member of the team that owns the Shopping Cart, Deal Catalog, User Review and Rating, Taxonomy, and Wishlist services.
  • Improved service stability by identifying critical service-specific metrics, setting meaningful alerting thresholds, and automating alert responses.
  • Well-versed in the e-commerce domain: business logic implementation, Java, SQL, REST APIs, microservices, DevOps, CI/CD pipelines, ELK, log analysis, debugging, and troubleshooting.
  • Performed rotational on-call support at Levels L1, L2, and L3.
  • Owned the customer experience, working directly with customers to prioritize and resolve issues, meet SLAs, and provide 'white-glove' guidance on the path to production.
  • Monitored system performance and identified potential issues using VMware Tanzu, resulting in improved reliability and uptime.

Production Support Engineer

Tata Consultancy Services
Bengaluru
09.2018 - 10.2021
  • Supported an AWS-based banking application at Levels L1, L2, and L3, reducing response and resolution times through ITIL practices.
  • Resolved escalated tickets daily in line with service level agreements.
  • Introduced automation that reduced manual effort by 30% and improved workflow efficiency by 50%.
  • Migrated new customers' data to the production database, ensuring data integrity and efficiency.
  • Used PL/SQL, Unix, Java, JasperReports, and shell scripting for operational tasks.

Education

Bachelor of Technology - Computer Science and Engineering

Oriental Institute of Science and Technology
Bhopal, India
05.2018

Skills

  • Java and SQL
  • RESTful APIs and microservices
  • Cloud services (AWS)
  • Container orchestration (Kubernetes)
  • Shell scripting and automation
  • Monitoring tools (Grafana, Splunk, Datadog)
  • Version control (Git)
  • Event streaming (Kafka)
  • Continuous integration and delivery (CI/CD)
  • Containerization (Docker)
  • Incident management and response
  • ServiceNow and application monitoring
  • Reporting tools (JasperReports)
  • Build automation (Jenkins)
  • Log management (ELK stack)
  • Alerting systems (PagerDuty, Opsgenie)
  • Job scheduling (Control-M, AutoSys)
  • API testing (Postman)
  • Web servers (Apache Tomcat, Nginx)
  • Configuration management (Ansible, Chef)
  • Infrastructure as code (Terraform)
  • Performance monitoring (Prometheus)

Hobbies and Interests

  • Problem solving
  • English Premier League
  • Stock market
  • Macroeconomics
