
Nilesh Mishra

Site Reliability Engineer
Surat

Summary

Seeking career-enriching opportunities in: Site Reliability Engineer (SRE) | DevOps | Full Stack Observability | IaC | Docker | Terraform | Ansible | Linux | K8s | Python | CI/CD | AWS.

M. Tech (Electronics) graduate of Vellore Institute of Technology (Vellore, TN), First Class, with 10+ years of IT experience spanning troubleshooting, reducing mean time to detect and resolve incidents, and increasing productivity.

Experienced Site Engineer with 11 years of experience managing multiple simultaneous responsibilities to foster construction project completion. Well-organized planner and problem-solver versed in site preparation and day-to-day management.

Progressive Site Reliability Engineer dedicated to discovering and developing novel ways to protect worker safety and operational continuity. Creator of techniques and tools mitigating risk and expediting project completion. Prepared to revolutionize legacy procedures by developing customized, innovative approaches.

Overview

3 Certifications
3 Languages

Work History

Site Reliability Engineer Lead

AU Small Finance Bank
01.2023 - Current
  • As an SRE Lead, I established SRE processes and best practices, proactively maintained SLOs and SLAs, and enhanced system reliability through automation, monitoring, alerting, and observability
  • I streamlined infrastructure management to ensure scalability and stability, reduced manual tasks to improve efficiency amidst frequent updates, and encouraged the internal team to adopt and adhere to SRE processes
  • Responsible for maintaining and scaling critical infrastructure on AWS using Kubernetes for the IBMB (Internet Banking & Mobile Banking) & Video Banking application, supporting millions of users
  • Designing and implementing resilient EKS architectures, leveraging AWS services to ensure business continuity, application availability, and disaster recovery
  • Deployed and maintained applications in Kubernetes (EKS) environments within AWS using CI/CD pipelines built on Docker, Terraform (IaC), and Helm, integrated with GitLab
  • Completed a project for the Corporate Internet Banking (CIB) application using a CI/CD pipeline with Git, GitLab, and Ansible Tower, effectively eliminating hogging threads
  • Established an internal SRE team to define processes and best practices for SRE work
  • Implemented and managed a centralized full-stack observability and monitoring solution using the ELK stack (Elasticsearch, Logstash, Kibana), Prometheus, and Grafana
  • Proficient in implementing Synthetic Monitoring using the Elastic Stack to ensure application performance, reliability, and uptime
  • Developed and automated test scripts using Playwright for comprehensive browser-based testing, covering web navigation, interaction, and user journey validation
  • Drove FinOps optimization that saves over USD 55,000 per year
  • Designed and implemented monitoring and alerting solutions based on SRE's four golden signals, enhancing SLO and SLA monitoring to reduce MTTD and MTTR
  • Established and enforced SRE best practices, including monitoring, alerting, error budget tracking, and post-incident reviews
  • Worked with AWS services including EKS, ECR, VPC, EC2, S3, DevOps Guru, EventBridge, TGW, IGW, NAT, RDS, and IAM
  • Created Python and Bash scripts to automate tasks, reduce repetitive work, and minimize toil

Sr. Site Reliability – DevOps Engineer

Trootech Business Solution Pvt Ltd
11.2022 - 12.2022
  • Worked with AWS, Linux, Docker, Kubernetes, CI/CD, Terraform, and Bash and Python scripting
  • Collaborated with software engineering teams to design and implement reliable, scalable, and efficient systems
  • Continuously evaluated and improved infrastructure, processes, and practices to ensure reliability and scalability
  • Developed and implemented strategies for improving system reliability, scalability, and performance

Sr. Site Reliability – DevOps Engineer

Mobile Tornado - DevX
01.2020 - 10.2022
  • Company Overview: Mobile Tornado is a UK-Israel product-based company providing solutions to Tier 1 companies
  • Handled clients from the UK, Bogota, Israel, Canada, and South Africa to understand requirements and deliver projects on time
  • Managed, trained, and supported team members across multiple projects on Docker, Kubernetes, Python, Linux, Git, Terraform, AWS, Product Support, etc
  • Monitoring and observability using Prometheus, Grafana, ELK, and metrics servers
  • Prepared RCA documents on customer request after failures in our services

Site Reliability Engineer

Flipkart Internet Pvt Ltd
10.2015 - 01.2020
  • Applied reliability engineering practices, using SLAs/SLOs and error budgets to ensure application robustness
  • Infrastructure & Systems Management: Built and maintained hardware, OS, container, and virtualization environments for high availability and performance
  • Developed and implemented Python and shell scripts to automate routine tasks, enhancing efficiency and reducing manual effort
  • Architecture & Standards: Contributed to evolving design and architectural standards, aligning with best practices for resilient operation
  • Provided Tier 1 and Tier 2 support, responding to and resolving user issues and system problems and escalating complex issues as needed
  • Handled monitoring and alerting using Prometheus, Grafana, and the ELK stack
  • Ensured system uptime through proactive monitoring, diagnosing bottlenecks, and implementing short- and long-term solutions
  • Used Docker and Kubernetes for containerization and orchestration, enabling the migration of internal on-premises applications to the cloud

MNE (Physics Expert)

Chegg
08.2015 - 10.2015
  • Company Overview: Worked with Chegg Inc. (India)
  • Helped students pursuing undergraduate (i.e., Bachelor's) and Master's (MS) degrees in the USA by authoring solutions to their online assignments and other questions in Physics, Advanced Physics, and Engineering

System Engineer

Add-Sun Technology
12.2013 - 07.2015
  • Responsible for maintaining cloud-based services such as VPC, S3, IAM, Route 53, and billing, and for maintaining EC2 snapshots
  • Responsible for designing, implementing, and troubleshooting infrastructure for new technology
  • Educated users on business interruption risk and secured their buy-in for system standardization and updates

Research and Publication of (Artificial Intelligence) of Innovative Memory (Dynamic Memristor)

09.2012 - 11.2013
  • This project was a collaboration between VIT (Vellore) and IIT Bombay on new types of AI (Artificial Intelligence) that improve the efficiency of teaching machines to think like humans

Education

M. Tech - Electronics

Vellore Institute of Technology
Vellore, TN

Artificial Intelligence

VIT
09.2012 - 11.2013

Skills

Site Reliability Engineering: Incident management, SLOs, SLAs, and MTTR reduction

Certification

AWS Certified Solutions Architect-Associate, B9RGJN7BCMBQ1ZWF
