Prachi Bhati

Bangalore

Summary

AWS Certified Site Reliability Engineer with 3.6+ years of experience managing cloud infrastructure, Kubernetes, F5, and CI/CD automation. Proficient in AWS services (EC2, S3, Lambda, RDS, EKS), infrastructure as code (Terraform), monitoring tools (Splunk, Zabbix), and scripting in Python and Shell. Adept at driving reliability, scalability, and automation across complex production environments.

Overview

3 years of professional experience

Work History

Site Reliability Engineer

Oracle Cerner
08.2022 - Current
  • Provisioned and configured more than 60 nodes (internal and external) in a ClosedStack environment for Traefik Hardening in EMEA (Konvoy cluster), customizing templates and resolving Chef cookbook version issues.
  • Developed a centralized Uptime Dashboard in Zabbix to monitor CERNPRP nodes in production and non-production, ensuring real-time visibility and reducing manual checks by 80%.
  • Automated patching and reboot process for all UK servers using centralized management scripts, enhancing compliance and reducing manual effort.
  • Developed a suite of reusable Bash scripts for SFTP task management that supports 50 weekly scheduled tasks, reducing maintenance effort by 75% and enabling faster deployment of critical system updates.
  • Resolved EC2 reboot failures by repairing faulty EBS volumes and restoring instance integrity, reducing downtime by 60%.
  • Managed AWS infrastructure: EC2, Route 53, EBS, S3, Lambda, RDS, EMR, Data Pipeline.
  • Executed infrastructure automation using Terraform, enhancing deployment consistency and scalability across environments.
  • Supported DCOS to Kubernetes migration activities and coordinated with stakeholders for smooth production flips.
  • Built and managed F5 BIG-IP configurations: SSL profiles, certificates, virtual servers (VS), and load balancers.
  • Designed Zabbix dashboards to monitor infrastructure KPIs, improving system reliability and alerting visibility.
  • Utilized New Relic and Splunk to analyze performance metrics and troubleshoot application and infra issues.
  • Handled CI/CD pipelines using Jenkins, GitHub Actions, and AWS CodePipeline for automation of builds, testing, and deployments.
  • Managed incidents across production environments, including real-time issue identification, root cause analysis (RCA), and coordinated resolution with cross-functional teams. Contributed to on-call rotations, led post-incident reviews, and implemented corrective actions to minimize recurrence and improve system reliability.
  • Collaborated across teams to troubleshoot complex issues, perform service account password resets, and support vertical-specific closed stack systems.
  • Implemented automation scripts to enhance deployment processes and reduce downtime.
  • Monitored system performance, identifying issues to maintain service reliability and improve user experience.

System Intern

Cerner Corporation
02.2022 - 08.2022
  • Contributed to Hadoop ecosystem operations: Kafka, HDFS, YARN, and HBase for data pipeline reliability.
  • Supported staff members in their daily tasks, reducing workload burden and allowing for increased focus on higher-priority assignments.

Education

Master of Computer Applications

Vellore Institute of Technology
Chennai, India
04.2022

Skills

    Cloud & DevOps: AWS (EC2, S3, Lambda, RDS, EKS), Oracle Cloud, Terraform, Jenkins, GitHub Actions, CodePipeline
    Infrastructure & Monitoring: Kubernetes, F5 BIG-IP, Zabbix, Splunk, New Relic, Grafana
    Scripting & Tools: Python, Linux, Bash, Chef, Ansible
    Others: Jira, VMware, Kafka, Hadoop (HDFS, YARN, HBase), CI/CD, DNS, Load Balancers

Certification

AWS Certified Solutions Architect - Associate

Leadership Experience

Drove cross-functional collaboration between SRE, security, and application teams to implement infrastructure automation initiatives, improving deployment reliability and reducing manual errors by over 40%. Mentored 3+ junior engineers in cloud infrastructure, monitoring tools (Zabbix, New Relic), and incident response processes, accelerating their onboarding and raising team-wide productivity.
