Prachi Bhati - Site Reliability Engineer - Oracle Cerner

Summary

AWS Certified Site Reliability Engineer with 3.6+ years of experience managing cloud infrastructure, Kubernetes, F5, and CI/CD automation. Proficient in AWS services (EC2, S3, Lambda, RDS, EKS), infrastructure as code (Terraform), monitoring tools (Splunk, Zabbix), and scripting in Python and Shell. Adept at driving reliability, scalability, and automation across complex production environments.

Overview

3 years of professional experience

Work History

Site Reliability Engineer

Oracle Cerner

Bangalore

08.2022 - Current

Provisioned and configured more than 60 nodes (internal and external) in a ClosedStack environment for Traefik Hardening in EMEA (Konvoy cluster), customizing templates and resolving Chef cookbook version issues.
Developed a centralized Uptime Dashboard in Zabbix to monitor CERNPRP nodes in production and non-production environments, providing real-time visibility and reducing manual checks by 80%.
Automated the patching and reboot process for all UK servers using centralized management scripts, enhancing compliance and reducing manual effort.
Developed a suite of reusable Bash scripts for SFTP task management that supports 50 weekly scheduled tasks, reducing maintenance effort by 75% and enabling faster deployment of critical system updates.
Resolved EC2 reboot failures by repairing faulty EBS volumes and restoring instance integrity, reducing downtime by 60%.
Managed AWS infrastructure: EC2, Route 53, EBS, S3, Lambda, RDS, EMR, Data Pipeline.
Executed infrastructure automation using Terraform, enhancing deployment consistency and scalability across environments.
Supported DC/OS-to-Kubernetes migration activities and coordinated with stakeholders for smooth production flips.
Built and managed F5 BIG-IP configurations: SSL profiles, certificates, virtual servers (VS), and load balancers.
Designed Zabbix dashboards to monitor infrastructure KPIs, improving system reliability and alerting visibility.
Used New Relic and Splunk to analyze performance metrics and troubleshoot application and infrastructure issues.
Handled CI/CD pipelines using Jenkins, GitHub Actions, and AWS CodePipeline for automation of builds, testing, and deployments.
Managed incidents across production environments, including real-time issue identification, root cause analysis (RCA), and coordinated resolution with cross-functional teams. Participated in on-call rotations, led post-incident reviews, and implemented corrective actions to minimize recurrence and improve system reliability.
Collaborated across teams to troubleshoot complex issues, perform service account password resets, and support vertical-specific closed stack systems.
Implemented automation scripts to enhance deployment processes and reduce downtime.
Monitored system performance, identifying issues to maintain service reliability and improve user experience.

System Intern

Cerner Corporation

Bangalore

02.2022 - 08.2022

Contributed to Hadoop ecosystem operations: Kafka, HDFS, YARN, and HBase for data pipeline reliability.
Supported staff members in their daily tasks, reducing their workload and freeing them to focus on higher-priority assignments.

Education

Master of Computer Applications

Vellore Institute of Technology

Chennai, India

04.2022

Skills

Cloud & DevOps: AWS (EC2, S3, Lambda, RDS, EKS), Oracle Cloud, Terraform, Jenkins, GitHub Actions, CodePipeline
Infrastructure & Monitoring: Kubernetes, F5 BIG-IP, Zabbix, Splunk, New Relic, Grafana
Scripting & Tools: Python, Linux, Bash, Chef, Ansible
Others: Jira, VMware, Kafka, Hadoop (HDFS, YARN, HBase), CI/CD, DNS, Load Balancers

Certification

AWS Certified Solutions Architect - Associate

Leadership Experience

Drove cross-functional collaboration between SRE, security, and application teams to implement infrastructure automation initiatives. These efforts improved deployment reliability and reduced manual errors by over 40%. Mentored 3+ junior engineers in cloud infrastructure, monitoring tools (Zabbix, New Relic), and incident response processes, accelerating their onboarding and improving team-wide productivity.
