Shreeyash Tiwari

Lead Service Availability Engineer
Noida, UP

Summary

Accomplished Lead Service Availability Engineer with a proven track record at Forcepoint of improving system reliability and operational efficiency through strategic automation and proactive monitoring. Expert in Google Cloud Platform and adept at fostering team collaboration; significantly reduced MTTR and maintained 99.99% uptime while demonstrating strong project management and technical skills. Engineering professional with a foundation in engineering project management and design, and a history of success in performing load and cost calculations and establishing clear parameters. Detail-oriented, with strong knowledge of incident and change management and solid experience with public cloud environments.

Overview

8 years of professional experience
5 years of post-secondary education

Work History

Lead Service Availability Engineer

Forcepoint
01.2023 - Current
  • Lead a team of service availability engineers to ensure 99.99% uptime across critical infrastructure and applications.
  • Implement service reliability practices, focusing on automation and proactive monitoring, resulting in a reduction in service disruptions.
  • Collaborate with cross-functional teams to drive operational excellence and continuous improvement in system performance, scalability, and disaster recovery.
  • Spearhead the design and execution of incident management processes, reducing mean time to resolution (MTTR) through root cause analysis and swift remediation.
  • Oversee the implementation of robust monitoring and alerting frameworks using tools such as SolarWinds, Grafana, and Splunk, improving service transparency and operational efficiency.
  • Lead post-incident reviews and drive action plans to prevent recurrence, ensuring alignment with service level agreements (SLAs).
  • Champion automation efforts to streamline manual processes, reducing operational overhead.
  • Mentor junior engineers and lead training sessions to promote best practices in reliability engineering and incident management.
  • Developed automated deployment and configuration management scripts to minimize configuration drift and accelerate service delivery.
  • Collaborated with application development teams to enhance system resilience, resulting in fewer outages caused by application issues.
  • Monitored system health and performance metrics, proactively identifying bottlenecks and addressing them before they escalated into major incidents.
  • Led root cause analysis (RCA) efforts for significant incidents and implemented preventative measures, improving overall system reliability.
  • Participated in on-call rotations, providing 24/7 support for mission-critical services and driving rapid incident resolution.
  • Contributed to the design and rollout of disaster recovery strategies, ensuring business continuity.
  • Supported service availability for large-scale distributed systems, ensuring minimal downtime through vigilant monitoring and incident response.
  • Worked closely with DevOps and engineering teams to ensure seamless deployment of new services and updates, minimizing production impact.
  • Designed and implemented monitoring dashboards using tools such as Grafana, Jira, and Datadog to improve visibility into service performance and operational health.
  • Assisted in the development and documentation of incident management protocols, ensuring consistent and effective handling of service disruptions.
  • Played a key role in post-incident analysis and follow-up actions, contributing to long-term improvements in system reliability and uptime.
  • Perform cost analysis and optimization by identifying unused resources, implementing auto-scaling, and optimizing resource allocation to reduce cloud expenditure.
  • Lead and manage a team of 10 professionals, consistently meeting or exceeding performance targets in line with company goals.
  • Implement performance management systems, tracking key performance indicators (KPIs) and providing feedback and coaching to improve employee performance.
  • Address team conflicts and issues promptly, applying conflict resolution techniques to maintain a positive and productive work environment.
  • Facilitate regular team meetings and performance reviews, ensuring transparency in goals, objectives, and performance outcomes.
  • Led daily stand-ups and sprint planning for Agile projects, ensuring consistent progress and alignment between stakeholders and team members.
  • Collaborated with upper management and clients to define project scope, objectives, and deliverables, ensuring all stakeholders' expectations were met.
  • Conducted post-project reviews to identify lessons learned and implemented improvements for future projects.


Cloud Operations Specialist

Ultimate Kronos Group
08.2018 - 01.2023
  • Worked with monitoring tools such as Splunk, AppManager, CyberArk, QRadar, SolarWinds, Dynatrace, and Grafana to monitor cloud and hosted environments.
  • Remotely monitored Microsoft Windows Server 2012 and Linux servers, including performance, uptime, SQL
  • Deployed changes and releases (compliance, service, hotfixes).
  • Worked on Google Cloud Platform (GCP) services such as Compute Engine, Cloud Load Balancing, Cloud Storage, Cloud
  • Installed, configured, and managed Docker containers and images.
  • Wrote PowerShell scripts to automatically update system components.
  • Worked on configuring Active Directory accounts for CyberArk.
  • Managed cloud infrastructure, ensuring optimal performance and availability of cloud services and applications.
  • Monitored cloud environments using Catchpoint, Splunk, and F5 to proactively identify and resolve issues, resulting in a decrease in downtime.
  • Oversaw CI/CD pipelines for deploying infrastructure and applications, ensuring seamless integration and delivery of new features across cloud environments.
  • Administered cloud environments, including resource provisioning, configuration, monitoring, and scaling.
  • Managed identity and access management (IAM), implementing security policies and best practices to safeguard cloud resources.
  • Managed identity and access control (IAM), implementing security policies and best practices to safeguard cloud resources.
  • Provided day-to-day cloud operations support, monitoring system health, handling incident escalations, and optimizing cloud resources.
  • Created and maintained documentation for cloud environments, standard operating procedures (SOPs), and troubleshooting guides.


System Engineer

HCL Technologies
11.2016 - 10.2017
  • Worked with technical support to resolve end-user issues.
  • Supported ExxonMobil as a client.
  • Worked with Active Directory: created user accounts, group memberships, and distribution lists, and managed passwords.
  • Worked with Microsoft Office packages and troubleshot related issues.
  • Troubleshot and resolved Windows OS issues.
  • Handled customer calls and provided end-to-end solutions.
  • Created user profiles and handled administrative tasks.

Education

Bachelor of Technology - Electrical, Electronics and Communications Engineering

Greater Noida Institute Of Technology
Greater Noida, India
08.2012 - 06.2016

Sr. Secondary Schooling - Science Education

Methodist High School
Kanpur, India
03.2011 - 03.2012

Skills

Prometheus, Grafana, ELK, Splunk, Datadog, New Relic

Google Cloud Platform

Windows and Active Directory

Incident and Change Management

Agile, Scrum, ITIL, SRE

JIRA, Confluence and ServiceNow

Project Management (PMP)

Jenkins, GitLab CI, Docker, Kubernetes

Kanban

Interests

Sports

Travel

Reading
