SUMIT SAURAV

Site Reliability Engineer
Pune

Summary

Over 8 years of experience as an SRE (Site Reliability Engineer) (Virtusa) in public cloud operating strategy across various Linux server environments, adopting cloud strategies based on Amazon Web Services and DevOps. Sustain a serverless environment using the AWS Cloud (EC2, Auto Scaling, Load Balancer, S3, EFS, Glacier, AWS Backup, VPC, IAM, CloudWatch, Linux, CloudFormation, etc.) and MS Azure (VM, Azure Monitor, Storage accounts, Scale sets, Load Balancer). Document system configurations, instance, OS, and AMI build practices, backup procedures, and troubleshooting guides, and keep infrastructure and architecture drawings current with changes. Utilized CloudWatch, Trusted Advisor, CloudTrail, GuardDuty, and Inspector to monitor resources such as EC2, CPU, memory, and EBS volumes; to set alarms using SNS for notification or automated actions; and to monitor logs for a better understanding and operation of the system. Define branching, tagging, labeling, and merge strategies in Git and Azure Repos. Able to work with Docker containers and create Docker images from a Dockerfile. Used Ansible as a configuration management tool to handle repetitive tasks and manage infrastructure. Experienced in defining and implementing procedures, best practices, and service standards for business excellence, including change management, incident management, and problem management.

Overview

9 years of professional experience
4 years of post-secondary education

Work History

Site Reliability Engineer

Virtusa
05.2022 - Current
  • Primary on-call engineer for escalations and customer point of contact
  • Maintaining the GitHub repository for deployments to achieve a sustainable environment
  • All changes and issues are ticketed in Jira for assignment and tracking
  • Actively using Splunk/Grafana/Pingdom/ELK to gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
  • Using Rundeck and control-next tools to trigger jobs
  • Troubleshooting high load, CPU, memory, and disk usage using Splunk
  • Deploying applications to production and non-production environments by following the release and change management process
  • Setup new deployment from scratch to host customer/applications with
  • Track down defects and come up with innovative solutions to improve reliability and availability
  • Work with the product engineering team to resolve trouble tickets, develop and run scripts, and troubleshoot services in the hosted environment
  • Recorded daily events and activities in site diary to evaluate process and improve productivity
  • Developed SOPs and change controls pertaining to setup, maintenance and operation of equipment, facility and utilities

Cloud Engineer L2 and DevOps Engineer

Rackspace Technology
01.2021 - 04.2022
  • (DevOps)
  • Installed, configured, and managed a centralized Ansible server and created playbooks to support various application servers
  • Automated configuration management and deployments using Ansible playbooks and YAML for resource declaration
  • Created roles and updated playbooks to provision servers using Ansible, including user and group management and managing permissions
  • Maintained Git workflows for version control and source code management
  • Wrote Terraform scripts from scratch for building Dev, Staging, Prod, and DR AWS environments
  • Provisioned servers and deployed features using Terraform and Ansible
  • Hands-on knowledge of the Docker containerization platform
  • Jenkins integration with Git, Docker, and Ansible
  • Roles & Responsibilities: (Public Cloud Operations AWS)
  • Ability to manage servers using Systems Manager (SSM) for patch management and troubleshooting of servers
  • Set up VPC environments and maintained the backup strategy
  • Ability to estimate usage costs and experience in monitoring and auditing systems using CloudWatch and CloudTrail
  • Creating/managing S3 buckets (CLI) and storing log backups
  • Creating/Managing AMI/Snapshots/Volumes, Upgrade/downgrade AWS resources (CPU, Memory, EBS)

Cloud Administrator

HCL Technologies Ltd
01.2015 - 01.2021
  • Roles & Responsibilities: (Public Cloud Operations AWS)
  • Build and release EC2 instances (Amazon Linux, Red Hat, and Windows) for POC, Development, and Production environments and configure the servers as per request
  • Managing VPC, security groups, private and public subnets, Route Table, Internet Gateway, NAT Gateway, Peering Connection
  • Managed Auto Scaling during high traffic and release activity (updating launch configurations)
  • Configuring Identity & Access Management (IAM) users, groups, permissions, policies, and roles
  • Dynamic website building using S3
  • Monitored and worked on alerts sent by Athena on various issues related to server availability, disk issues, CPU, memory, EBS, RDS, NACL, etc.
  • Configured reports related to CloudWatch, GuardDuty, and Inspector
  • Utilize EBS to store persistent data and mitigate failure by using snapshots
  • Fetching reports from AWS Config and Trusted Advisor
  • Production assistance on a 24/7 basis
  • Roles & Responsibilities: (MS Azure)
  • Resource Group Management, Implementing and managing Azure networking
  • Implementing virtual machines, Planning and implementing storage, backup, and recovery services
  • Azure PowerShell: Server creation, Image creation, Copying image, Snapshot Creation
  • Configuring NSGs
  • Worked on CDN and Traffic Manager
  • Configuring ELBs, VM scale sets, and Azure Files in MS Azure
  • Configuring Azure Backup to back up VMs and the applications running on them
  • Roles & Responsibilities: (Linux Administration)
  • Linux patching/upgrade activities using Ansible
  • Managing ad hoc commands and tasks, and configuration management
  • Worked on different types of modules including AWS cloud modules
  • Software installation through RPM and YUM
  • Controlling ACLs and services/daemons
  • File system management, LVM, SWAP Management
  • Provisioned load balancers, auto-scaling groups, and launch configurations for microservices using Ansible
  • Installation and configuration of NFS/Samba/FTP/SFTP.
  • Maximized system availability through development and testing of contingency plans.

Education

B.Tech - ECE

Yamuna Institute of Engg. And Technology (Kurukshetra University)
Haryana
05.2010 - 04.2014

Skills

SNS, SQS, Lambda, Load Balancers, ASG, SSM

Timeline

Site Reliability Engineer

Virtusa
05.2022 - Current

Cloud Engineer L2 and DevOps Engineer

Rackspace Technology
01.2021 - 04.2022

Cloud Administrator

HCL Technologies Ltd
01.2015 - 01.2021

B.Tech - ECE

Yamuna Institute of Engg. And Technology (Kurukshetra University)
05.2010 - 04.2014