Resourceful Site Reliability Engineer known for high productivity and efficient task completion. Skilled in automation, continuous integration and delivery (CI/CD), and cloud infrastructure management. Excels in problem-solving, collaboration, and adaptability, ensuring seamless operations and system reliability. Hands-on experience in designing, coding, testing, and supporting next-generation database solutions in Oracle enterprise and SQL Server environments. Proficient at developing large-scale software systems, maintaining high server uptime, and responding swiftly to outages or interruptions. Consistently enabled smoother deployment and monitoring of applications across different platforms by implementing automation tools. Demonstrated leadership skills while coordinating with cross-functional teams to ensure system efficiency and reliability.
• Working with Ansible as the configuration management and automation tool; integrating Ansible YAML scripts
• Managing administration and mainframe monitoring; gaining experience in AWS, Git, Ansible, Docker, Jenkins, containers, Docker Swarm, and Kubernetes
• Supervising VMware and cloud environments, working on incident and problem tickets; stabilizing the environment by proactively fixing hardware and vCenter errors
• Automating routine activities with PowerShell scripts; upgrading and patching ESXi and vCenter to the latest release and patch level
• Planning and implementing new setups based on customer needs; working with hardware and software vendors to fix environment issues and bugs
• Training junior team members in advanced troubleshooting; providing 24x7 support to keep the environment functioning at agreed SLAs
• Heading datacenter consolidation through P2V migrations using PlateSpin and VMware Converter; maintaining up-to-date information on the VMware environment
• Generating reports on ESXi hosts and proactively fixing the issues found; coordinating with the change team to implement changes that keep the environment at the desired patch level
• Attending technical sessions to upskill and meet the organization's changing requirements; suggesting and implementing configuration improvements for tuning
• Identifying the root causes of issues and suggesting workarounds and bug fixes to the development team; using Terraform to manage infrastructure
• Supporting and monitoring the existing infrastructure; supervising preventative maintenance and backups and performing other regular support activities to ensure effectiveness
• Leading the design and documentation of infrastructure processes, procedures, and standards, along with the preparation and maintenance of system and software documentation
• Developing cost estimates and recommending systems development as well as upgrades to existing systems; evaluating infrastructure services equipment and software for purchase
• Ensuring proper communication between L1 and L2 teams and monitoring the resolution of escalations within the agreed Service Level Agreements (SLAs)
• Supervising high-severity incidents to ensure service availability with minimal delay and impact, keeping the infrastructure environment operating smoothly