Demonstrates strong analytical, communication, and teamwork skills, with a proven ability to adapt quickly to new environments. Eager to contribute to team success and further develop professional skills. Brings a positive attitude and a commitment to continuous learning and growth.
Overview
16 years of professional experience
5 Certifications
Work History
Lead Site Reliability Engineer
Kyndryl Solutions Private Ltd.
09.2021 - Current
Key skills: Red Hat, SUSE Linux, VMware SRM, AWS Cloud, new provisioning/onboarding, Pacemaker cluster, troubleshooting, disaster recovery, Docker, CyberArk, Splunk, Dynatrace, Qualys, vulnerability management/remediation.
Conducted regular vulnerability scans to maintain up-to-date knowledge of potential threats and system weaknesses.
Worked closely with stakeholders to prioritize remediation efforts based on risk levels associated with identified vulnerabilities.
Assisted clients in understanding the implications of discovered vulnerabilities, helping them make informed decisions about necessary corrective actions.
Provide Level 3 support to global client delivery teams, serving as a subject matter expert for Linux-based systems and AWS cloud infrastructure.
Act as a point of escalation for complex issues, driving outage calls, and ensuring timely resolution to meet SLA requirements.
Played a pivotal role in AWS Proof of Concept (POC) initiatives, rigorously testing diverse features and options earmarked for significant utilization by our operations team.
Developed comprehensive documentation detailing the POC process, outcomes, and recommendations, ensuring invaluable reference material for future implementations and operations.
Participated in diverse client cutover activities, proactively addressing and resolving issues encountered during their migration to the cloud, ensuring minimal disruption and smooth transitions.
Manage new server provisioning and configuration on IBM and AWS cloud platforms, including performance optimization and cloud server recovery.
Conduct root cause analysis to identify underlying issues, and develop effective solutions to prevent recurrence.
Provide support for Pacemaker cluster servers, including patching and configuration updates to maintain system stability.
Implemented end-to-end VMware Site Recovery Manager for disaster recovery planning, including onboarding, configuration, mock testing, and DR failover.
Prepare and execute implementation plans, back-out procedures, and verification plans for Change Management processes, ensuring minimal disruption to operations.
Working on dem
Ensure compliance with SLA and process requirements, driving continuous improvement initiatives to enhance customer satisfaction and achieve business objectives.
Led the decommissioning process of servers, meticulously ensuring comprehensive coverage of all components within specified timelines, thereby preventing any billing issues for the customer.
Evaluated new technologies and tools to enhance overall system performance, stability, and security.
Collaborated with cross-functional teams to develop, test, and deploy scalable software solutions.
Improved incident management workflows by creating comprehensive documentation on troubleshooting procedures and common issues resolution steps.
Implemented cost-saving measures by optimizing resource utilization across cloud-based infrastructure environments.
Conducted root-cause analyses after major incidents to identify areas for process improvement or technical enhancement opportunities.
Contributed to the ongoing refinement of internal processes and procedures within the site reliability engineering discipline through regular reviews, updates, and knowledge sharing activities.
Ensured compliance with relevant industry regulations regarding data privacy standards by actively participating in audits and assessments.
Improved service reliability by meticulously documenting system architectures and maintenance procedures.
Enhanced system reliability by developing and implementing comprehensive monitoring solutions across all platforms.
Linux SME - Squad Leader
IBM India Pvt. Ltd.
04.2016 - 08.2021
Summary: Responsible for technical support and end-to-end management of client production environments on IBM commercial managed hosting and IBM cloud hosting, including monitoring, performance tuning, capacity planning, and maintaining overall platform health.
Providing L3 technical support to global customers for faults and outages, and interfacing with clients for fault and change management.
Responsible for handling escalated incidents and incident management for high-availability environments on the IBM cloud hosting platform.
Change Management: Preparing implementation, back-out, and verification plans for change records, and performing risk and impact analysis.
Routine performance analysis, capacity analysis, and security audit reporting, with reviews for planned customer changes.
Leading a squad of 9 team members, with responsibility for their shift schedules and support coverage.
Ensuring SLA & process compliance along with high customer satisfaction to achieve more business.
Performing Root Cause Analysis to provide solutions for chronic issues.
Senior Technical Support Engineer
SunGard Global Technology – Bangalore
07.2012 - 03.2013
Summary: Primary role was to provide Level 2 support to global client delivery teams (US, UK), covering all hosting platforms (UNIX, Solaris, Windows) as well as application support.
Supporting multiple mission-critical application servers running various operating systems remotely.
Providing technical support for applications and services.
Performing technical tasks related to the deployment and maintenance of products.
Performing day-to-day activities and providing timely updates on completion.
Resolving technical support issues related to Windows, applications, systems, etc.
Supporting a critical financial application production environment.
Data Center System Admin
Hewlett Packard Sales India Pvt. Ltd
04.2009 - 06.2012
Holds the distinction of participating in the largest-ever policy digitization project, “LIC-EDMS”, in which servers were installed at more than 2,000 locations countrywide.
The project was an HP India contract for the biggest insurance company in India, employing a workforce of 11,000 across 80+ centers. It is one of the largest digitization projects in the world, involving the scanning of 3 billion pages along with software customization and implementation. The program includes subsequent training and support for 8 years.
Roles & Responsibilities:
Proactively monitoring and managing data centers and their production servers.
Troubleshooting Linux OS, hardware, and application issues.
Leading a team of 12 members across 5 data centers, accountable for end-to-end technical support.
Providing support for day-to-day Linux and system issues across 5 data center teams.
Efficiently troubleshooting server, hardware, software, and network issues.
Designed and implemented backup procedures for the end user.
Negotiating and regularly following up with vendors in case of hardware failures and software issues.
Securing systems by restricting users with unjustified root access.
Performing weekly and quarterly backups of database and production data at the data centers.
Performing housekeeping for production server databases (vacuuming, reindexing, log purging).
Keeping the machines stable and performing other routine system administration tasks.
Reporting and tracking customer-affecting issues with the customer helpdesk.
Ensuring daily that the computing environment, hardware, and software work correctly.
Demonstrating a critical mindset, with full awareness of issue criticality and business impact.
Ensuring SLA & process compliance along with high customer satisfaction to achieve more business.
Operations Manager at IBM India Private Limited / Kyndryl Solutions Private Limited