Dynamic Technical Lead-SRE at HCL Technologies with expertise in Kubernetes management and incident resolution. Proven track record in mentoring teams and optimizing cloud infrastructure costs. Adept at implementing CI/CD pipelines and fostering agile methodologies, driving significant improvements in project delivery and application uptime. Passionate about technical documentation and team leadership.
Overview
10 years of professional experience
Work History
Technical Lead-SRE
HCL Technologies
Bangalore
07.2022 - Current
Working as Technical Lead-SRE for the Verizon IoT project
Taking part in client meetings to understand the requirements and aligning the team to work towards them
Planning the infrastructure with the R&D and development teams to move from monolith to microservices.
Onboarding new verticals and performing the tabletop exercise and documentation before going live.
Hiring new candidates according to skill set requirements. Mentoring and training the new candidates with respect to project requirements
Working on Agile sprint tasks and making sure they are updated with the proper status on a daily basis, and closed within the sprint.
SME for applications: onboarding them from the development team, creating checklists and performing tabletop exercises, and documenting technical details, escalation paths and SOPs for troubleshooting
Using API monitoring to track the uptime and availability of applications, and performing the initial investigation and troubleshooting of issues with the help of Splunk
Opening a bridge for triage issues as per incident management and involving the concerned teams to fix them
Documenting the RCA after resolution and creating outage forms and postmortem reports
Holding review meetings with management and stakeholders on the monthly triage incidents to discuss the RCA and preventive actions
Creating Helm charts for applications and upgrading/maintaining the existing applications
Participating in weekly releases of new microservice versions: supporting the teams in finalizing the checklists, deploying to all environments and helping to validate
Updating secrets, creating ingresses and configuring alerts for the Kubernetes clusters
Creating dashboards and upgrading existing Grafana panels
Working on EKS cluster upgrade, working with Platform team on the AMI patching of the Kubernetes nodes
Upgrading the current monitoring infrastructure images and deploying them through Helm
Working on Cost Optimization of the Infrastructure based upon the usage of applications, coordinating with the Development team and making changes as discussed and sharing monthly cost usage reports of various verticals
Maintaining access of various third party and monitoring tools in the organization
Making the necessary changes in Stash for access requests after obtaining approval and business justification.
DevOps Engineer
IBM PVT LTD
11.2020 - 03.2022
Company Overview: Contractor
Worked as a DevOps Engineer on the IBM COS Load Balancer team
Worked on Git and Jira issues for load balancer tasks such as certificate renewal and service group creation
Creating Grafana dashboards to monitor metrics, both manually and via Ansible automation
Creating dev environments locally and testing code changes to verify they meet the requirements
Participating in weekly meetings, sharing views and inputs on current project blockers, and providing suggestions where needed
Worked with Jenkins in the CI/CD pipeline, providing continuous improvement to agile software development teams
Setting up Kubernetes clusters and renewing the certificates of their components
Troubleshooting issues on the Kubernetes cluster with respect to deployments and providing the relevant log details to Developers for further debugging and fix
Taking part in releases of new microservice versions via Spinnaker/Jenkins, working along with QA to validate the smoke tests
Manual release and rollback of application versions on Kubernetes Cluster
Basic knowledge of Helm charts
Worked on the Ceph storage application and its operations, and set up monitoring for it in Zabbix
Experience in Amazon Web Services such as EC2, Load Balancers, Auto Scaling, S3, VPC, CloudWatch and IAM
AWS cleanup, cost optimization and effective resource utilization by implementing the right resource for the right usage
Implemented cron jobs for S3 bucket backups and RDS backups
Created a logging setup using Logstash and integrated a Superset cluster with the application for graph visualization
Product Support for Language Translation Product
SDL Technologies India Private Limited
04.2017 - 11.2020
Company Overview: RWS Group
Creating pipelines in Spinnaker and configuring the cluster load balancer for the containerized applications
Setting up the Zabbix monitoring system for our infrastructure from scratch, adding the required Windows/Linux services, servers and tools to be monitored as per global/agreed thresholds
Setting up of Kubernetes cluster, certificate renewal of the components, editing
Taking up incidents and requests logged by Customer Support and Level 1 engineers, then triaging and fixing them with approved operational runbooks while ensuring the proper SLA
Sharing RCAs for outages with the SDMs to generate KPIs and other audit-related documents
Raising Standard, Normal and Emergency Change Requests and getting them approved in CAB and ECAB
Logging problem tickets for recurring issues, providing a permanent fix, and coordinating with the next-level R&D team for assistance and suggestions as per the OLA
Documenting technical investigations and the knowledge base on Confluence
Deployment of new VMs, creation of templates and network configurations in vSphere
Good knowledge of DRS, HA clusters, FT, vMotion, datastores, VM snapshots, vSwitch and DV Switch
Installation / Renewal of SSL certificates
Creating VIP in NetScaler and binding them with server/service groups, attaching SSL certificate and configuring redirection
Validating and performing quality assurance checks on the configured parameters of applications hosted on newly built servers prior to bringing them live
Performing application upgrades of hosted products
Restoring the previous running state in case of any failure with the help of DB backups and VM snapshots
Analyzing the error logs and passing them to the development team with our inputs
Working experience with configuration management tools such as Ansible
Working with vendors and creating POCs for the adoption of new tools or services in operations
Proficient in AWS services like VPC, EC2, S3, ELB, Autoscaling Groups (ASG), Route 53, CloudWatch, IAM etc
Configuration of Chef Infra
Bootstrapping the hosts/nodes of the new machines, editing the
And performing Chef-Automation
Using IntelliJ and the Git CLI to edit source code and commit it to the central Stash repository
Performing releases, rolling upgrades and container service rollbacks
Experience in checking code quality results using SonarQube
Pushing deployments via Spinnaker, Jenkins and TeamCity
Creating CI/CD pipelines for automating deployments and other jobs
Configured and maintained Jenkins/Terraform to implement the CI process, with knowledge of integrating Maven to schedule the builds
Writing YAML for deployments, pods and services, and configuring security and network policies
Troubleshooting application failures and control plane failures in Kubernetes clusters
Participating in monthly 24/7 on-call support for production issues
Integration of critical alerts to PagerDuty
Experience working onsite in Romania on the migration of servers from private cloud to public cloud, which included setting up Kubernetes clusters for hosting the containerized applications
Installing and configuring various third-party logging, metrics and message queue tools such as Eureka, RabbitMQ and Elasticsearch
Performing MongoDB installation, cluster server upgrades, sync and tasks related to changing the server specs
Installation of AppDynamics to monitor our Kubernetes Cluster in various environments
Working together on Validation checks with Development team for migrations of each environment (UAT/ PTE/ STG / PRD)
System Engineer
Appnomic Systems
01.2015 - 04.2017
Provided first-line application support for a banking product
Proactive monitoring of services and network infrastructure via Nagios, PRTG, Zabbix and StatusCake
Logging tickets for alerts raised by monitoring tools, performing first-point investigation and troubleshooting, and escalating to second-line engineers with all findings updated in the ticket
ETL Lead / Onshore Technical Business Analyst at HCL Technologies India & HCL Technologies NZ Ltd