Dynamic Site Reliability Engineer with extensive experience at CITI Bank, adept at deploying APM tools like AppDynamics and automating CI/CD processes with Jenkins. Proven track record in enhancing application performance and reducing incident noise through effective collaboration and innovative problem-solving. Skilled in AWS services and project management, driving operational excellence.
Overview
17
years of professional experience
1
Certification
Work History
Site Reliability Engineer
CITI Bank
Pune
03.2024 - 04.2025
Deployed APM tools (AppDynamics, New Relic, Grafana) to monitor application performance and created customized dashboards that help diagnose and eliminate problems quickly
Worked with application support teams to understand the current incident queue and reduce noise (repeated incidents)
Worked closely with the development team to automate pre- and post-release tasks to avoid release issues
Worked with the Business Analyst to calculate error budgets and define SLAs, SLOs, and SLIs for applications
Automated repetitive tasks in Python to reduce toil
Troubleshot production issues and took part in root cause analysis and documentation
Worked with the development team to automate releases using CI/CD tools such as Jenkins
SRE / Devops Engineer
HCL Tech
Pune
01.2022 - 03.2024
Good understanding of AWS services such as EC2, VPC, AMI, RDS, EBS, CloudWatch, CloudFront, IAM, and S3
Managed the complete AWS life cycle, including security, provisioning, and automation
Deployed an APM tool (AppDynamics) to monitor application performance and created customized dashboards that help diagnose and eliminate problems quickly
Worked on application migration from on-premises to the AWS Cloud environment
Worked closely with the development team to automate manual deployment tasks using CI/CD tools such as Jenkins, writing Groovy-based scripts in Jenkins pipelines
Worked with development and testing teams to plan and execute major releases
Deployed microservices on Docker and Kubernetes
Worked on service recovery plans for disaster scenarios; took part in major incidents and post-remediation calls
Troubleshot production issues and took part in root cause analysis and documentation
DevOps Engineer
Barclays Technology Solutions
Pune
03.2020 - 03.2022
Implemented the AWS Cloud platform and its features, including EC2, VPC, AMI, RDS, EBS, CloudWatch, CloudFront, IAM, and S3
Worked on container orchestration with EC2 Container Service and Kubernetes; worked with Ansible
Managed Docker containerization and orchestration using Kubernetes
Used Kubernetes to orchestrate the deployment, scaling, and management of Docker containers
Deployed an APM tool (AppDynamics) to monitor application performance and created customized dashboards that help diagnose and eliminate problems quickly
Worked closely with the development team to automate manual deployment tasks using CI/CD tools such as Jenkins, writing Groovy-based scripts in Jenkins pipelines
Worked with infrastructure teams such as Linux and Oracle BAs to migrate legacy applications from on-premises to the AWS platform
Deployed microservices on Docker and Kubernetes
Set up the Git repository and build tool (Maven) plugins in Jenkins to automate deployments with Jenkins pipelines
Automated infrastructure provisioning and configuration tasks with Ansible
Installed and configured AppDynamics to monitor application and service performance
Designed custom dashboards in AppDynamics as required
Worked on service recovery plans for disaster scenarios; took part in major incidents and post-remediation calls
Used ServiceNow as the ITSM tool to analyze recurring incidents and worked with development teams to prioritize fixes and reduce incident count
Support Analyst
Barclays Technology Solutions
03.2015 - 03.2020
Worked as an application support analyst for a group of wealth portfolio management applications
Automated application deployments in Jenkins
Automated daily manual tasks such as Oracle DB reports and Linux server housekeeping with shell scripting
Coordinated day-to-day execution of processes to enable effective monitoring (in AppDynamics), control, and support of service delivery
Coordinated planning, build, test, and deployment of major releases
Reviewed change requests, assessed change impact, and communicated it to all stakeholders
Drove major incident communication and process, providing real-time updates on the scope, risk, and urgency of major incidents
Coordinated disaster recovery plan documentation, approval of DR objectives, and facilitation of annual disaster recovery exercises with issue logging and reporting
Troubleshot and resolved application issues escalated by end users
Actively participated in daily, weekly, and bi-monthly status meetings
Generated statistical trend and root cause analysis reports for presentation to the management team
Took part in automating repetitive tasks, documenting them, and onboarding them to the L1 helpdesk
Application Support Analyst
Inautix Technology Solutions
09.2013 - 03.2015
Worked on EAGLE application services, monitoring and troubleshooting incidents to minimize business interruptions; took part in implementing application changes through change management
Monitored, added, and modified batch job schedules to minimize business interruptions
Automated tasks using shell and Python scripting
Monitored the ticketing queue in BMC Remedy, worked on incidents accordingly, and routed out-of-scope incidents to the respective teams
Represented the team in weekly CAB meetings for weekend production migrations; verified change requests raised by developers
Built harvest packages to move code from lower to higher environments
Prepared a weekly dashboard report for management to help analyze team performance
Involved in resource management activities and the recruitment process
Application Support Analyst
Cognizant Technology Services
04.2008 - 09.2013
Supported a compliance suite of applications across a range of investment banking and wealth services, providing Level 1 and Level 2 support: investigating and troubleshooting issues, devising workarounds, and suggesting improvements and solutions to fix issues
Interacted with the development team on problems requiring a permanent fix and provided solutions or suggestions when required
Customized shell scripts on Linux servers per business requirements; automated tasks and reports to reduce resolution time for repetitive incidents
Executed and participated in incident, change, and problem management