
Engineering leader with 14+ years of experience across Site Reliability Engineering (SRE), Observability, Cloud Platforms, and Enterprise Software Development, including 6+ years leading and managing teams of 8–10+ engineers. Proven expertise in building and operating large-scale logging, monitoring, and observability platforms using Splunk, Datadog, Grafana, Azure Monitor, Dynatrace, and OpenTelemetry. Strong background in Azure, AWS, and GCP, CI/CD automation, vendor management, and agile delivery. Seeking a Manager, Software Engineering – Observability role where I can drive reliable, scalable, and high-performing systems while enabling team growth and continuous improvement.
Project 1
UBS Hyderabad
Technologies
Ø Cloud & Platform: Azure Cloud, Azure Monitor, Log Analytics, Azure App Insights
Ø Observability: Splunk, Grafana, OpenTelemetry (concepts), Datadog (logs/metrics)
Ø Containers & DevOps: Kubernetes (AKS), Docker, Azure CI/CD, Git
Ø Automation & Scripting: Python, Shell
Ø OS & Infra: Enterprise Linux
Ø ITSM & Agile: ServiceNow, Jira, Agile / ITIL
Project Description:
UBS is a global financial services organization delivering wealth management, asset management, and investment banking platforms. The systems supported are mission-critical, high-availability financial and risk platforms with strict regulatory and security requirements.
Roles and responsibilities:
Ø Acted as Technical Lead / Engineering Lead for observability and reliability initiatives supporting business-critical financial platforms.
Ø Led a team of SREs by providing technical direction, task prioritization, mentoring, and day-to-day operational leadership.
Ø Designed and enhanced enterprise observability dashboards and alerts using Azure Monitor, Log Analytics, Splunk, and Grafana, improving visibility and incident response.
Ø Implemented proactive alerting strategies aligned with SLIs and SLOs, reducing incident noise and improving system reliability.
Ø Supported and monitored containerized workloads on Kubernetes (AKS), ensuring platform stability and scalability.
Ø Developed Python-based automation tools to detect, triage, and resolve recurring production issues, significantly reducing MTTR.
Ø Built and maintained Azure CI/CD pipelines, ensuring safe, repeatable, and reliable deployments.
Ø Led major incident response and root-cause analysis, driving corrective and preventive actions in collaboration with engineering and product teams.
Ø Partnered closely with product management, development, security, and vendor teams to align reliability goals with business priorities.
Ø Played a key role in vendor transition and stabilization activities, including SOP creation, documentation, and knowledge transfer.
Ø Ensured compliance with ITIL processes (Incident, Change, Problem Management) and internal audit requirements.
Ø Regularly communicated system health, risks, and improvement plans to senior stakeholders, demonstrating strong customer and service orientation.
Project Description:
Electrolux sells products under a variety of brand names (including its own), primarily major appliances and vacuum cleaners intended for home consumer use.
I joined this project as an SRE when it was taken over from another vendor, and played a critical role in ensuring that the application-related knowledge transfer (KT) sessions were completed successfully.
Roles and responsibilities:
Ø Created JMeter performance testing scripts for SAP UI, identified performance bottlenecks, and collaborated with engineering teams on issue resolution.
Ø Initiated and led performance testing activities for FSM (Field Service Management) applications, successfully delivering two major performance initiatives into production.
Ø Designed and implemented Azure dashboards and proactive alerting solutions to improve system visibility and early issue detection.
Ø Provided technical leadership to a team responsible for 24x7 production support, as well as application design and development activities.
Ø Responded to application and system performance issues with rapid analysis and resolution to ensure service availability.
Ø Monitored production systems and services using Splunk, Stackdriver, Azure Application Insights, and Datadog.
Ø Acted as Subject Matter Expert (SME) for E-Commerce Catalog, Product Design & Development, and SAP C4C and FSM applications.
Ø Designed and developed automation tools to improve operational support capabilities and reduce manual effort.
Ø Ensured the support team consistently followed best practices while triaging Critical and High-priority production issues.
Ø Reviewed, prioritized, and coordinated fixes for production issues, ensuring weekly releases were delivered successfully.
Ø Actively participated in defining and executing action plans for escalated, customer-impacting, and critical issues.
Ø Mentored and guided team members to create knowledge base articles and technical documentation for key support topics.
Ø Owned post-production defects and change requests, ensuring appropriate prioritization and timely resolution.
Ø Analyzed and presented trend metrics and operational KPIs to senior management to drive continuous service improvement.
Ø Partnered with cross-functional support teams across the organization to continuously improve production stability and reliability in a 24x7 environment.
Shop & Browse
GCP, Splunk, Java, Spring Boot, Cloud Spanner, CI/CD, Jenkins, Azure DevOps
Project Description:
Macy’s, Inc. is one of the nation’s premier omnichannel retailers, operating stores and websites under two brands, Macy's and Bloomingdale's. The company sells a range of merchandise, including apparel and accessories for men, women, and children, cosmetics, home furnishings, and other consumer goods. We develop and support the website applications of Macy’s (www.macys.com) and Bloomingdale's (www.bloomingdales.com).
Shop & Browse is a web application responsible for rendering the catalog browse experience for macys.com. Its goal is to deliver faceted category browse and product detail pages (PDP) to customers for a more dynamic experience. NavApp interacts with Fast Common Catalog (FCC) services via REST calls, then processes the results with its supporting modules, which include caching with Akamai.
When a request arrives from a user's browser, it is first checked at Akamai and then routed to the respective xapi based on the URL pattern. The xapi calls downstream application services as needed, combines the results into a single JSON object, and hands it over to the UI.
Roles and responsibilities:
Ø Provided technical leadership to a cross-functional team responsible for 24/7 production support, website operations, and application design & development.
Ø Led rapid incident response and root cause analysis to address application and system performance issues, minimizing downtime and customer impact.
Ø Proactively monitored production systems and services using tools such as Splunk and Google Stackdriver, ensuring high availability and reliability.
Ø Acted as a Subject Matter Expert (SME) for E-Commerce Catalog systems, product design, and end-to-end application development.
Ø Designed and developed automation tools and scripts to eliminate manual operational tasks and improve support efficiency.
Ø Ensured adherence to best practices for incident triage, particularly for Critical and High-priority (P1/P2) production issues.
Ø Reviewed, prioritized, and coordinated defect fixes and enhancements, managing weekly production releases with minimal risk.
Ø Actively participated in defining and executing action plans for escalated and customer-critical issues, collaborating with stakeholders.
Ø Mentored and motivated team members to build knowledge base articles, runbooks, and technical documentation for recurring issues.
Ø Owned post-production defects and change requests, ensuring accurate prioritization and timely resolution.
Ø Analyzed and presented trend metrics, incident patterns, and SLA performance to senior management to drive continual service improvement (CSI).
Ø Partnered with multiple internal support and engineering teams to continuously improve stability, resilience, and 24/7 operational readiness of production systems.
Project Description:
Macys.com is an e-commerce application that allows users to purchase products online.
Ø As part of the UFT (Unified Fast Track) team, we were involved in enhancements and changes to the macys.com application.
Ø Like other e-commerce applications, macys.com maintains user accounts. It allows customers to create and manage profiles, check out, manage registries, and subscribe to newsletters.
Ø It keeps customers updated with the latest offers and promotional discounts.
Ø Macys.com uses email services to keep in touch with customers.
Roles and responsibilities:
Ø Performed requirement analysis and prepared documentation.
Ø Implemented site enhancements using Java.
Ø Tested SOAP and REST web services.
Ø Developed proofs of concept (POCs).
Ø Actively participated in bug fixing to deliver a bug-free application.
Ø Coordinated with QA and Support teams.