Prasad Wagh

Pune

Summary

Results-driven Lead Site Reliability Engineer (SRE) with over 12 years of experience in designing, implementing, and maintaining scalable and resilient infrastructure. Expertise in cloud platforms, automation, monitoring, and incident management. Adept at collaborating across teams to improve system reliability, efficiency, and security. Proven track record of leading SRE teams and driving operational excellence. Strong knowledge of ISO 8583 messaging protocol, ensuring reliable and secure transaction processing. Experienced in supporting event-driven applications to enhance real-time data processing and system responsiveness, leveraging technologies such as NATS, Kafka, and gRPC.

Overview

13 years of professional experience
1 Certification

Work History

Lead Site Reliability Engineer

MasterCard
Pune
06.2014 - Current
  • Leading a team of SREs to enhance system reliability, scalability, and performance
  • Designed and implemented highly available architectures across cloud environments
  • Improved monitoring & observability using Splunk & Dynatrace
  • Led resiliency testing and chaos engineering, uncovering hidden system weaknesses
  • Streamlined CI/CD pipelines, improving deployment frequency
  • Collaborated with cross-functional teams (engineering, testing, security) to drive SRE best practices
  • Managed incident response & postmortems, reducing MTTR by 50%
  • Ensured high availability and security of financial transaction systems using ISO 8583 messaging
  • Supported event-driven applications, ensuring low-latency message processing and high availability, leveraging NATS, Kafka, and gRPC
  • Working on OnSoil projects, implementing a multi-region resiliency setup to enhance system fault tolerance and disaster recovery

Site Reliability Engineer

Electra Card Services
Pune
05.2013 - 05.2014
  • Implemented automated failover and disaster recovery strategies, improving system uptime
  • Integrated logging and tracing solutions, enhancing debugging and issue resolution
  • Mentored junior engineers, fostering an SRE culture within the organization
  • Worked on ISO 8583 transaction flows, optimizing system performance and message processing
  • Worked on acquiring products such as the payment gateway
  • Worked on issuing products such as clearing and authorization
  • Supported Acquiring & Issuing products, ensuring seamless transaction processing and system reliability

Java Developer

Web Development Company
Pune
01.2012 - 04.2013
  • Maintained production systems, ensuring 99.99% availability
  • Developed scripts for automated remediation of incidents, reducing manual intervention

Education

B.E. - Computer Engineering

Pune University
Pune
01.2011

Skills

  • Cloud Platforms: AWS, PCF
  • Infrastructure as Code: Chef Infra
  • CI/CD & Automation: Jenkins, Bitbucket
  • Monitoring & Observability: Splunk, Dynatrace
  • Chaos Engineering & Resiliency Testing
  • Scripting & Programming: Python, Groovy, Java
  • Frameworks: Spring Boot
  • Incident Management & Response: ITSM
  • Payments & Transaction Processing: ISO 8583 Financial Messaging
  • Event-Driven Architecture: Kafka, NATS, gRPC
  • Multi-Region Resiliency & High Availability

Certification

AWS Certified Cloud Practitioner

Leadership Initiatives

  • Spearheaded SRE adoption in the organization, evangelizing best practices.
  • Conducted training sessions on product flow, incident management, and chaos engineering.
  • Collaborated with engineering and security teams to define operational standards.

Project Contributions

  • Designed, developed, and implemented a Service Operations Tool.
  • Completed the India OnSoil product setup in record time.
  • Developed self-healing infrastructure using automation scripts, minimizing manual interventions.
  • Led a team to migrate legacy infrastructure to PCF cloud.
  • Optimized ISO 8583 transaction processing, reducing response times and improving reliability.
  • Designed and maintained event-driven applications, ensuring fault tolerance and scalability, utilizing Kafka, NATS, and gRPC.
  • Working on OnSoil projects and a multi-region resiliency setup to improve system redundancy and disaster recovery.

Languages

Marathi
First Language
English
Advanced (C1)
Hindi
Advanced (C1)
