Venkata Harish Kumar Mamidi

Summary

Senior Site Reliability Engineer with 5.8 years of experience in production stability, observability, performance engineering, and automation. Proven track record of maintaining 99.9% service availability through proactive monitoring, incident management, and root cause analysis. Experienced in Linux, OpenShift, Datadog, Jenkins, GitHub, and Python automation, with expertise in SLOs, SLIs, error budgets, CI/CD pipelines, and L3 production support. Skilled in driving operational excellence, reducing manual toil through automation, and improving system reliability, scalability, performance, and operational efficiency in enterprise environments.

Overview

6 years of professional experience
1 Certification

Work History

Senior Engineer

Qualitest
Hyderabad
05.2022 - Current
  • Maintained 99.9% service availability across Linux-based production environments through proactive monitoring, incident management, automated remediation, and reliability engineering best practices.
  • Defined and governed Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets to improve service reliability and operational transparency.
  • Enhanced CI/CD processes using Jenkins and GitHub, streamlining build, test, and deployment workflows to improve release efficiency and deployment consistency.
  • Designed and implemented Datadog observability solutions, including dashboards, monitors, and alerting frameworks, enabling proactive detection of performance degradation and service-impacting incidents.
  • Developed Python and Bash automation solutions for operational workflows, health checks, log analysis, and capacity planning, reducing manual effort by approximately 80%.
  • Optimized application and infrastructure performance through analysis and tuning of CPU, memory, I/O, database, and network resources, improving scalability and system responsiveness.
  • Led critical incident response activities, coordinated cross-functional resolution efforts, and conducted blameless postmortems to identify root causes and drive preventive improvements.
  • Leveraged Datadog and observability platforms to analyze application behavior, identify bottlenecks, and improve production reliability and user experience.

Performance Tester

Cognizant
Chennai
09.2020 - 05.2022
  • Developed scripts for business scenarios using LoadRunner TruClient and web service protocols.
  • Executed tests to trace backend calls for corresponding frontend transactions.
  • Performed baseline load and endurance testing to assess application performance.
  • Monitored batch job runs and highlighted performance issues to stakeholders.
  • Compiled reports that compared performance metrics against previous load test results to identify trends and improvements.
  • Gathered requirements and prepared scenario documents to guide performance testing efforts.
  • Analyzed non-functional requirements and data flow within partner systems.
  • Participated in performance engineering, tracked defects, and prepared comprehensive test closure reports.

Education

Bachelor of Engineering - Computer Science and Engineering

SCSVMV University
Kanchipuram
05.2020

Skills

  • Cloud & Infrastructure:
    Amazon Web Services (AWS), On-Premises Linux Server Administration
  • Containerization & Orchestration:
    Red Hat OpenShift (OCP), Helm, Horizontal Pod Autoscaling (HPA), Container Resource Optimization
  • Site Reliability Engineering:
    Service Level Objectives (SLOs), Service Level Indicators (SLIs), SLA, Error Budgets, Incident Management, Capacity Planning, High Availability (HA), Production Root Cause Analysis (RCA), Postmortems, L3 Production Support
  • Programming & Scripting:
    Python, Bash, Shell Scripting, Korn Shell (KSH), Java, JavaScript, C, C, PHP
  • Observability & Monitoring:
    Datadog, Dynatrace, Grafana, Sumo Logic
  • CI/CD & Automation:
    CI/CD Pipelines, Jenkins, Git, GitHub, Infrastructure Automation, Release Management
  • Performance & Reliability Engineering:
    Performance Engineering, LoadRunner Enterprise (LRE), API Testing, Postman
  • Operating Systems:
    Linux, Unix
  • Tools & Technologies:
    Jira, SoapUI, CA DevTest Lisa, HTML, CSS, ReactJS

Accomplishments

  • Architected comprehensive observability frameworks using Datadog, establishing SLOs, SLIs, and Error Budgets to proactively detect service degradations, monitor system health, and reduce Mean Time to Detect (MTTD) to approximately 0.2 seconds.
  • Led critical incident response and service restoration efforts for production environments, performing Root Cause Analysis (RCA), coordinating cross-functional resolution, and driving postmortem actions to prevent recurring incidents.
  • Mentored cross-functional teams across Development, QA, Product, and Infrastructure, promoting SRE best practices while improving release readiness and operational excellence.
  • Spearheaded end-to-end performance engineering initiatives using LoadRunner Enterprise and batch testing, identifying infrastructure bottlenecks, optimizing application and API performance, and ensuring production readiness for high-traffic deployments.

Awards & Honours

  • QSTAR Award for Reliability Excellence | Qualitest
    Awarded for exceptional engineering contributions and driving continuous reliability improvements within the DCI infrastructure space.
  • Year-Over-Year Client Commendations | Discover Financial Services & Capital One
    Received consistent annual recognition from executive stakeholders for exceptional efforts in the SRE domain, specifically for optimizing enterprise infrastructure and improving system resiliency.
  • Cheers Award: Enthusiastic Team Player | Cognizant (June 2021)
    Recognized for outstanding cross-functional collaboration, mentorship, and dedication to project success during critical enterprise performance engineering initiatives.
  • Best Project Award | Cognizant Student Club (2018–2019)
    Awarded for technical excellence and innovative problem-solving on a foundational student engineering project.

Previous Role

Programmer Analyst – Performance Engineering | Cognizant
Project: The Hartford (Personal Lines Insurance) | Dec 2020 – May 2022

  • Collaborated with stakeholders to gather non-functional requirements and define performance testing strategies for enterprise insurance applications.
  • Developed and maintained LoadRunner (TruClient, Web HTTP/HTML) scripts and validated REST/SOAP services using SoapUI to simulate production workloads.
  • Performed transaction tracing and backend analysis to identify performance bottlenecks across application, middleware, and database layers.
  • Executed baseline, load, stress, endurance, and batch-processing tests to validate application scalability, stability, and production readiness.
  • Supported application migration initiatives by conducting performance assessments and validating system behavior post-migration.
  • Analyzed key performance metrics, identified optimization opportunities, and collaborated with development, infrastructure, and database teams to improve application performance.
  • Contributed to performance engineering initiatives through capacity analysis, workload modeling, performance tuning, defect tracking, and test reporting.

Certification

  • Jenkins for DevOps: Practical Uses of Jenkins
  • AWS Cloud Practitioner: Cloud Architecture Design Principles
  • AWS Developer Associate
  • Datadog: Site Reliability Engineer
  • CI/CD Implementation for DevOps
  • Datadog: Developer
  • Neoload Professional Certification

Timeline

Senior Engineer

Qualitest
05.2022 - Current

Performance Tester

Cognizant
09.2020 - 05.2022

Bachelor of Engineering - Computer Science and Engineering

SCSVMV University