Sourav Jha

Bangalore

Summary

Principal Cloud Platform Engineer and Site Reliability Engineer with over 9 years of experience designing, operating, and scaling large-scale, high-availability production systems across Microsoft Azure, Google Cloud Platform, and AWS. Proven expertise in SRE principles (SLO, SLI, SLA), Kubernetes-based platforms, Infrastructure as Code, CI/CD automation, and deep Linux and networking fundamentals.

Strong background in cloud platform engineering, DevOps automation, and reliability ownership, including incident management, root cause analysis, MTTR reduction, and observability. Experienced in building secure, resilient, and compliant cloud architectures using policy-as-code, defense-in-depth security models, and governance frameworks. Demonstrated ability to lead DevOps and SRE initiatives, mentor engineers, and collaborate with cross-functional stakeholders to deliver reliability-first, scalable, and cost-optimized platforms.

Overview

10 years of professional experience
1 Certification

Work History

System Engineer III

Tesco
04.2025 - Current
  • Serve as Principal Cloud Platform Engineer for the Cloud Platform team, responsible for the architecture, reliability, security, and operations of enterprise-scale Azure cloud platforms supporting global workloads.
  • Act as both Site Reliability Engineer and Cloud Developer, enabling compute, networking, security, Kubernetes, and AI-enabled cloud services in production environments.
  • Design, build, and operate Azure-native infrastructure including VMs, VNets, AKS, Load Balancers, Private Endpoints, Virtual WAN (vWAN), and DNS, ensuring high availability, scalability, and resilience.
  • Own reliability and availability of business-critical, high-traffic cloud platforms, proactively addressing risks before customer impact.
  • Define and implement SLOs, SLIs, and alerting strategies to monitor latency, error rates, saturation, and availability across distributed systems.
  • Drive SRE best practices, including incident response, on-call operations, root cause analysis (RCA), post-incident reviews, and long-term reliability improvements.
  • Implement defense-in-depth security architectures using Azure Policy, Microsoft Defender for Cloud, RBAC, and industry-aligned security benchmarks to reduce vulnerabilities and enforce secure-by-default deployments.
  • Enforce Policy-as-Code and cloud governance frameworks, ensuring compliance, standardization, and audit readiness across environments.
  • Manage and support production Kubernetes platforms (AKS), including cluster hardening, secure workload identity, autoscaling, and controlled application rollouts.
  • Automate infrastructure provisioning and configuration using Terraform and Python, reducing manual operations, configuration drift, and operational risk.
  • Design and optimize secure cloud networking architectures (private connectivity, load balancing, DNS, segmentation) to support compliant and scalable workloads.
  • Implement end-to-end observability (metrics, logs, alerts) to improve visibility, reduce MTTR, and enhance operational decision-making.
  • Collaborate with application teams, security, and platform stakeholders to ensure reliability-first system design and secure application delivery.
  • Mentor engineers and contribute to runbooks, SOPs, architecture documentation, and operational best practices.

Senior DevOps Engineer

Luxoft
06.2022 - 04.2025
  • Led DevOps and Site Reliability Engineering initiatives for application teams, owning end-to-end cloud infrastructure, CI/CD processes, and production automation for enterprise applications.
  • Acted as SDE III for Tesco, a large-scale retail client, supporting engineering teams with reliable, secure, and scalable cloud platforms.
  • Designed and built CI/CD pipelines with automated testing, security scans, and infrastructure provisioning, improving deployment reliability and release confidence.
  • Established monitoring and alerting frameworks to ensure SLA compliance, proactively detecting issues and reducing incident recurrence.
  • Drove cloud and governance initiatives, aligning infrastructure and delivery pipelines with automation, compliance, and security standards.
  • Guided teams through cloud migration and modernization efforts, ensuring minimal production disruption and improved platform resilience.
  • Automated infrastructure provisioning and configuration using Infrastructure as Code principles, reducing manual effort and operational risk.
  • Partnered closely with Product Managers, Developers, Architects, and Security teams to ensure smooth delivery and alignment with business requirements.
  • Contributed to architecture decisions, cost-optimization strategies, and reliability improvements, balancing performance, security, and efficiency.
  • Provided hands-on production support and incident resolution, applying SRE principles and root cause analysis to prevent repeat issues.
  • Worked on-site for approximately two years in Warsaw, Poland, with the Central Europe team, enabling close collaboration with client stakeholders and faster decision-making.

Senior Cloud Analyst

Accenture
01.2021 - 06.2022
  • Worked as a Senior Cloud Infrastructure Analyst supporting large-scale enterprise production systems for e-commerce client H&M across public cloud environments.
  • Engineered and optimized CI/CD pipelines using Azure DevOps and Ansible, improving deployment speed and reducing release failures.
  • Supported Kubernetes-based platforms across cloud environments, ensuring high availability, scalability, and performance for containerized workloads.
  • Established SRE practices including proactive monitoring, alerting, and incident response, ensuring high availability and SLA adherence.
  • Designed and optimized cloud networking components to improve application availability and performance.
  • Led cloud migration and modernization initiatives, ensuring minimal disruption to business-critical workloads.
  • Responded to production incidents, performing root cause analysis (RCA) and implementing preventive solutions to reduce recurrence.
  • Authored technical documentation, runbooks, and operational procedures, improving support efficiency and knowledge sharing.
  • Collaborated with stakeholders to align infrastructure solutions with business, security, and scalability requirements.

Senior Software Engineer

Think Future Technologies
10.2020 - 12.2020
  • Designed and implemented end-to-end CI/CD pipelines using Azure DevOps and Git, enabling automated build, test, and deployment workflows for microservices-based applications.
  • Containerized applications using Docker, standardizing runtime environments and reducing deployment inconsistencies across environments.
  • Deployed and supported Kubernetes-based workloads, managing manifests, services, ingress configurations, and rollout strategies.
  • Automated infrastructure provisioning and configuration using Terraform and shell scripting, improving repeatability and reducing manual setup errors.
  • Integrated artifact repositories into CI/CD pipelines to manage versioned application builds and dependencies.
  • Implemented deployment strategies to minimize downtime during releases.
  • Supported production and non-production environments, troubleshooting deployment failures, container crashes, and pipeline issues.
  • Implemented basic monitoring and logging to track application health, resource utilization, and deployment success.
  • Collaborated closely with developers and QA teams to streamline release cycles and resolve environment-related blockers.

Technical Specialist

IBM
07.2016 - 10.2020
  • Supported Linux-based production systems, performing system administration, monitoring, and troubleshooting activities.
  • Assisted in server provisioning, configuration, patching, and performance tuning across environments.
  • Wrote basic shell scripts to automate repetitive operational tasks and improve efficiency.
  • Supported infrastructure components such as web servers, databases, and middleware services, ensuring uptime and stability.
  • Participated in incident response and root cause investigations, escalating issues and implementing corrective actions.
  • Built strong foundational knowledge in operating systems, networking concepts, and system reliability, forming the base for advanced DevOps and SRE roles.

Education

B.Tech - Computer Science & Engineering

Galgotias University
Greater Noida, India
06.2016

Skills

    Cloud: Microsoft Azure, Google Cloud Platform (GCP), AWS, VMware
    Architecture: Multi-Cloud, Hybrid Cloud, High Availability, Scalability, Disaster Recovery
    SRE: SLO, SLI, SLA, Error Budgets, Incident Management, On-Call, RCA, Post-Mortems, MTTR, Capacity Planning, High-Traffic Systems
    DevOps & CI/CD: CI/CD Pipeline Design, Jenkins, GitHub Actions, Azure DevOps, Cloud Build, GitOps, Release Engineering
    Containers: Kubernetes (AKS, GKE, EKS), Docker, Helm, Kustomize, Microservices, Ingress, Autoscaling, Rolling Deployments, Workload Identity
    IaC & Automation: Terraform, Ansible, ARM Templates, Policy-as-Code, Governance Automation, Python, Bash, Shell
    Observability: Prometheus, Grafana, Azure Monitor, Cloud Monitoring, Splunk, New Relic, Metrics, Logs, Alerts, Latency Monitoring
    Networking & Security: Linux, TCP/IP, DNS, HTTP(S), Load Balancing, VPC/VNet, Hub-Spoke, Private Connectivity, IAM, RBAC, Secrets, Zero Trust, ISO, SOC, GDPR
    Leadership: Technical Leadership, Mentoring, Stakeholder Communication, Architecture Reviews, Documentation, Agile, Scrum, DevOps Culture

Certification

  • AWS Certified Solutions Architect
  • Microsoft Azure DevOps Solutions Expert
  • Microsoft Azure Administrator
  • AWS Cloud Practitioner
  • ITIL® 4 Foundation

Languages

English
Bilingual or Proficient (C2)
Hindi
Bilingual or Proficient (C2)
