Sourav Jha

Bangalore, KA

Work Preference

Work Type

Full Time

Location Preference

On-Site, Remote, Hybrid

Important To Me

Career advancement, Work-life balance, Company culture, Flexible work hours, Personal development programs, Healthcare benefits, Work from home option, Paid time off, Paid sick leave, Team building / Company retreats

Summary

Principal Cloud Platform Engineer and Site Reliability Engineer with 9.7+ years of experience designing, operating, and scaling large-scale, high-availability production systems across Microsoft Azure, Google Cloud Platform, and AWS. Proven expertise in SRE principles (SLO, SLI, SLA), Kubernetes-based platforms, Infrastructure as Code, CI/CD automation, and deep Linux and networking fundamentals.

Strong background in cloud platform engineering, DevOps automation, and reliability ownership, including incident management, root cause analysis, MTTR reduction, and observability. Experienced in building secure, resilient, and compliant cloud architectures using policy-as-code, defense-in-depth security models, and governance frameworks. Demonstrated ability to lead DevOps and SRE initiatives, mentor engineers, and collaborate with cross-functional stakeholders to deliver reliability-first, scalable, and cost-optimized platforms.

Work History

System Engineer III

Tesco

04.2025 - Current

Serve as Principal Cloud Platform Engineer for the Cloud Platform team, responsible for the architecture, reliability, security, and operations of enterprise-scale Azure cloud platforms supporting global workloads.
Act as both Site Reliability Engineer and Cloud Developer, enabling compute, networking, security, Kubernetes, and AI-enabled cloud services in production environments.
Design, build, and operate Azure-native infrastructure including VMs, VNets, AKS, Load Balancers, Private Endpoints, Virtual WAN (vWAN), and DNS, ensuring high availability, scalability, and resilience.
Own reliability and availability of business-critical, high-traffic cloud platforms, proactively addressing risks before customer impact.
Define and implement SLOs, SLIs, and alerting strategies to monitor latency, error rates, saturation, and availability across distributed systems.
Drive SRE best practices, including incident response, on-call operations, root cause analysis (RCA), post-incident reviews, and long-term reliability improvements.
Implement defense-in-depth security architectures using Azure Policy, Microsoft Defender for Cloud, RBAC, and industry-aligned security benchmarks to reduce vulnerabilities and enforce secure-by-default deployments.
Enforce Policy-as-Code and cloud governance frameworks, ensuring compliance, standardization, and audit readiness across environments.
Manage and support production Kubernetes platforms (AKS), including cluster hardening, secure workload identity, autoscaling, and controlled application rollouts.
Automate infrastructure provisioning and configuration using Terraform and Python, reducing manual operations, configuration drift, and operational risk.
Design and optimize secure cloud networking architectures (private connectivity, load balancing, DNS, segmentation) to support compliant and scalable workloads.
Implement end-to-end observability (metrics, logs, alerts) to improve visibility, reduce MTTR, and enhance operational decision-making.
Collaborate with application teams, security, and platform stakeholders to ensure reliability-first system design and secure application delivery.
Mentor engineers and contribute to runbooks, SOPs, architecture documentation, and operational best practices.

Senior DevOps Engineer

Luxoft

06.2022 - 04.2025

Led DevOps and Site Reliability Engineering initiatives for application teams, owning end-to-end cloud infrastructure, CI/CD processes, and production automation for enterprise applications.
Acted as SDE III for a large-scale retail client, Tesco, supporting engineering teams with reliable, secure, and scalable cloud platforms.
Designed and built CI/CD pipelines with automated testing, security scans, and infrastructure provisioning, improving deployment reliability and release confidence.
Established monitoring and alerting frameworks to ensure SLA compliance, proactively detecting issues and reducing incident recurrence.
Drove cloud and governance initiatives, aligning infrastructure and delivery pipelines with automation, compliance, and security standards.
Guided teams through cloud migration and modernization efforts, ensuring minimal production disruption and improved platform resilience.
Automated infrastructure provisioning and configuration using Infrastructure as Code principles, reducing manual effort and operational risk.
Partnered closely with Product Managers, Developers, Architects, and Security teams to ensure smooth delivery and alignment with business requirements.
Contributed to architecture decisions, cost-optimization strategies, and reliability improvements, balancing performance, security, and efficiency.
Provided hands-on production support and incident resolution, applying SRE principles and root cause analysis to prevent repeat issues.
Worked on-site for approximately two years in Warsaw, Poland, with the Central Europe team, enabling close collaboration with client stakeholders and faster decision-making.

Senior Cloud Analyst

Accenture

01.2021 - 06.2022

Worked as a Senior Cloud Infrastructure Analyst supporting large-scale enterprise production systems for e-commerce client H&M across public cloud environments.
Engineered and optimized CI/CD pipelines using Azure DevOps and Ansible, improving deployment speed and reducing release failures.
Supported Kubernetes-based platforms across cloud environments, ensuring high availability, scalability, and performance for containerized workloads.
Established SRE practices including proactive monitoring, alerting, and incident response, ensuring high availability and SLA adherence.
Designed and optimized cloud networking components to improve application availability and performance.
Led cloud migration and modernization initiatives, ensuring minimal disruption to business-critical workloads.
Responded to production incidents, performing root cause analysis (RCA) and implementing preventive solutions to reduce recurrence.
Authored technical documentation, runbooks, and operational procedures, improving support efficiency and knowledge sharing.
Collaborated with stakeholders to align infrastructure solutions with business, security, and scalability requirements.

Senior Software Engineer

Think Future Technologies

10.2020 - 12.2020

Designed and implemented end-to-end CI/CD pipelines using Azure DevOps and Git, enabling automated build, test, and deployment workflows for microservices-based applications.
Containerized applications using Docker, standardizing runtime environments and reducing deployment inconsistencies across environments.
Deployed and supported Kubernetes-based workloads, managing manifests, services, ingress configurations, and rollout strategies.
Automated infrastructure provisioning and configuration using Terraform and shell scripting, improving repeatability and reducing manual setup errors.
Integrated artifact repositories into CI/CD pipelines to manage versioned application builds and dependencies.
Implemented deployment strategies to minimize downtime during releases.
Supported production and non-production environments, troubleshooting deployment failures, container crashes, and pipeline issues.
Implemented basic monitoring and logging to track application health, resource utilization, and deployment success.
Collaborated closely with developers and QA teams to streamline release cycles and resolve environment-related blockers.

Technical Specialist

IBM

07.2016 - 10.2020

Supported Linux-based production systems, performing system administration, monitoring, and troubleshooting activities.
Assisted in server provisioning, configuration, patching, and performance tuning across environments.
Wrote basic shell scripts to automate repetitive operational tasks and improve efficiency.
Supported infrastructure components such as web servers, databases, and middleware services, ensuring uptime and stability.
Participated in incident response and root cause investigations, escalating issues and implementing corrective actions.
Built strong foundational knowledge in operating systems, networking concepts, and system reliability, forming the base for advanced DevOps and SRE roles.

Education

B.Tech - Computer Science & Engineering

Galgotias University

Greater Noida, India

06.2016

Skills

Cloud: Microsoft Azure, Google Cloud Platform (GCP), AWS, VMware
Architecture: Multi-Cloud, Hybrid Cloud, High Availability, Scalability, Disaster Recovery
SRE: SLO, SLI, SLA, Error Budgets, Incident Management, On-Call, RCA, Post-Mortems, MTTR, Capacity Planning, High-Traffic Systems
DevOps & CI/CD: CI/CD Pipeline Design, Jenkins, GitHub Actions, Azure DevOps, Cloud Build, GitOps, Release Engineering
Containers: Kubernetes (AKS, GKE, EKS), Docker, Helm, Kustomize, Microservices, Ingress, Autoscaling, Rolling Deployments, Workload Identity
IaC & Automation: Terraform, Ansible, ARM Templates, Policy-as-Code, Governance Automation, Python, Bash, Shell
Observability: Prometheus, Grafana, Azure Monitor, Cloud Monitoring, Splunk, New Relic, Metrics, Logs, Alerts, Latency Monitoring
Networking & Security: Linux, TCP/IP, DNS, HTTP(S), Load Balancing, VPC/VNet, Hub-Spoke, Private Connectivity, IAM, RBAC, Secrets, Zero Trust, ISO, SOC, GDPR
Leadership: Technical Leadership, Mentoring, Stakeholder Communication, Architecture Reviews, Documentation, Agile, Scrum, DevOps Culture

Certification

AWS Certified Solutions Architect
Microsoft Azure DevOps Solutions Expert
Microsoft Azure Administrator
AWS Cloud Practitioner
ITIL® 4 Foundation

Languages

English

Bilingual or Proficient (C2)

Hindi

Bilingual or Proficient (C2)

Accomplishments

Captain – Computer Science Cricket Team: Led the team in inter-department tournaments, demonstrating leadership and teamwork.
Hospitality & Management Coordinator – BookMyShow (IPL 2014 & 2015): Coordinated large-scale event operations, ensuring smooth execution and stakeholder alignment.
Motorsport Marshal – BookMyShow: Supported high-profile events at Buddh International Circuit (Indian GP, JK Tyre Championship), gaining experience in high-pressure environments.
Image Encryption & Decryption (Blowfish Algorithm): Applied cryptography principles to secure digital data.
Criminal Investigation System: Contributed to a cloud-based project management solution tailored for investigation workflows, showcasing SaaS solution design.

Work Availability

Days: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday
Times: Morning, Afternoon, Evening

Affiliations

Strategic problem-solving and analytical mindset
Effective communicator, mentor, and collaborative leader
Results-driven, detail-oriented, and customer-focused
Proactive, adaptable, and innovative in leveraging emerging technologies
Strong stakeholder management with focus on security, compliance, and operational excellence

Quote

Even if you are on the right track, you’ll get run over if you just sit there.

Will Rogers