Sandeep Singh

Lead Site Reliability Engineer
Bengaluru

Summary

Highly accomplished and forward-thinking DevOps Engineer with a solid 10-year track record in harnessing the power of Kubernetes and driving cloud-native transformations. Adept at designing and orchestrating scalable infrastructures, leveraging Kubernetes as the cornerstone of container orchestration. Proven expertise in automating seamless deployments, implementing robust CI/CD pipelines, and harnessing the full potential of Helm and Docker for accelerated application delivery. Proficient in monitoring and optimizing Kubernetes clusters using cutting-edge tools such as Prometheus and Grafana.

Overview

13 years of professional experience
4 years of post-secondary education
2 Certifications

Work History

Lead Site Reliability Engineer

Grab Greco LLP
06.2024 - Current

Managing a multi-tenant platform for Risk Protection, Fraud, Passenger and Driver Safety, and KYC Suite for Grab, while spearheading B2B initiatives through GrabDefence.

  • Oversee and optimize EKS-based infrastructure using Terraform for provisioning and Datadog for comprehensive monitoring.
  • Implement and uphold SRE methodologies to ensure system reliability and peak performance.
  • Build an ArgoCD cluster from the ground up across multiple regions and lead the initiative to migrate microservices workloads from Jenkins to a GitOps workflow.
  • Design and execute stress tests to identify and mitigate bottlenecks during peak traffic, utilizing Locust to simulate API traffic.

Senior DevOps Engineer

Walmart Global Technology Services
06.2022 - 06.2024
  • Responsible for building and managing the Walmart Cloud Native Platform (WCNP) product for all containerised workloads serving Walmart’s application teams across global Ecommerce platforms & Stores/Distribution and Fulfilment Centers with Kubernetes as the backbone.
  • Product evangelist, often presenting the value proposition of WCNP to customers, conducting best-practices workshops and participating in consulting & troubleshooting sessions with customers.
  • Hands-on experience across the WCNP ecosystem, working with the Single Tenant, Cluster Automation & Reliability, and Multi Cluster Orchestration squads.
  • Provision and manage AKS and GKE clusters using Terraform, ensuring the infrastructure is set up according to the desired specifications and configurations.
  • Engineering, optimization and sustenance of cloud infrastructure with a Kubernetes backbone to support a scalable and highly available application platform.
  • Focused on integrating Walmart enterprise ecosystem around cloud native platform in Azure.
  • Expanded the WCNP footprint with the roll-out of Canada/UK/China clusters for general availability.
  • Open Source adoption: incorporated the Kubernetes SIG project cluster-proportional-autoscaler into the Walmart ecosystem, achieving a cost saving of 10k USD/day (https://github.com/kubernetes-sigs/cluster-proportional-autoscaler).
  • Worked on role bindings to harden the security of the infrastructure control plane system.
  • Designed and implemented backup retention and recovery for Kubernetes CRDs using Velero with Azure Blob Storage.

Customer success:

  • Lead consultant for application migration from VMs to container based microservices in IDC.
  • Led the pilot program working with application teams to move to the Walmart cloud native platform for IDC, serving as a patterns, experiments & IDC on-boarding consultant.

SDE2 (Devops Lead)

Maveric Systems Pvt Ltd.
08.2021 - 05.2022
  • Transformed the customer journey to the cloud by decreasing the on-premises footprint.
  • Managed and supported container infrastructure for banking clients.
  • Tested, developed, built and released cloud platform capabilities.
  • Designed a monitoring and alerting system for the Developer Platform to give granular visibility into deployment behavior in the DEV/STAGE/PROD environments.
  • Created and managed Kubernetes resources, such as pods, services, and deployments, within the AKS and GKE clusters using Terraform, ensuring the applications were deployed and running smoothly.

Senior Associate Consultant Devops

Infosys Pvt Ltd
04.2018 - 03.2021
  • Assisted the production SRE team during incidents and outages with investigation of k8s stack / node failures.
  • Production support and maintenance of DevOps tools including Jenkins, Git, Bitbucket, open-source Jenkins, Kubernetes, OpenShift, CloudBees Jenkins and security analysis tools.
  • Configured, managed and supported multiple build/SCM tools with Jenkins/CloudBees, including Git, TFVC, Jazz repo, Maven, Make and Ant.
  • Worked on scaling the on-prem Master/Slave set-up in containers using OpenShift Container Platform, including Kubernetes plugin configuration on all master nodes.
  • Integrated the AppDynamics monitoring tool into the Kubernetes cluster to actively monitor pod health.

Senior Associate Technology Devops

NAGARRO SOFTWARE PVT. LTD
10.2015 - 03.2018
  • Created Jenkins CI/CD pipelines for continuous build & deployment and integrated JUnit and SonarQube plugins in Jenkins for automated testing and code quality checks.
  • Configured Git with Jenkins and scheduled jobs using Poll SCM.
  • Implemented docker-maven-plugin in the Maven POM to build Docker images for all microservices and later used a Dockerfile to build the Docker images from the Java JAR files.
  • Created Docker images using a Dockerfile. Worked on Docker container snapshots, removing images, and managing Docker volumes.
  • Wrote several Ansible playbooks for automation, with tasks defined in YAML, and ran them to provision Dev servers.

VAS Engineer (Build and Release)

REALNETWORKS PRIVATE LIMITED
03.2015 - 10.2015
  • Maintained DevOps CI/CD infrastructure including Artifactory and Jenkins.
  • Created all scheduler and data back-up jobs using Python/Bash/Cron/SCP/SSH/FTP.
  • Worked on JIRA administration including user management, workflow & field creation.
  • Installed Maven and configured pom.xml in conventional projects for continuous integration.
  • Automated deployment of software and provisioning of Linux hosts using Ansible.

VAS Engineer Support (Linux Admin)

Comviva Technologies
05.2011 - 10.2012
  • Responsible for installation, configuration, management and maintenance of Linux systems.
  • Configured LVM and managed volumes; configured virtualization.
  • Monitored servers using Nagios.
  • Configured FTP and remote applications such as ssh, scp and telnet.
  • Diagnosed user-related issues and provided solutions; performed backup and restoration using tar.
  • Diagnosed and monitored performance and network-related issues using ps, top and netstat.

Education

B.Tech - Electronics And Instrumentation

Kurukshetra University
Kurukshetra
06.2005 - 06.2009

Skills

    Cloud Technologies: Azure (Compute, Databases, Kubernetes Engine & Functions, DevOps Services (Code, Build, Deploy), Networking, Security, Storage), Google Cloud Platform (VPC, Payment Card Industry - PCI hosting), Private Cloud: OpenStack IaaS, Multi-tenant application hosting AWS (EKS)

    Scripting: Shell, Python

    Monitoring & APM: AppDynamics, Prometheus, ELK, Dynatrace

    Infrastructure as Code (IaC): Terraform

    GitOps: ArgoCD

Certification

CKA: Certified Kubernetes Administrator

Timeline

Lead Site Reliability Engineer

Grab Greco LLP
06.2024 - Current

SDE2 (Devops Lead)

Maveric Systems Pvt Ltd.
08.2021 - 05.2022

CKA: Certified Kubernetes Administrator

06-2021

AZ-900 Microsoft Certified: Azure Fundamentals

07-2020

Senior Associate Consultant Devops

Infosys Pvt Ltd
04.2018 - 03.2021

Senior Associate Technology Devops

NAGARRO SOFTWARE PVT. LTD
10.2015 - 03.2018

VAS Engineer (Build and Release)

REALNETWORKS PRIVATE LIMITED
03.2015 - 10.2015

VAS Engineer Support (Linux Admin)

Comviva Technologies
05.2011 - 10.2012

B.Tech - Electronics And Instrumentation

Kurukshetra University
06.2005 - 06.2009

Senior DevOps Engineer

Walmart Global Technology Services
06.2022 - 06.2024