SUMIT KUMAR

Bangalore

Summary

Experienced and results-driven Site Reliability Engineering Manager with a proven track record of building and managing highly scalable, secure, and resilient infrastructure. Led cross-functional SRE teams, automated operations, and improved deployment efficiency by 30% through CI/CD modernization. Skilled in AWS, Kubernetes, Terraform, incident management, and system observability. Known for driving reliability through automation, reducing MTTR, and implementing security best practices across cloud platforms.

Overview

years of professional experience

Work History

SRE Manager

Khoros

05.2023 - Current

Manage and mentor a high-performing SRE team of 10 people, driving a culture of ownership, collaboration, and continuous improvement.
Define and uphold SLOs/SLIs to ensure high availability and performance for critical services.
Oversee cloud infrastructure (AWS, EKS) with a focus on scalability, reliability, and automation via Terraform and ArgoCD.
Streamline incident management processes, resulting in a significant reduction in MTTR.
Drive observability practices using Datadog and Sumo Logic, optimizing alerting and monitoring strategies.
Enforce infrastructure and application security using mTLS, RBAC, and WAF rules.
Collaborate cross-functionally with Dev, Product, and Support teams for production readiness and smooth deployments.
Lead monthly reviews of cloud spend and implement DynamoDB cost optimization strategies.

Lead Site Reliability Engineer

Khoros India Pvt Limited

05.2022 - 05.2023

Implemented Web Application Firewall (WAF) protection for more than 1,000 load balancers, CDNs, and API gateways across AWS and GCP
Added security features such as AWS Shield Advanced for infrastructure hosted in AWS
Automated deployment of security features using GitHub
Developed and maintained a mission-critical application designed to mitigate and manage outages effectively
Migrated legacy build pipelines to up-to-date Jenkins CI/CD pipelines
Wrote end-to-end automation scenarios for multiple modules
Stabilized the smoke and functional test suites while working as an SDET
Covered API endpoints using Selenium and TestNG across different product modules

Senior SRE

Khoros India Pvt Limited

03.2021 - 05.2022

Led a team of SREs responsible for the design, deployment, and maintenance of critical infrastructure components
Conducted incident reviews and root cause analysis, and implemented preventive measures to enhance system resilience
Provided technical guidance on the design, implementation, and maintenance of cloud infrastructure
Implemented automation tools to increase efficiency in deployment processes
Monitored system performance using metrics such as latency, throughput, and availability
Created automated scripts for software deployments and configuration management tasks
Maintained security policies for the organization's cloud services according to industry standards
Documented best practices and procedures for incident response activities

SRE-3

Khoros India Pvt Limited

10.2019 - 03.2021

Developed and implemented monitoring solutions to improve system reliability
Researched and evaluated new technologies to enhance platform reliability and stability
Optimized existing infrastructure components for cost savings while meeting compliance requirements
Performed capacity planning activities based on current usage trends and future projections
Conducted monthly progress meetings to inform senior leadership and stakeholders of project advancements

SRE-2

Khoros

10.2018 - 10.2019

Focused on improving system reliability, automating operations, and scaling cloud infrastructure using AWS, Kubernetes, and Terraform.

SDET II

Lithium Technologies Pvt Ltd

07.2015 - 01.2018

Automated web UI application testing using Selenium WebDriver and REST API testing with the TestNG framework
Technology: Java
Tools used: Selenium WebDriver with JUnit for UI testing; Java and Retrofit for REST API testing

Education

Bachelor of Engineering - Information Science

SJCE

Mysore

01.2015

AISSCE (12th) - PCM

MPS

Forbesganj

01.2010

AISSE (10th) - Science

Forbesganj

01.2008

Skills

Infrastructure as Code (IaC): CloudFormation, Terraform
Cloud Platforms: AWS, GCP
Security: Firewall, Shield Advanced, Rate Limiting, DDoS Mitigation
Containerization: Docker, Kubernetes
Monitoring and Logging: Datadog, Sumo Logic, Nagios
Scripting and Programming: Java, Shell, Apache Velocity, AWS CLI
CI/CD: Jenkins, GitHub Actions
Version Control: Git, Bitbucket, SVN
Networking: DNS, Load Balancing
Incident Management: PagerDuty
Collaboration: Jira, Confluence
Workforce management
