Saritha Sreekumar

Bangalore

Summary

18+ years of experience in Engineering, Gen AI, Site Reliability Engineering, Cloud Native Infrastructure & Software Development, DevOps & DevSecOps.
Experienced in Budgeting, Developer Productivity Enhancement, Production Operations, Incident Management, Infrastructure Support & Customer Relations.
Conducted software security audits such as SOC 2 and FedRAMP.
Culture ambassador and servant leader who guides the team based on core values.
Excellent stakeholder management and cross-team, cross-geo collaboration.
Strong prioritization skills, with experience handling multiple projects simultaneously and a passion for learning.
Built high-performance teams in multiple high-growth technology companies across different locations.
As a customer advocate, played a key role in fostering a customer-obsessed culture.
Quality advocate focused on building robust platforms and large-scale systems with an emphasis on quality, reliability, scalability, maintainability, and supportability practices.

Work History

Director, Site Reliability Engineering

Precisely

12.2021 - Current

Global Head of Site Reliability Engineering, leading a team spread across India, the US and Canada and reporting to the VP of SaaS
Team consists of infrastructure developers, AI developers, architects, site reliability engineers and devsecops engineers, responsible for ensuring reliability and stability of the Enterprise SaaS offering hosted by Precisely
Built the Cloud Native Infra for DI Suite & Automate Studio Manager
Budgeting and cost reduction for AWS, Datadog, MongoDB, etc.
Reduced cloud layer costs and other IT infrastructure costs, resulting in an annual savings of $1 million.
Built a Gen AI-based chatbot for enhancing developer productivity and reducing SRE toil
Built a Gen AI-based incident management and response system to handle production systems
Very strong experience with AWS, Kubernetes, LLMs, Mistral AI, pgvector, Terraform, MongoDB and Kafka
Introduced Datadog as a single-pane-of-glass observability solution.
Introduced cloud native best practices for software development and released 180+ microservices with a focus on state-of-the-art DevOps and DevSecOps practices, using AWS, EKS, Docker, Helm, Datadog, Concourse, Argo CD, MongoDB, Snowflake, Databricks, Prometheus, Sumo Logic, Splunk, ELK, Terraform, Cloudability, Apptio, Prisma Cloud and Temporal Cloud
Built dashboards using Tableau, Datadog, Heap and Google Analytics data
Coached and mentored developers from a legacy software development background to adopt cloud native ways of software development by breaking down monoliths into microservices, defining SLIs and SLOs for microservices, building logging, and implementing telemetry and tracing
Introduced a managed Kafka service and built automation for services to subscribe to Kafka
Introduced Terraform, MongoDB, Prisma Cloud (Twistlock) and a Web Application Firewall (WAF)
Worked with the legal team to draft the SLA for DI Suite
Implemented a 24x7 on-call process for SRE and set up an Incident Management Process
Built a Disaster Recovery Plan and got the product SOC 2 certified
Now working on FedRAMP.

Senior Manager, Site Reliability Engineering

Qlik

06.2016 - 12.2021

Developed the Site Reliability Engineering Practice at Qlik based on Google’s SRE pyramid
One of the first adopters ever in the industry
Managed a cross-geographic Site Reliability Engineering team of 40 (APAC, EMEA and NA), reporting to the Sr Director, Site Reliability Engineering at Qlik
Culture ambassador who drove Qlik’s core values by example, in actions and interactions with customers and within the organisation
Drove the Developer Productivity Initiative at Qlik
Achieved SOC 2 and FedRAMP Moderate certification for the product
Cost management initiative for all infra components within R&D
Budgeting, including forecasting, contract negotiation, cost optimization and cost control for tooling and cloud platform
Designing, developing, and implementing tools and automation needed for keeping the lights on for Qlik’s Enterprise SaaS offering
Pillars of focus: Observability, Scalability, Security, Cost, Reliability & Performance
Envisioned, designed and implemented an Incident Management process using PagerDuty, Alertmanager and Prometheus
Strategized, developed and implemented 24x7 on-call for the first time in the history of Qlik R&D
Drove launch coordination, including building and maintaining CI/CD pipelines and Artifactory, and designing and implementing the metrics stack for microservices coming into stage and production
Instrumental in building a customer-obsessed culture within R&D at Qlik by bridging cross-functional gaps between Support and R&D
Contributed significantly towards building a Customer First team
Gathered insights from customer usage metrics of the product and worked with product management to advocate for customer needs around product functionality
Hands-on experience building Qlik Sense apps for measuring KPIs and customer usage
Support Lifecycle Management: Worked with customers and customer-facing teams on cases escalated from product support, identified bugs and defects, and worked closely with developers to fix them, owning the complete lifecycle of support cases (tools used: ServiceNow, Jira, Salesforce)
Built a SaaS Bug Triage Workflow for escalating bugs from Support to R&D at senior management level
Integrated tools used within and outside R&D (Jira, Salesforce, ServiceNow) and built workflows across them
Automation using Bash and Python
Vendor management for new and existing tools (e.g., Sumo Logic, Cloudwiry, Twistlock, JFrog)
Hiring, mentoring, coaching and performance management.

Line Manager / Deputy Manager

Volvo IT

12.2013 - 06.2016

Managed a team of 30 people (VMware/OSD/SBC)
Drove Monthly KPIs: Reduction of transfer rate and reduction of lead times, Weekly SLA trend
Drove towards Continuous Improvement as part of VPS4IT, CATS compliance
Architected new VMware solutions and was SPOC for all Global VMware escalated Issues
Reduced incidents through automation and drove VPS4IT for the team (Lean Management system)
Hiring, Coaching & Mentoring of team members
Drove team meetings including huddles, problem solving sessions, service review meetings
Documentation: BPDs, runbooks, TCRP documents, process flow documents
Adherence to ITIL: Incident, Change, Problem, Capacity, Configuration & Knowledge Management
Continuous Improvement in Daily Operations and Delivery Excellence
Budgeting, Negotiation and Vendor Coordination
Learning Initiatives for the team: Imparted and Organized various external trainings for the team
Disaster Recovery for VMware Infrastructure
Cross-Team / Cross-Geo Collaboration and Stakeholder Management
Skill Matrix Management, Capacity planning using skill matrix
24x7 operations management and shift roster, Shift Allowance Calculation.

Senior Engineer - System Administration

Computer Sciences Corporation

03.2010 - 12.2013

Managed multiple VMware & Windows projects for various clients in the UK region as an SME
VMware Architect and Subject Matter Expert - vSphere 4.x, 5.x, and VMware VI3 Infrastructure
VMware Trainer for infrastructure teams within CSC
Prepared and maintained project milestones, schedules, action items and issues for VMware solutions
Prepared and hosted daily service reviews and SLA reviews, and produced KPI, FTE and SLA reports on a monthly basis
Performed Capacity & Trend Analysis of ESX farms
Risk Register Management
Worked with internal auditing teams and participated in external audits
Escalation support and Problem Management with RCAs for VMware technical issues
Technical panelist for interviews on VMware & Windows
Mentored other team members on VMware related tasks.

Module Lead – Remote Infrastructure Management

Wipro Technologies

10.2005 - 03.2010

Managed a 4-member Wipro offshore team for a telecom client
Service delivery reporting and Review with Management
Quality adherence and Audits: Internal and External, process documentation
Identified Non-Compliance incidents and prepared risk management documentation
Installed, configured and managed VI3 infrastructure by working as a Server Build Coordinator & Senior Engineer globally.

Education

Executive General Management Program

Indian Institute of Management, Bangalore

2021

B-Tech in Electronics & Communication

CUSAT

2005

Senior Secondary

Mahatma Gandhi University, Kerala

2001

SSLC

Board of Public Examinations, Kerala

1999

Skills

People Management
SRE, DevOps, DevSecOps
Stakeholder Management
Gen AI-based development, LLM, Mistral AI
Automation, Toil Reduction
Cloud Native Development
AWS, Kubernetes, MongoDB
FedRAMP & SOC 2
Budget and Vendor Management
Process Development and Collaboration
Hiring, Coaching and Performance Management

Certification

Certified Scrum Master
ITIL V3 Intermediate on Service Operations
Continuous Service Improvement
EMC Certified Associate - Cloud Infrastructure and Services
PMP, Six Sigma Yellow Belt
