Saritha Sreekumar

Bangalore

Summary

  • 18+ years of experience in Engineering, Gen AI, Site Reliability Engineering, Cloud Native Infrastructure & Software Development, DevOps & DevSecOps.
  • Experienced in Budgeting, Developer Productivity Enhancement, Production Operations, Incident Management, Infrastructure Support & Customer Relations.
  • Conducted software security audits such as SOC-2 and FedRAMP.
  • Culture ambassador and servant leader who guides the team based on core values.
  • Excellent stakeholder management and cross-team/cross-geo collaboration.
  • Strong prioritization skills, with a record of handling multiple projects simultaneously and a passion for learning.
  • Built high-performance teams in multiple high-growth technology companies across different locations.
  • As a customer advocate, played a key role in fostering a customer-obsessed culture.
  • Quality advocate focused on building robust platforms and large-scale systems with an emphasis on quality, reliability, scalability, maintainability, and supportability practices.

Overview

19 years of professional experience
1 Certification

Work History

Director, Site Reliability Engineering

Precisely
12.2021 - Current
  • Global Head of Site Reliability Engineering, leading a team spread across India, the US and Canada and reporting to the VP of SaaS
  • Team consists of infrastructure developers, AI developers, architects, site reliability engineers and devsecops engineers, responsible for ensuring reliability and stability of the Enterprise SaaS offering hosted by Precisely
  • Built the Cloud Native Infra for DI Suite & Automate Studio Manager
  • Budgeting and cost reduction for AWS, Datadog, MongoDB, etc.
  • Reduced cloud layer costs and other IT infrastructure costs, resulting in an annual savings of $1 million.
  • Built a Gen AI based chatbot for enhancing developer productivity and reducing SRE toil
  • Built Gen AI based incident management and response system to handle production systems
  • Strong hands-on experience with AWS, Kubernetes, LLMs, Mistral AI, pgvector, Terraform, MongoDB and Kafka
  • Introduced Datadog as a single-pane-of-glass observability solution.
  • Introduced cloud native best practices for software development and released 180+ microservices, focusing on state-of-the-art DevOps and DevSecOps practices with AWS, EKS, Docker, Helm, Datadog, Concourse, Argo CD, MongoDB, Snowflake, Databricks, Prometheus, Sumo Logic, Splunk, ELK, Terraform, Cloudability, Apptio, Prisma Cloud and Temporal Cloud
  • Built dashboards using Tableau, Datadog, Heap and Google Analytics data
  • Coached and mentored developers from a legacy software development background to adopt cloud native ways of software development by breaking down monoliths into microservices, defining SLIs and SLOs for microservices, building logging, and implementing telemetry and tracing
  • Introduced managed Kafka services and built automation for services to subscribe to Kafka
  • Introduced Terraform, MongoDB, Prisma Cloud (Twistlock) and a Web Application Firewall (WAF)
  • Worked with Legal to draft the SLA for DI Suite
  • Implemented a 24x7 on-call process for SRE and set up an Incident Management Process
  • Built a Disaster Recovery Plan and got the product SOC-2 certified
  • Currently working on FedRAMP.

Senior Manager, Site Reliability Engineering

QLIK
06.2016 - 12.2021
  • Developed the Site Reliability Engineering Practice at Qlik based on Google’s SRE pyramid
  • One of the first adopters ever in the industry
  • Managed a cross-geographic Site Reliability Engineering team of 40 (APAC, EMEA and NA), reporting to the Sr Director, Site Reliability Engineering at Qlik
  • Culture ambassador who drove Qlik’s Core Values by example, in actions and interactions with customers and within the organisation
  • Drove the Developer Productivity Initiative at Qlik
  • Achieved SOC-2 and FedRAMP Moderate certification for the product
  • Cost management initiative for all infra components within R&D
  • Budgeting, including forecasting, contract negotiation, cost optimization and cost control for tooling and cloud platform
  • Designed, developed and implemented the tools and automation needed to keep the lights on for Qlik’s Enterprise SaaS offering
  • Pillars of focus: Observability, Scalability, Security, Cost, Reliability & Performance
  • Envisioned, designed and implemented an Incident Management process using PagerDuty, Alertmanager and Prometheus
  • Strategized, developed and implemented 24x7 on-call for the first time in the history of Qlik R&D
  • Drove launch coordination, including building and maintaining CI/CD pipelines and Artifactory, and designing and implementing the metrics stack for microservices coming into stage and production
  • Instrumental in building a customer-obsessed culture within R&D at Qlik by bridging cross-functional gaps between Support and R&D
  • Contributed significantly towards building a Customer First team
  • Gathered insights from customer usage metrics of the product and worked in collaboration with product management, advocating for customer needs around product functionality
  • Hands-on experience building Qlik Sense Apps for measuring KPIs and customer usage
  • Support Lifecycle Management: worked with customers and customer-facing teams on cases escalated from product support, identified bugs and defects, and worked closely with developers to fix them, taking responsibility for the complete lifecycle management of support cases (Tools Used: ServiceNow, JIRA, Salesforce)
  • Built a SaaS Bug Triage Workflow for escalating bugs from Support to R&D at senior management level
  • Integrated various tools used within R&D and outside it (Jira/Salesforce/ServiceNow) and built various workflows
  • Automation using Bash and Python
  • Vendor management for new and existing tools (e.g. Sumo Logic, Cloudwiry, Twistlock, JFrog)
  • Hiring, Mentoring, Coaching and Performance Management.

Line Manager / Deputy Manager

Volvo IT
12.2013 - 06.2016
  • Managed a team of 30 people (VMware/OSD/SBC)
  • Drove Monthly KPIs: Reduction of transfer rate and reduction of lead times, Weekly SLA trend
  • Drove towards Continuous Improvement as part of VPS4IT, CATS compliance
  • Architected new VMware solutions and was SPOC for all Global VMware escalated Issues
  • Reduced incidents through automation and drove VPS4IT for the team (Lean Management system)
  • Hiring, Coaching & Mentoring of team members
  • Drove team meetings including huddles, problem solving sessions, service review meetings
  • Documentation: BPDs, Run books, TCRP documents, Process flow documents
  • Adherence to ITIL: Incident, Change, Problem, Capacity, Configuration & Knowledge Management
  • Continuous Improvement in Daily Operations and Delivery Excellence
  • Budgeting, Negotiation and Vendor Coordination
  • Learning Initiatives for the team: Imparted and Organized various external trainings for the team
  • Disaster Recovery for VMware Infrastructure
  • Cross-Team/Cross-Geo Collaboration and Stakeholder Management
  • Skill Matrix Management, Capacity planning using skill matrix
  • 24x7 operations management and shift roster, Shift Allowance Calculation.

Senior Engineer - System Administration

Computer Sciences Corporation
03.2010 - 12.2013
  • Managed multiple VMware & Windows projects for various clients in the UK region as an SME
  • VMware Architect and Subject Matter Expert - vSphere 4.x, 5.x, and VMware VI3 Infrastructure
  • VMware Trainer for infrastructure teams within CSC
  • Prepared and maintained project milestones, schedules, action items and issues for VMware solutions
  • Prepared and hosted daily Service Reviews and SLA reviews, and produced KPI, FTE and SLA reports on a monthly basis
  • Performed Capacity & Trend Analysis of ESX farms
  • Risk Register Management
  • Worked with internal auditing teams and participated in external audits
  • Escalation support and Problem Management with RCAs for VMware technical issues
  • Technical panelist for interviews on VMware & Windows
  • Mentored other team members on VMware related tasks.

Module Lead – Remote Infrastructure Management

Wipro Technologies
10.2005 - 03.2010
  • Managed a 4-member Wipro Offshore Team for a Telecom Client
  • Service delivery reporting and Review with Management
  • Quality adherence and Audits: Internal and External, process documentation
  • Identified Non-Compliance incidents and prepared risk management documentation
  • Installed, configured and managed VI3 infrastructure by working as a Server Build Coordinator & Senior Engineer globally.

Education

Executive General Management Program -

Indian Institute of Management, Bangalore
2021

B-Tech in Electronics & Communication -

CUSAT
2005

Senior Secondary -

Mahatma Gandhi University, Kerala
2001

SSLC -

Board of Public Examinations, Kerala
1999

Skills

  • People Management
  • SRE, DevOps, DevSecOps
  • Stakeholder Management
  • Gen AI based development, LLM, Mistral AI
  • Automation, Toil Reduction
  • Cloud Native Development
  • AWS, Kubernetes, MongoDB
  • FedRAMP & SOC-2
  • Budget and Vendor Management
  • Process Development and Collaboration
  • Hiring, Coaching and Performance Management

Certification

  • Certified Scrum Master
  • ITIL V3 Intermediate on Service Operations
  • Continuous Service Improvement
  • EMC Certified Associate - Cloud Infrastructure and Services
  • PMP, Six Sigma Yellow Belt
