Monika Kumari

Bangalore

Summary

Engineering professional with 10+ years of experience and a proven ability to innovate and solve complex technical challenges. Thrives in highly automated environments and enjoys solving deep infrastructure challenges at scale. Known for delivering high-quality solutions and driving team success through effective collaboration and adaptability. Skills include systems analysis, CI/CD pipelines, DevOps, cloud computing, SRE, defining SLOs/SLIs for services, observability, SRE KPIs, FinOps, developer productivity, and technical troubleshooting.

Overview

11 years of professional experience

Work History

Senior SRE Lead

Walmart Global Tech
10.2018 - Current
  • Senior SRE Lead responsible for driving transformation across global business entities through standardization and consolidation of DevOps, platform engineering, and SRE practices.
  • Focus areas: platform engineering, CI/CD, cloud operations, reliability engineering, automation, cost optimization, and HA/DR implementation.
  • Creating and managing cloud Kubernetes clusters to support the development and deployment of applications
  • Lead the end-to-end DevOps lifecycle for Atlas, Walmart's internal warehouse management system, which serves 800+ engineers and powers 100+ microservices
  • Implemented CI/CD using Jenkins, GitHub, Ansible, Artifactory, and Helm, including provisioning and managing Azure resources
  • Experienced with both on-prem Kubernetes and managed Kubernetes services (AKS, EKS); worked on the migration of on-prem Kubernetes to Edge Kubernetes
  • Designing and implementing deployment strategies for APIs, improving the efficiency and reliability of application delivery
  • Increase developer productivity with self-service pipelines and best-practice tooling
  • Member of the centralized DevOps core team committee, which defines organization-level requirements, reviews designs, and implements resilient solutions that can be adopted across the organization to avoid duplicated effort
  • Collaborated with developers and application teams on Java-based systems, focusing on understanding requirements and debugging issues effectively
  • Created a real‑time database monitoring tool (DB Inspector) that provides comprehensive insights into database health, including long‑running queries, blocking sessions, availability, storage utilization, active connections, user activity, and other key metrics, enabling faster detection, troubleshooting, and resolution of database performance and availability issues.
  • Led the implementation of Chaos Engineering practices and conducted regular Disaster Recovery (DR) Drills to proactively test system resilience and validate recovery procedures. Developed and executed chaos experiments to simulate failures, identify single points of failure, and improve system reliability, while coordinating DR exercises to ensure effective recovery and continuity across critical services
  • Experienced in automating workflows on Linux systems using Python and shell scripting. Proven track record of improving system efficiency and reducing manual effort through scripting and tool development
  • Imported the entire Azure infrastructure into Terraform, managing the creation, deletion, and modification of users, groups, roles, and policies
  • Designing and presenting technical solutions that meet customer requirements, and working with technical teams to develop architecture diagrams, technical specifications, and project plans that describe how the solution will be implemented
  • Developing and implementing quality assurance processes and procedures, and monitoring project deliverables to ensure they meet quality standards
  • Identifying opportunities for continuous improvement in the software development process and its deliverables; working closely with stakeholders and project team members to implement process improvements and to ensure that lessons learned are captured and applied to future projects
  • Created and deployed microservices in AKS, improving the scalability and agility of application delivery
  • Proficient in managing and configuring Kafka for high-throughput, fault-tolerant messaging in distributed systems, including managing Kafka DR
  • Monitored Kafka performance and availability using Prometheus and Grafana; automated Kafka topic creation, ACLs, and configurations using scripts
  • Provisioned and configured Azure SQL Databases and Managed Instances for high availability and performance; configured alerts and dashboards for CPU usage, I/O latency, query performance, and connection health across Azure SQL
  • Experienced in designing and implementing high availability (HA) and disaster recovery (DR) solutions in multi-region cloud and on-prem environments.
  • Strong background in minimizing downtime and ensuring data integrity through automation and failover strategies.
  • Managed Kafka SSL and Venafi certificates, and automated the certificate lifecycle management process using Python.
  • Practiced Agile processes and implemented DevOps life cycle, ensuring the seamless integration of development and operations activities across Cloud and Containerization platforms
  • Used Terraform to deploy new environments as Infrastructure as Code (IaC), ensuring the consistency and repeatability of deployment processes
  • Applied security features, such as Network Policies, to protect pods from potential attacks, enhancing system security
  • Experienced with tools for logging, monitoring, and security scanning (Splunk, OpenObserve, SonarQube)
  • Deployed and maintained Prometheus for metrics collection and alerting across 100+ microservices.
  • Integrated Grafana dashboards with Prometheus to visualize real-time database metrics and automate health checks.
  • Built custom exporters and alerts for application and infrastructure-level SLIs using PromQL.
  • Automated alert routing and escalation using ControlTower integrated with Slack and xMatters.
  • Worked on multi-branch and release pipelines for both PROD and non-PROD environments, ensuring the consistency and reliability of deployment processes through an industry-standard gating process
  • Created and maintained comprehensive documentation for process onboarding, production runbooks, and operational guides, streamlining knowledge sharing and reducing ramp‑up time for new hires. Supported team growth by conducting interviews for open roles, facilitating technical assessments, and mentoring new team members to ensure a smooth onboarding and alignment with operational best practices.
  • Developing skills in AI/ML Ops to enhance the efficiency and accuracy of artificial intelligence and machine learning processes

Associate Consultant

Infosys Ltd.
03.2017 - 10.2018

Project 1: DevOps Implementation at VisaNet
Client: VisaNet | Duration: Sep 2017 – Oct 2018
Role: DevOps Engineer
Highlights:

  • Designed and developed scalable, fault-tolerant microservices using functional programming.
  • Automated CI/CD pipelines using Git, Jenkins, and Ansible.
  • Created and integrated APIs to solve distributed computing challenges.
  • Led component security analysis and collaborated with the security team.
  • Maintained internal documentation and implemented performance monitoring metrics.

Project 2: Microservices High Availability on Kubernetes
Client: Tesla | Duration: Apr 2017 – Sep 2017
Role: DevOps Engineer
Highlights:

  • Containerized applications using Docker and orchestrated with Kubernetes for high availability and scalability.
  • Migrated infrastructure from on-premise to Azure Cloud.
  • Automated deployments using Jenkins, Kubernetes, and Azure CLI.
  • Implemented Azure Services: ACS, Service Fabric, OMS, App Insights, and backup/recovery solutions.
  • Configured Nexus for Docker image storage and performed auto-scaling of Kubernetes Pods.

Project 3: DevOps on IBM Bluemix Cloud
Client: Infosys Internal | Duration: Sep 2017
Role: DevOps Engineer
Highlights:

  • Developed a POC for CI/CD automation using IBM Bluemix Toolchain, Git, Jenkins, and Kubernetes.
  • Independently managed build and deployment pipelines.

Senior Systems Engineer

Cognizant Technologies Solutions
12.2014 - 03.2017

Project Experience – CI/CD Implementation
Clients: Huawei Technologies (Sep 2016 – Feb 2017), Barclays (Apr 2015 – Sep 2016)

Overview:
Led DevOps initiatives to implement Continuous Integration and Continuous Delivery using both on-premise and open-source tools, focusing on automation, quality checks, and best practices.

Key Responsibilities:

  • Set up and managed CI/CD pipelines using CruiseControl, Jenkins, Maven, and Ant.
  • Developed XML scripts for build and test automation.
  • Integrated static and dynamic code analysis tools.
  • Designed and deployed Jenkins master-slave architecture for load balancing.
  • Automated infrastructure and application deployment using Puppet (Apache, Tomcat, MySQL, Nagios, etc.).
  • Built end-to-end deployment pipelines for Java and .NET (TFS) applications.
  • Implemented Jenkins–Jira integration for project tracking and ticketing.
  • Delivered a Puppet-based POC showcasing automated configuration management.

Education

Bachelor of Technology - Computer Science

West Bengal University
06.2014

Skills

  • DevOps & SRE Practices: Deep expertise in DevOps culture, automation, and site reliability engineering with a focus on resilience, scalability, and operational excellence
  • CI/CD Pipelines: Designing and maintaining robust, secure, and scalable CI/CD pipelines using tools like GitHub Actions, Jenkins, and Azure DevOps
  • Cloud Computing: Proven experience across major cloud platforms (Azure and AWS, with a working understanding of GCP), with emphasis on hybrid cloud architecture and migration strategies
  • High Availability & Disaster Recovery: Architecting and implementing HA/DR solutions for mission-critical applications across hybrid and on-prem infrastructures
  • Agile Methodologies: Active contributor to Agile teams (Scrum/Kanban), aligning DevOps initiatives with iterative delivery and continuous improvement cycles
  • Virtualization & Containers: Strong foundation in virtual machines and container technologies (Docker)
  • Kubernetes Ecosystem: End-to-end Kubernetes experience including deployment, scaling, upgrades, RBAC, Helm, and custom controllers/operators
  • Observability & Performance Engineering: Building end-to-end observability stacks (Prometheus, Grafana, ELK), focusing on metrics, logs, traces, and alerting
  • Security & Compliance: Integrating DevSecOps practices into pipelines with secrets management, image scanning, and policy-as-code (Gatekeeper)
  • Infrastructure as Code (IaC): Designing and managing infrastructure using Terraform, Helm, and Ansible
  • Automation & Scripting: Proficient in automating operational workflows using Python, Golang, and Bash, including custom tooling
  • Databases: Strong understanding of relational databases and proficient in writing, executing, and optimizing SQL queries
  • Configuration Management: Hands-on with Ansible
  • Source Control Management (SCM): Mastery of Git and branching strategies
  • Messaging Systems: Kafka, IBM MQ
  • Database Systems: Azure SQL, MS SQL
  • Mentoring & Knowledge Sharing: Actively mentor junior engineers and foster a culture of learning, innovation, and cross-functional collaboration
  • Team & Project Leadership: Experience leading DevOps/SRE teams, driving initiatives from design to production support, ensuring alignment with business goals

Languages

English
Hindi
