Chandrapal Panwar

Gurugram

Summary

Reliability & MLOps Engineer with expertise in designing and managing high-scale SaaS platforms on Amazon EKS. Skilled in infrastructure automation using Terraform, Python, Ansible, and Go to ensure high availability and security. Proven track record of optimizing CI/CD pipelines and implementing advanced deployment strategies, reducing deployment incidents by 70%. Experienced in applying AIOps techniques with Prometheus, Grafana, and Loki to enhance ML-based noise reduction and decrease Mean Time to Recovery (MTTR).

Overview

16 years of professional experience
1 Certification

Work History

Reliability Engineer IV

Avalara
Remote
07.2025 - Current
  • Administered observability stacks including Grafana, Prometheus, and Loki, integrating with OpenTelemetry and implementing AI/ML-based anomaly detection for real-time system monitoring.
  • Architected and containerized applications using Docker, Spring Boot, Node.js, and PHP, leveraging Helm charts for streamlined EKS deployment and lifecycle management across development and production environments.
  • Spearheaded cloud application migration initiatives, automating the provisioning of AWS services like VPC, EKS, RDS, DynamoDB, and DocumentDB using Terraform, Ansible, and CloudFormation, significantly reducing manual intervention.
  • Developed a Python utility leveraging Okta, Vault, and AWS APIs to programmatically generate and inject ephemeral session tokens, significantly enhancing security posture and pipeline efficiency.
  • Deployed and managed application workloads onto Azure Kubernetes Service (AKS), optimizing configuration for high availability and scalability.

Sr SRE Engineer

Blink Charging
Noida
07.2023 - 07.2025
  • Led disaster recovery planning and testing, minimizing downtime and data loss during unforeseen incidents.
  • Implemented advanced deployment strategies (blue-green and canary) to improve release reliability and reduce deployment-related incidents by 70%.
  • Containerized applications with Docker and developed Helm charts for deploying software on Amazon EKS.
  • Spearheaded automation of deployment strategies, achieving 99.9% uptime and enhancing Kubernetes cluster resilience and scalability across multi-cloud environments.
  • Defined SLOs and SLIs with development teams, utilizing ML-based noise reduction to improve alert signal quality and reduce false positives.
  • Automated incident response workflows with AIOps, significantly improving MTTR and reducing manual troubleshooting through AI-assisted diagnostics.
  • Leveraged AWS Bedrock with Guardrails for secure and compliant LLM deployment, implementing contextual grounding and automated reasoning checks to reduce hallucinations and enhance performance observability.
  • Built self-healing infrastructure using Go, Python, and Terraform, driving operational efficiency and reducing on-call load.
  • Designed and provisioned scalable AWS infrastructure (EKS, VPC, DynamoDB, DocumentDB, RDS), automating IAM policy and role management via Terraform, Ansible, and Python.

Sr DevOps Engineer

intive
Dublin
04.2023 - 06.2023
  • Defined the technical approach for containerizing applications on OpenShift and Kubernetes.
  • Managed Docker containers and Kubernetes clusters on the OpenShift platform.
  • Worked with Red Hat OpenShift Container Platform for Docker and Kubernetes management.
  • Implemented and managed Git-hosted central repositories for source code across products.
  • Used Docker and Kubernetes to manage, scale and update containers for improved automation possibilities.

Technology Architect

McKinsey & Company
Gurugram
09.2015 - 04.2023
  • Leveraged ArgoCD for declarative, GitOps-driven continuous delivery, automating application deployments and enforcing infrastructure as code (IaC) principles. This reduced deployment errors by 30% and improved deployment speed by 40%, ensuring consistency and scalability across multiple environments.
  • Collaborated in a Scrum team using Agile methodologies to deliver projects on time, improving team productivity by 15%.
  • Configured and maintained Bitbucket Pipelines for automated CI/CD, integrating unit tests, static code analysis, and deployment scripts to enhance the software delivery pipeline.
  • Integrated Amazon Redshift with SageMaker to enable seamless data retrieval and storage, optimizing the analysis of datasets exceeding 10 TB.
  • Automated data transfer from SageMaker machine learning models into Redshift via Amazon S3, reducing data processing costs by 25% and improving scalability for large-scale data analysis.
  • Implemented end-to-end monitoring, logging, and alerting in Docker production environments using Grafana and Grafana OnCall, improving system visibility and reducing incident response time by 40%, resulting in enhanced operational efficiency and uptime.
  • Managed Kubernetes infrastructure and its ecosystem, including Nginx and Velero, improving system resilience and backup processes by 20%.
  • Configured and managed AWS IAM roles and permissions, securing access for over 50 users.
  • Administered AWS services such as EC2, VPC, S3, ELB, Route 53, CloudTrail, and Trusted Advisor, enhancing cloud infrastructure efficiency by 30%.
  • Designed and configured AWS network architecture, including VPCs, subnets, internet gateways, NAT, and route tables, reducing network latency by 15% and improving overall system performance.
  • Managed Helm charts for Kubernetes addons and performed scheduled upgrades, maintaining a secure container ecosystem and reducing security vulnerabilities by 20%.
  • Guided development teams on Kubernetes application deployments, enhancing deployment success rates by 30%.
  • Implemented AquaSec for comprehensive security scanning across Kubernetes infrastructure, identifying and remediating container vulnerabilities, which reduced security risks by 40% and ensured compliance with industry standards.

System Engineer

Tata Consultancy Services
Pune
12.2013 - 09.2015
  • Implemented data security guidelines with auditing, LDAP, and reporting alerts in a highly data-critical environment, utilizing Python, Shell, and Splunk, which improved data security compliance by 35% and reduced incident response time by 50%.
  • Collaborated with the vulnerability management team to proactively address high-risk issues, resulting in a 30% reduction in identified vulnerabilities and enhancing overall system security posture.
  • Automated PostgreSQL and MySQL installations and DML changes using GoCD pipelines and Ansible, reducing deployment time by 40%.
  • Implemented automated failover processes with shell scripts, minimizing downtime by 50% and enhancing overall system reliability.
  • Automated backup, Oracle refresh, and performance processes using shell scripts, enhancing database server robustness and reducing backup time by 35%.
  • Streamlined cold backup processes to manage database size and initiate backups, resulting in a 30% increase in overall operational efficiency.

Infrastructure Engineer

Barclays
Pune
04.2013 - 11.2013
  • Gathered and analyzed information to develop data-driven business solutions for MySQL and Oracle projects, resulting in a 25% improvement in project delivery times and enhancing decision-making processes for stakeholders.
  • Designed and built a 3-node InnoDB cluster with a load balancer using Keepalived as a virtual router for MySQL, achieving 99.99% availability and improving database performance by 40% through enhanced load distribution.
  • Configured and scheduled MySQL Enterprise Monitor to reduce downtime and maximize resource utilization, proactively avoiding production issues, which led to a 30% decrease in downtime and a 25% improvement in resource efficiency.
  • Executed database compression for both SAP and non-SAP databases, successfully reducing storage requirements by 3 TB, which improved data management efficiency and lowered operational costs by 20%.
  • Resolved Oracle 11g RAC issues, preventing potential production downtime and improving client efficiency by 30%, thereby ensuring uninterrupted service delivery and enhanced system performance.

Software Engineer

Mphasis
Pune
01.2010 - 04.2013
  • Installed, configured, and maintained a secure Business Objects (BO) environment while creating ad hoc reporting-ready universes using Business Objects Universe Designer, which improved reporting efficiency by 35% and enhanced data accessibility for over 100 users.
  • Automated MySQL database migration processes using Python scripts, reducing migration time by 50% and minimizing downtime during transitions, which improved overall database management efficiency.
  • Administered Oracle 11g RAC environments and conducted database cloning on ASM and non-ASM databases, improving database deployment speed by 40% and enhancing disaster recovery capabilities, ensuring minimal downtime during critical operations.
  • Analyzed existing universes and databases to enhance and create new universes, improving performance times for multiple Business Objects (BO) and Cognos reports, which reduced report refresh times by 40% and increased overall reporting efficiency.

Education

Post Graduate Diploma - Data Science

International Institute of Information Technology Bangalore

Bachelor of Engineering - Computer Science

G.B. Pant Engineering College

Skills

  • Amazon Elastic Kubernetes Service (EKS) and Azure Kubernetes Service (AKS)
  • AWS DevOps
  • Container orchestration
  • Infrastructure automation
  • Cloud migration
  • Managed containerization for Node.js, PHP, and Spring Boot
  • Source code management
  • Data analysis and reporting
  • Relational database services
  • Workflow design and implementation
  • Performance monitoring
  • Continuous integration

Certification

  • CKA: Certified Kubernetes Administrator, The Linux Foundation, LF-q7oxb6bi2m
  • AWS Certified Solutions Architect - Associate, Amazon, CST4R4QB3FB4QVKK

Accomplishments

  • Zero Defect Award, McKinsey & Company, 12/01/16, Recognized for automating backup and MySQL processes.
  • Monthly Submit Award, Mphasis, 12/01/12, Recognized for enhancing performance in Business Objects and Cognos.
  • Training Topper, Mphasis Learning and Leadership Academy, 02/01/10, Achieved top honors in training.
