Akshay Nimbalkar


Pune, MH

Summary

DevOps Engineer | Automation Engineer with 5+ years of experience automating and securing cloud-native infrastructure at scale. Contributed to Infrastructure as Code (Shell, Python, Terraform), container orchestration (on-premises Kubernetes, OpenShift, KVM), and observability platforms (Prometheus, Grafana). Led cloud security initiatives by migrating Kubernetes authentication to AWS IRSA and enforcing least-privilege IAM policies. Built AI-driven internal tools, including a Slack bot for automated alert triage that reduced manual log analysis by 70%. Some experience in multi-cloud strategy (AWS, Azure) and full-stack automation, from CI/CD pipelines to production monitoring.

Overview

6 years of professional experience

Work History

Senior SRE/DevOps Engineer

DeepIntent
12.2024 - Current
  • Led deployment automation to streamline application releases across multiple environments.
  • Optimized cloud infrastructure on AWS, enhancing performance and reducing costs through resource management.
  • Collaborated with development teams to troubleshoot and resolve deployment issues in real-time.


Cloud & Kubernetes Security & Operations:

  • Designed and implemented AWS IAM Roles for Service Accounts (IRSA) for Kubernetes, migrating from vulnerable IAM key-based authentication to secure, role-based access, enhancing security compliance by 95% and eliminating credential exposure risks
  • Managed and optimized AWS RDS instances with a focus on high availability, automated backups, performance tuning, and security hardening, improving database uptime
  • Managed Apache Airflow infrastructure on Kubernetes for workflow orchestration, including DAG deployment, executor scaling, and monitoring integration
  • Led production Kubernetes troubleshooting, diagnosing cluster, networking, and application-level issues.
  • Implemented and enforced Kubernetes security best practices including Pod Security Standards, network policies, and secret management, passing internal security audits with zero critical findings


Monitoring & Observability:

  • Architected and managed Prometheus and Grafana-based monitoring for microservices, enabling real-time visibility, alerting, and dashboards that reduced incident detection time by 60%
  • Provided consultative pipeline recommendations to development teams, streamlining CI/CD processes and accelerating deployment frequency by 25%


Infrastructure as Code & Cloud Automation:

  • Deployed AWS resources using Terraform, including EC2, Lambda, S3, IAM, and VPC components, enabling infrastructure reliability.
  • Structured Terraform configurations to segregate environments (dev/prod), improving deployment consistency and reducing configuration drift.
  • Managed DNS records, domain settings, and CDN configurations for high-availability applications via the Cloudflare Terraform provider
  • Designed and enforced IAM policies, roles, and users following the principle of least privilege, improving security posture and passing compliance audits
  • Automated serverless workflows with AWS Lambda and S3 event triggers
  • Configured AWS SNS for alerting and SES for email notifications integrated with monitoring systems, ensuring timely incident response and stakeholder communication


AI & Automation Initiatives:

  • Spearheaded AI-driven internal tools to enhance developer and SRE productivity, including intelligent alert triage, log analysis, and automated remediation suggestions
  • Led cross-functional AI projects focused on improving platform reliability and reducing operational toil through machine learning and LLM integration


QA Automation Engineer

Veritas Technologies
03.2020 - 12.2024


  • Engineered comprehensive CI/CD pipelines for containerized Veritas Infoscale on Kubernetes via Jenkins, increasing release frequency by 30% and reducing manual intervention by 50% through systematic automation
  • Implemented infrastructure as code using Terraform and Ansible for OpenShift/Kubernetes deployments, reducing provisioning time by 90% and ensuring consistent environments across development, staging, and production
  • Designed and deployed Azure Kubernetes Service (AKS) clusters for product qualification, implementing automated testing frameworks that reduced validation time by 70%
  • Developed Python-based automation framework for InfoScale installation, test execution, and reporting, improving testing efficiency by 40%
  • Led containerization strategy for enterprise applications on Kubernetes/OpenShift, improving scalability and resource utilization by 35%
  • Implemented GitOps workflows for configuration management, ensuring version-controlled infrastructure and reducing configuration drift by 80%
  • Orchestrated multi-cloud deployments across Azure and AWS, implementing disaster recovery strategies that improved system availability to 99.9%
  • Mentored junior team members on DevOps best practices, SRE principles, and automation techniques, improving team productivity by 25%
  • Collaborated with development teams to implement shift-left testing strategies, reducing post-deployment defects by 30%


Key Projects:


End-to-End CI/CD Pipeline for Kubernetes-Based Product | Tools: Jenkins, Kubernetes, OpenShift, Terraform, Ansible, Python

  • Architected complete CI/CD pipeline automating build, test, and deployment processes
  • Achieved 90% reduction in manual deployment efforts and accelerated feedback cycles by 60%
  • Implemented infrastructure as code using Terraform for OpenShift cluster provisioning

Multi-Cloud Kubernetes Orchestration Platform | Tools: Azure AKS, AWS EKS, Terraform, Helm, Python

  • Designed and deployed hybrid cloud Kubernetes infrastructure supporting multiple cloud providers
  • Improved deployment consistency across environments and reduced provisioning errors by 75%
  • Implemented automated scaling policies improving resource efficiency by 40%

OpenShift Infrastructure Automation | Tools: Terraform, Ansible, VMware, KVM, Shell

  • Automated OpenShift installation on VMware/KVM hypervisors using infrastructure as code
  • Reduced cluster deployment time from days to hours and eliminated configuration variances
  • Created reusable Terraform modules for enterprise-wide adoption

Education

PG - High Performance Computing, Linux, Shell, Kubernetes, Python

University of Pune
Pune
01.2020

BE - Electronic Engineering

Pune University

Skills

  • Cloud & Infrastructure: AWS (EC2, S3, SNS, RDS, Lambda, IAM, VPC), Azure (AKS), Terraform
  • Container & Orchestration: Kubernetes (EKS, AKS, OpenShift), Docker, Helm
  • CI/CD & Automation: Jenkins, ArgoCD
  • Monitoring & Observability: Prometheus, Grafana, CloudWatch
  • Security & Compliance: IRSA, IAM, Kubernetes Security, Cloudflare, Least Privilege
  • Scripting & Development: Python, Shell/Bash
  • Databases & Messaging: AWS RDS infrastructure management, SNS
  • AI/ML Ops: OpenAI Integration, LLM Prompt Engineering, AI-Powered Alerting

Accomplishments

  • Awarded for contributions to AI projects.
  • Awarded four times for contributing to Kubernetes demo projects, setting up CI/CD automation pipelines, and collaborative work.
  • Received awards for resolving multiple OpenShift and Kubernetes cluster configuration issues.
  • Open-minded, outgoing, and patient in conversations with others.
  • Received awards for taking a lead role across teams in resolving Linux, network, and Kubernetes issues.

Recent Projects

PROJECT 1: AWS IRSA Migration & Kubernetes Security Hardening

Role: DevOps/SRE Engineer | Tools: AWS EKS, IAM, Terraform, Kubernetes, Python

  • Migrated 30+ microservices from IAM key-based authentication to AWS IAM Roles for Service Accounts (IRSA), eliminating long-lived credentials and enhancing security compliance by 98%
  • Designed and implemented least-privilege IAM policies using Terraform
  • Automated service account to IAM role mappings across multiple EKS clusters
  • Implemented Pod Security Standards (PSS) and network policies, restricting lateral movement and improving cluster security posture
  • Created self-service Terraform modules for development teams to securely request and manage IRSA roles

Impact: Eliminated credential exposure risks, passed security audits with zero critical findings, and reduced manual IAM management overhead

PROJECT 2: AI-Powered Slack Bot for Kubernetes Alert Triage & Log Analysis

Role: Project Lead | Tools: FastAPI, OpenAI GPT, Slack Bolt, Prometheus, Loki, Redis, Kubernetes

  • Built an AI-driven Slack bot that automates Kubernetes alert triage and log summarization using OpenAI's LLM, reducing manual investigation time by 70%
  • Integrated with Loki for log aggregation and Prometheus for alert context, providing engineers with enriched, actionable insights directly in Slack
  • Implemented caching with Redis to optimize response times and reduce OpenAI API costs by 40%
  • Exposed custom Prometheus metrics for bot performance monitoring and usage analytics
  • Designed modular FastAPI backend supporting multiple AI models and alert sources

Impact: Reduced alert fatigue, accelerated MTTR by 60%, and improved on-call effectiveness through intelligent noise filtering

Timeline

Senior SRE/DevOps Engineer

DeepIntent
12.2024 - Current

QA Automation Engineer

Veritas Technologies
03.2020 - 12.2024

PG - High Performance Computing, Linux, Shell, Kubernetes, Python

University of Pune

BE - Electronic Engineering

Pune University