Software Engineer with 3+ years of experience in observability, automation, and distributed production systems. Specialized in building AI-driven monitoring, root cause analysis (RCA) systems, and intelligent alerting using Python, Splunk, and Generative AI. Proven ability to improve incident response and system reliability and to reduce manual effort through automation.
Overview
3
years of professional experience
1
Certification
Work History
Software Engineer
Accenture
Kolkata, India
10.2022 - Current
Built Splunk-based 5xx alerting and automated RCA workflows using Power Automate and n8n
Developed Python automation (GCP SDK) for hourly Kubernetes cluster/pod health monitoring with proactive alerting
Created dashboards for monitoring LLMs, KB, agentic workflows, and APIs, improving observability and reducing incident triage time
Designed automated RCA workflows for application and infrastructure alerts, reducing manual investigation effort
Monitored production systems using Splunk, AppDynamics, Grafana, and AWS
Performed RCA for API failures and latency issues using logs and metrics
Automated weekly reporting using Python, improving reporting efficiency
Executed 500+ change requests and coordinated deployments, ensuring zero SLA breaches
Performed API testing, debugging, monitoring, and Kubernetes health checks using Dynatrace, Splunk, and ServiceNow
Education
B.Tech - Mechanical Engineering
Haldia Institute of Technology
Haldia
01.2022
Skills
Java
Python
Generative AI
Agentic AI
Databricks
AWS
GCP
Splunk
AppDynamics
Grafana
Dynatrace
CI/CD
Jenkins
Microservices
Git
Postman
Jira
Observability
Incident Management
RCA
SRE Practices
Certifications
Generative AI Engineer Associate - Databricks
Agentic AI - Accenture (Stanford HAI)
Key Highlights
Built AI-driven RCA and alerting systems integrating Splunk, Power Automate, and n8n
Automated Kubernetes cluster monitoring using Python (GCP SDK) with real-time Teams alerting
Executed 200+ production deployments with zero SLA breaches
Designed observability solutions for LLM, API, and cloud-native systems, reducing incident triage time
Handled high-priority incidents in production environments