Anindya Das

SRE Manager

Bangalore

Summary

Observability/SRE Manager with 18 years of IT experience, including 7+ years designing and scaling observability strategies across enterprise systems. Proven track record of implementing end-to-end visibility solutions using Splunk Observability Cloud, Datadog, and Grafana. Strong leader of SRE and DevOps teams, focused on improved system health, reduced MTTR, and operational excellence. Skilled in KPI-based monitoring, AIOps, self-healing automation, and cost optimization of monitoring platforms.

Work History

Observability/SRE Manager

Tata Consultancy Services (TCS)

01.2020 - Current

Defined and executed observability strategy across enterprise systems, aligning long-term vision with business goals.
Partnered with engineering, DevOps, and application teams to deliver seamless, self-service integration for infrastructure instrumentation, APM, and synthetic monitoring in Splunk Observability Cloud.
Designed and deployed a reusable, scalable metrics ingestion and query platform using Python automation and Splunk APIs, reducing onboarding time by 40% and minimizing human error.
Implemented dynamic threshold alerting integrated with self-healing automation for critical services, reducing MTTR by 30%.
Created real-time business KPI dashboards tracking transaction throughput, SLA adherence, and user engagement to guide stakeholder decisions.
Established SLI/SLO frameworks.
Automated incident creation and enrichment via ServiceNow integration, ensuring rapid root cause identification with telemetry context.
Led team building, recruitment, and knowledge base creation to scale observability adoption.
Oversaw the implementation and delivery of software projects, including resource allocation, ensuring on-time, within-scope completion.
Led adoption of internal deployment platforms and CI/CD pipelines.
Integrated AI/ML-based anomaly detection into Splunk Observability Cloud, improving early detection of system degradations and reducing false positives by 25%.
Implemented predictive alerting models using historical telemetry data to anticipate capacity issues, helping business units prevent SLA breaches.
Designed self-healing workflows enhanced with AI-driven root cause analysis (RCA), enabling faster remediation of recurring incidents.
Mentored team members by setting performance expectations, aligning goals, exchanging feedback, and fostering development, building an inclusive, high-performing culture.

SRE Manager

London Stock Exchange Group (LSEG)

01.2019 - 01.2020

Managed a globally distributed SRE team with full ownership of observability platforms and strategy.
Led the migration to a unified observability stack, reducing licensing costs by 20% and improving platform reliability.
Championed synthetic monitoring and service dependency mapping for mission-critical systems.
Automated operational intelligence workflows to reduce alert fatigue and improve incident resolution times.
Led the AIOps program, applying ML models to reduce repetitive alerts and tickets, resulting in a 30% year-on-year reduction in incident volume.
Deployed service dependency mapping enhanced with AI clustering to identify hidden service relationships and failure propagation patterns.
Guided the creation and maintenance of comprehensive documentation for applications, deployment workflows, and system configurations.
Built and maintained strong relationships with internal and external stakeholders, ensuring clear communication and project alignment.
Mentored team members by setting performance expectations, aligning goals, exchanging feedback, and fostering development, building an inclusive, high-performing culture.

DevOps Manager

Wipro Technologies

01.2015 - 01.2019

Led DevOps enablement for large-scale applications, including observability integrations into CI/CD pipelines.
Designed scalable Kubernetes-based environments with telemetry instrumentation.
Improved deployment reliability by incorporating release validation metrics into monitoring dashboards.

Education

B.Tech - Electronics & Telecommunication Engineering

I.T.E.R Engineering College, BBSR

01.2007

Board/University: BPUT, Rourkela
GPA: 8.14

10+2 - Science

M.P.C Junior College, Baripada

01.2002

Board/University: C.H.S.E Orissa
Percentage: 79.4%

10th - S.S.C

M.K.C High School, Baripada

01.2000

Board/University: H.S.C Orissa
Percentage: 89.4%

Skills

AI in Observability: AIOps, anomaly detection, predictive monitoring, AI-driven root cause analysis

Data & ML Tools: Python (scikit-learn, pandas), Splunk Machine Learning Toolkit (MLTK), AWS AI Services

Cloud & Infrastructure: AWS, Kubernetes, Docker

Automation & CI/CD: Jenkins, Rundeck, Ansible, Puppet

Scripting & Tools: Python, Bash/Shell, Tcl

Incident & Problem Management: ServiceNow, Jira, Confluence

Leadership & Strategy: Observability Roadmaps, SRE Team Management, Agile methodology

Certification

Splunk Observability Cloud
