Jibin Rajan

Senior Software Engineer - SRE

Bangalore

Summary

Results-driven Senior Site Reliability Engineer (SRE) with 7+ years of experience in building and operating highly available, observable, and automated infrastructure across cloud and hybrid environments. Proven expertise in DevOps practices, Kubernetes, Linux, CI/CD pipelines, and monitoring tools like Datadog and Grafana.

Skilled in leading cross-functional initiatives to drive SLO-based reliability, develop automation platforms, and deliver secure, resilient systems. Architected an internal SLO-as-a-Service platform, led the development of centralized automation tooling (Autotron) using Go, Ansible, and Selenium, and currently lead the development of a vulnerability remediation automation system integrating Qualys and LLMs.

Known for strong leadership, mentoring capabilities, and a hands-on approach to solving complex system challenges. Passionate about observability, automation, and continuous improvement, with a track record of scaling reliability practices across teams and platforms.

Overview

years of professional experience

years of post-secondary education

Work History

Senior Software Engineer - SRE

A.P. Moller Maersk

05.2020 - Current

Led observability initiatives across cloud and hybrid platforms; managed the end-to-end lifecycle of tools such as Datadog and the Grafana stack, enabling teams to monitor, alert on, and improve service performance based on SLOs.
Architected and developed a full-stack SLO-as-a-Service platform using Node.js, React, and Grafana Mimir; automated creation of SLOs, burn rate alerts, error budgets, and visual dashboards to support proactive reliability management.
Implemented and scaled a DevOps toolchain using GitHub Actions, FluxCD, Helm, and Terraform to support continuous integration, deployment, and infrastructure as code across cloud and air-gapped environments.
Built and maintained central automation framework (Autotron) in Go, leveraging Ansible and Selenium to standardize execution of workflows across multiple environments.
Designed and developed an automated vulnerability management system integrating Qualys, LLM-based remediation planning, and Ansible-based patching, streamlining security compliance workflows.
Championed a reliability engineering framework, focusing on:
Swift incident recovery and automated healing
Toil reduction and process automation
Blameless post-mortems and root cause analysis
SLO-driven operations with high observability
Resilient, fault-tolerant system architecture
Mentored junior engineers and interns: delegated project tasks, provided technical coaching, and conducted code reviews and pair programming sessions to grow team expertise and delivery velocity.
Identified operational and architectural bottlenecks; proposed and implemented improvements to drive scalability, security, and uptime.
Strong hands-on experience in Kubernetes, Docker, Linux administration, GitOps, CI/CD, monitoring and alerting, and cloud/hybrid infrastructure automation.

DevOps Engineer

JCPenney

05.2017 - Current

Main responsibilities at my current organization include deploying, automating, maintaining, and managing the production system hosted in the AWS cloud, using CI/CD tools to ensure high levels of availability, performance, and scalability.
Wrote Python code using the Troposphere and boto modules to automate different parts of the infrastructure.
Wrote Ansible playbooks to automate repetitive tasks such as hardening VMs and setting up environments, which supported true infrastructure as code.
Worked effectively with cross-functional design teams to create software solutions using AWS (RDS/Lambda/S3/IAM/Kinesis) that elevated the client-side experience and significantly improved overall functionality and performance.
Monitored the automated build and continuous integration process in Jenkins to drive resolution of build and release failures.
Worked on a POC for APM tools such as SkyWalking (open source) and automated the entire process using CI/CD tools.
Worked on legacy server migration to the cloud by moving code bases to a Bitbucket repository.

Monitoring Tools Admin

JCPenney

05.2017 - 10.2018

Worked as part of the Monitoring Tools Admin team, with extensive experience in monitoring tool deployment and initial setup (Splunk/Dynatrace/Tealeaf/Quantum Metrics).
Handled lifecycle management and upgrades of the monitoring tools used to monitor .com stability.
Well versed in monitoring tool infrastructure maintenance, including license rotations, regular maintenance, certificate rotations, and troubleshooting.
Well versed in dashboard creation, which helped cross-functional teams identify performance bottlenecks and issues that could have caused site downtime, and helped avert global issues.
Worked on integrating monitoring tools with communication channels (MS Teams/PagerDuty), which helped teams manage alerts effectively.
Decluttered alerts and reduced noise through effective threshold management and tuning of alert triggering mechanisms, which helped teams prioritize alerts.
Helped in taking key decisions on tool usage, concurrent users, and license procurement during the POC phase for different tools by analyzing trends and the built-in dashboards of the monitoring tools.
Resolved problems, improved operations and provided exceptional client support.

Site Reliability Engineer

Tata Consultancy Services

03.2014 - 04.2017

Main responsibilities included monitoring systems and reacting when things go wrong, and constantly writing and rewriting response playbooks to reduce the time to fix any breakdown that may occur.
Reduced MTTD/MTTR to ensure zero downtime on target.com.
Conducted post-incident reviews and documented them for future process improvements.
Continuously watched the glass for order drops or critical incidents that cause downtime and reported them to the appropriate stakeholders.

Education

Bachelor of Engineering - B.E Electrical And Electronics

V.S.B Engineering College

01.2009 - 01.2013

Skills

Cloud infrastructure (AWS/Azure)

Site Reliability Engineering

Monitoring tools Admin

Infrastructure automation

DevOps practices

Continuous integration

Software development (Golang)

Tools Expertise

Ansible
Jenkins/Nexus/Sonar
GitHub
JIRA/Remedy/ServiceNow
Docker
Kubernetes
Splunk
Dynatrace
Tealeaf/Quantum Metrics
Grafana

Tech Expertise

AWS/Azure
Linux/Windows Administration
Shell Scripting
Golang Coding

