Jibin Rajan

Senior Software Engineer - SRE
Bangalore

Summary

Results-driven Senior Site Reliability Engineer (SRE) with 7+ years of experience in building and operating highly available, observable, and automated infrastructure across cloud and hybrid environments. Proven expertise in DevOps practices, Kubernetes, Linux, CI/CD pipelines, and monitoring tools like Datadog and Grafana.

Skilled in leading cross-functional initiatives to drive SLO-based reliability, develop automation platforms, and deliver secure, resilient systems. Architected an internal SLO-as-a-Service platform, led the development of centralized automation tooling (Autotron) using Go, Ansible, and Selenium, and currently leading development of a vulnerability remediation automation system that integrates Qualys and LLMs.

Known for strong leadership, mentoring capabilities, and a hands-on approach to solving complex system challenges. Passionate about observability, automation, and continuous improvement, with a track record of scaling reliability practices across teams and platforms.

Overview

11 years of professional experience
4 years of post-secondary education

Work History

Senior Software Engineer - SRE

A.P. Moller Maersk
05.2020 - Current
  • Led observability initiatives across cloud and hybrid platforms; managed the end-to-end lifecycle of tools such as Datadog and the Grafana stack, enabling teams to monitor, alert on, and improve service performance based on SLOs.
  • Architected and developed a full-stack SLO-as-a-Service platform using Node.js, React, and Grafana Mimir; automated creation of SLOs, burn rate alerts, error budgets, and visual dashboards to support proactive reliability management.
  • Implemented and scaled a DevOps toolchain using GitHub Actions, FluxCD, Helm, and Terraform to support continuous integration, deployment, and infrastructure as code across cloud and air-gapped environments.
  • Built and maintained a central automation framework (Autotron) in Go, leveraging Ansible and Selenium to standardize workflow execution across multiple environments.
  • Designed and developed an automated vulnerability management system integrating Qualys, LLM-based remediation planning, and Ansible-based patching, streamlining security compliance workflows.
  • Championed a reliability engineering framework, focusing on:
    Swift incident recovery and automated healing
    Toil reduction and process automation
    Blameless post-mortems and root cause analysis
    SLO-driven operations with high observability
    Resilient, fault-tolerant system architecture
  • Mentored junior engineers and interns: delegated project tasks, provided technical coaching, and conducted code reviews and pair programming sessions to grow team expertise and delivery velocity.
  • Identified operational and architectural bottlenecks; proposed and implemented improvements to drive scalability, security, and uptime.
  • Strong hands-on experience in Kubernetes, Docker, Linux administration, GitOps, CI/CD, monitoring and alerting, and cloud/hybrid infrastructure automation.

DevOps Engineer

JCPenney
05.2017 - Current
  • Deployed, automated, maintained, and managed production systems hosted on AWS, using CI/CD tools to ensure high availability, performance, and scalability.
  • Wrote Python code using Troposphere and boto to automate provisioning of different parts of the infrastructure.
  • Wrote Ansible playbooks to automate repetitive tasks such as hardening VMs and setting up environments, moving the team toward true infrastructure as code.
  • Worked with cross-functional design teams to build software solutions on AWS (RDS, Lambda, S3, IAM, Kinesis) that improved the client-side experience and overall functionality and performance.
  • Monitored automated build and continuous integration processes in Jenkins to drive resolution of build and release failures.
  • Worked on a proof of concept (POC) for APM tools such as SkyWalking (open source) and automated the entire process using CI/CD tools.
  • Worked on migrating legacy servers to the cloud, including moving code bases to Bitbucket repositories.

Monitoring Tools Admin

JCPenney
05.2017 - 10.2018
  • Worked as part of the Monitoring Tools Admin team, with extensive experience deploying and performing initial setup of monitoring tools (Splunk, Dynatrace, Tealeaf, Quantum Metric).
  • Managed the lifecycle and upgrades of the monitoring tools used to track the stability of the .com site.
  • Maintained monitoring tool infrastructure, including license rotations, certificate rotations, regular maintenance, and troubleshooting.
  • Created dashboards that helped cross-functional teams identify performance bottlenecks and issues that could cause site downtime, helping avert global outages.
  • Integrated monitoring tools with communication channels (MS Teams, PagerDuty) to help teams manage alerts effectively.
  • Decluttered alerts and reduced noise through threshold management and tuning of alert triggering mechanisms, which helped teams prioritize alerts.
  • Helped make key decisions on tool usage, concurrent users, and license procurement during the POC phase for different tools by analyzing trends and built-in dashboards in the monitoring tools.
  • Resolved problems, improved operations and provided exceptional client support.

Site Reliability Engineer

Tata Consultancy Services
03.2014 - 04.2017
  • Main responsibilities included monitoring systems, reacting when things went wrong, and continually writing and rewriting response playbooks to reduce the time to fix any breakdown.
  • Reduced MTTD/MTTR with the goal of zero downtime on target.com.
  • Conducted post-incident reviews and documented them to drive future process improvements.
  • Continuously watched the glass for order drops or critical incidents causing downtime and reported them to the appropriate stakeholders.

Education

Bachelor of Engineering (B.E.) - Electrical and Electronics

V.S.B Engineering College
01.2009 - 01.2013

Skills

Cloud infrastructure (AWS/Azure)

Site Reliability Engineering

Monitoring tools administration

Infrastructure automation

DevOps practices

Continuous integration

Software development (Go)

Tools Expertise

  • Ansible
  • Jenkins/Nexus/Sonar
  • GitHub
  • JIRA/Remedy/ServiceNow
  • Docker
  • Kubernetes
  • Splunk
  • Dynatrace
  • Tealeaf/Quantum Metric
  • Grafana

Tech Expertise

  • AWS/Azure
  • Linux/Windows Administration
  • Shell Scripting
  • Go Programming
