ARUN SACHDEVA

Site Reliability Engineer
Faridabad, HR

Summary

Experienced Software Engineer with over 15 years of expertise in Site Reliability Engineering (SRE) and DevOps. Proficient in AWS cloud computing, containerization and orchestration with Kubernetes, building IaC (Infrastructure as Code) solutions with Terraform, monitoring systems with Prometheus, Grafana and Elasticsearch, and automating solutions with Python and Bash scripting. Enthusiastic, analytical, and responsive professional dedicated to solving problems efficiently. Self-motivated team player who collaborates effectively across teams to achieve goals, with excellent communication skills.

Overview

15 years of professional experience
3 languages

Work History

Lead Site Reliability Engineer

Freshworks
12.2021 - Current
  • Well-Architected Framework Project: Wrote all the controls that identify non-standard practices at the infrastructure layer across all products, based on the Reliability, Scalability, Cost, Availability & Security pillars. Technologies used - Python, SQL, Steampipe & AWS.
  • Nagios to Prometheus Migration Project: Migrated applications from Nagios to a Prometheus & Grafana monitoring solution as part of the Nagios Deprecation Project. Technologies used - Prometheus, Grafana, Nagios, AWS OpsWorks, AWS CloudWatch.
  • Spotinst.io to Karpenter Migration Project: Performed an in-depth analysis of Karpenter and implemented it across all staging Kubernetes clusters, saving the organization ~$20,000 per month compared with the previously used Spotinst.io (a paid NetApp product for running Spot instances). Technologies used - Kubernetes, Karpenter, Linux, Terraform, ArgoCD.
  • Logs Standardization Project: Standardized logs coming from different technical components across all products based on factors such as criticality, verbosity & retention period. This saved ~$10,000/month across AWS components such as Kubernetes control plane, application, ALB/ELB, S3, RDS, Redshift, Lambda, CloudFront etc. Technologies used - AWS Boto3, Python.
  • Public IPs Cost Savings Project: Saved $5,000/month by releasing public IPs held across all products as unassigned Elastic IPs, public ALB/ELB/NLBs used for internal communication, and public EC2 instances.
  • Collaborated with cross-functional teams to develop, test, and deploy scalable software solutions.
  • Improved incident management workflows by creating comprehensive documentation on troubleshooting procedures and common issues resolution steps.
  • Implemented cost-saving measures by optimizing resource utilization across cloud-based infrastructure environments.

Senior Site Reliability Engineer

Clearwater Analytics
10.2020 - Current
  • Monitoring systems using Graphite, Grafana, New Relic and Kibana.
  • Deploying applications to the Production environment using an in-house tool called Version Manager.
  • Managing infrastructure as code using Puppet in on-prem environments.
  • Supporting applications & resources running in AWS Cloud.
  • Working as an Incident Commander responsible for managing incidents and post-mortems, paging on-call engineers, and handling incident management and communication processes.

Application Engineer

Expedia Group
06.2016 - 10.2020
  • Created an automated solution that helps development teams restore old log data from AWS S3 to Elasticsearch clusters in an AWS EKS environment using Python, Jenkinsfile, Terraform and Git.
  • Created a solution for restoring terabytes of email data from Eptica (third-party vendor) to the Expedia AWS cloud environment, which required setting up a mail archive for storing the data using AWS EC2, AWS SFTP, AWS EBS and shell scripting.
  • Used Terraform, Packer, Jenkins and Git to build infrastructure as code solutions such as Kafka/Zookeeper cluster setup, AWS Glue/Firehose setup for Datalake, automated AWS resource creation for S3 buckets, IAM roles/policies etc.
  • Set up and managed resources in a Kubernetes environment on AWS cloud (AWS EKS service).
  • Worked with development teams to migrate existing services from on-prem servers to AWS cloud.
  • Used Ansible and Jenkins to automate tasks involving a large number of servers, such as OS patching and installing software packages such as the Datadog agent, Qualys agent, td-agent etc.
  • Automated tasks using Bash scripting, such as log rotation, SSL certificate import into keystores etc.
  • Set up Nginx as a proxy server and deployed configuration changes related to redirections, rewrite rules, upstreams etc. to cater to the needs of development teams.
  • Set up Elasticsearch, Fluentd and Kibana on AWS EC2 instances as a self-managed cluster using Terraform, Packer, Jenkins and Git; created Jenkins jobs to back up indexes using snapshots.
  • Deployed resources in RabbitMQ clusters, such as queues, exchanges, bindings etc., and performed administration tasks such as shovelling, federation, queue purge, cluster upgrade etc. as per the requirements.
  • Set up Akamai as a CDN solution for the LAB environment, in coordination with Akamai Support, to make it similar to the Production environment.
  • Performed Blue/Green/Canary deployment of applications through Jenkins in coordination with developers, testers and third party teams.
  • Provided production support for applications; troubleshot and investigated production outages; documented root cause analyses for learning debriefs.

Senior Consultant

Oracle India Pvt. Ltd
09.2015 - 06.2016
  • Provided middleware administration support for Oracle HTTP Server, Oracle Web Cache, Oracle WebCenter Sites, Remote Satellite Servers and Oracle WebCenter Portal in a Linux environment.
  • Resolved infrastructure issues related to a 3-tier architecture comprising load balancers, firewall, DNS, web & application hosting, backup/storage and database servers.
  • Worked on the implementation and setup of middleware components used for hosting applications. This consisted of planning the architecture, installing and configuring FMW components, setting up monitoring scripts and resolving day-to-day infrastructure issues.
  • Patched Oracle products to fix bugs by applying CPUs and PSUs for OHS, WebLogic, WebCenter Portal and WebCenter Sites; implemented SSL at the load balancer, web servers and application servers.
  • Deployment of application in coordination with developers, testers and third party teams through voice, chat and email.
  • Performed performance tuning at the web and application layers using the JMC tool, JVisualVM, heap and thread dump analysis, JFR recording analysis and verbose GC log analysis to resolve performance issues; worked on server sizing and capacity planning.

Module Lead

Mercer India Pvt. Ltd
07.2014 - 09.2015
  • Provided middleware administration support for Apache HTTP Server, Apache Tomcat, JBoss, LAMP, Oracle GlassFish, Oracle HTTP Server, IBM WebSphere and Sun ONE Web/App Server in a 24x7 production support environment on UNIX/Linux.
  • Resolved infrastructure issues related to a 3-tier architecture comprising F5 load balancers, firewall, DNS, web/app and database servers.
  • Handled the ITP (Infrastructure Transformation Program) project, in which all applications were migrated from physical to virtual servers. It consisted of setting up new infrastructure such as F5 load balancer VIPs and firewall/DNS changes, migrating web/app servers from physical to virtual servers, automating server installation and configuration tasks, setting up monitoring scripts and resolving issues.
  • Application Monitoring and Troubleshooting issues in coordination with developers, testers and third party teams through voice, chat and email.

Lead Engineer

HCL Technologies Pvt. Ltd
05.2013 - 07.2014
  • Provided administration support for Oracle Web Tier, Apache Tomcat, Oracle WebLogic Server and Oracle WebCenter Portal in a 24x7 production support environment.
  • Handled Release & Change Management activities consisting of Build & Deployment, Environment Configuration Changes, Repository Management and Release Planning & Execution.
  • Addressed incidents and provided problem resolution and root cause analysis within assigned SLAs.
  • Worked on Performance Tuning such as Data Source tuning, JVM heap size tuning, analyzing heap dumps & thread dumps and applying patches to fix bugs.
  • Upgraded Oracle WebCenter Portal from Patch Set 4 (11.1.1.5) to Patch Set 6 (11.1.1.7).
  • Automated admin tasks such as Log Archiving for Weblogic, MDS Backup for Webcenter Portal Applications, Monitoring Script and Auto-Deployment Script for J2EE applications.
  • Configured WLDF in Weblogic to notify the state of servers & deployments through mails.
  • Handled Downtime of Applications & Web Servers during Monthly/Quarterly security patching.
  • Application Monitoring and Troubleshooting issues in coordination with developers, testers and third party teams through voice, chat and email.

Consultant

Genpact Headstrong Capital Markets
11.2011 - 05.2013
  • Worked on Installation & Configuration of Apache HTTP 2.2, Oracle Weblogic 10.3.5 and Oracle Webcenter Portal 11.1.1.5 on a Unix/Linux Platform.
  • Worked on configuration and maintenance of Apache HTTP server such as hosting Virtual Web Sites, setting up Proxy Servers, updating Web Server Content, SSL, Certificate Renewals, and Log analysis.
  • Configured Domains in Oracle Weblogic consisting of Admin Server, Managed Servers, Node Managers, Clusters, Data Sources, JMS, Security Realms, Keystores and SSL.
  • Configured Custom Managed Servers for Webcenter Portal Applications.
  • Performed deployments of applications on Weblogic, Tomcat, JBoss and Webcenter through Admin Console, Enterprise Manager and WLST on production and non-production environments.
  • Actively involved in troubleshooting issues with developers, testers and third party teams such as Akamai, Load Balancer, Network and Database through voice, chat and email.
  • Performed heap dump and thread dump analysis using Eclipse Memory Analyzer and Thread Dump Analyzer to resolve issues.
  • Handled Downtime of Application & Web Servers efficiently during Monthly/Quarterly security patching and Data Center Migration.
  • Monitoring servers and applications through various tools such as Admin Console, Enterprise Manager, SiteScope, and Oracle JRMC Toolkit.
  • Handled incidents and problems using SM9 and HP OVSD.
  • Fixed day-to-day environment issues within defined SLAs.

Assistant System Engineer

Tata Consultancy Services
10.2009 - 10.2011
  • Understanding Project Requirements.
  • Load Test Execution using Test Harness (Java based Client Tool).
  • Load Test Monitoring using LoadRunner (TPS, Response Time, CPU Utilization).
  • Weblogic Administration (Installation, Deployment, Domain Configuration, Managing Servers using Admin Console, Issue Analysis/Troubleshooting).
  • Code Review and Optimization.
  • Unix Shell Scripting.
  • Capacity Planning and Projections based on Load Test Results (MS Excel based Tool).
  • Onsite/Offshore coordination and delivery of client deliverables.

Education

Master of Science - Information Technology

DA-IICT
Gandhinagar, Gujarat

Bachelor of Engineering - Computer Science

CITM
Faridabad, Haryana

12th

CBSE

10th

Pratap Public School, CBSE
Karnal, Haryana

Skills

Python Scripting

Accomplishments

  • At Expedia Group, nominated for the "Horizon Award", which recognizes high performance throughout the year and long-term potential.
  • At Expedia Group, selected from Egencia IDC (India Development Center) to attend the Nginx Conference in Portland, Oregon, USA, due to good performance in this space.
  • At Mercer India Pvt. Ltd., awarded "Star of the month" for single-handedly and efficiently managing complex applications such as DC (Defined Contribution), Siebel/CRM, Thunderhead and Docman/Filenet under the ITP Project.
  • At Genpact Headstrong Capital Markets, awarded "Cause of Applause" for coordinating downtime, performing successful application & server validation and extending support during the Data Center Migration of all Production/Test/Development servers.
  • At DA-IICT, finished 1st Runner-up at the National Level Cricket Tournament held at IIT Kanpur during its Annual Sports Festival "Udghosh" in September 2007.
