Haresh Kumar K J

Madurai

Summary

Seasoned HPC Software Engineer with a solid background in designing, coding, testing, and debugging resilient, reliable HPC system management software. Proven strengths include strong problem-solving abilities, effective teamwork, and adherence to software development best practices. Prior roles have demonstrated the ability to increase efficiency and ensure software resiliency by developing innovative solutions and implementing continuous improvement processes.

Overview

4 years of professional experience
1 Certification

Work History

Systems Software Engineer I

Hewlett Packard Enterprise
Bengaluru
09.2021 - Current
  • Domain: HPC & AI SW
  • Designed a Hypervisor Node Replacement procedure for use when a node fails or requires system maintenance. The procedure targets the three-node cluster that serves as the management plane for the storage nodes and tolerates a single node failure. It covers completely removing a hypervisor node from the cluster, and then adding either the same node (after maintenance) or an entirely new node back to the cluster.
  • Designed a new platform upgrade procedure that performs a rolling upgrade of the three-node cluster (management plane) from one major platform version to the next. The procedure is intended to upgrade the platform with zero downtime for the services running on top of it.
  • Defined and executed resiliency testing for the three-node hypervisor cluster (management plane) across major and minor platform releases.
  • Executed MPAT (Platform Acceptance Testing) on the three-node hypervisor cluster (management plane) across major and minor platform releases.
  • Designed a Rack Resiliency procedure for the management nodes (master, worker, and storage) of Cray high-performance computers to ensure system reliability. The procedure details how to gracefully take down a rack (a collection of management nodes plus the set of microservices they run on the Kubernetes platform) for maintenance, and how to add the same rack back to the management cluster.
  • Designed and developed a high-level architecture for rack-level resiliency on HPE Cray EX systems, enabling critical services and rack nodes (Kubernetes-managed master, worker, and storage nodes) to keep functioning through the failure of a rack component or an entire rack.
  • Upgraded and tested HPE Cray EX system components (microservices, Istio, and Kiali) to keep them compatible with Kubernetes and other platform components.
  • Contributed to upgrade automation for the management nodes (master, worker, and storage) of Cray high-performance computers, using an internal framework designed for installation and upgrade.
  • Performed customer-facing solution testing for critical features, including multi-tenancy for HPC management systems. Developed test cases for each stage of multi-tenancy: tenant creation, Slurm workload manager creation, and Slingshot network creation for the tenants.

R&D Intern

Hewlett Packard Enterprise
Bengaluru
01.2021 - 08.2021
  • Domain: HP-UX
  • Automated the HP-UX product release workflow to reduce engineering effort and release cycle time. Releasing any HP-UX product involves five major stages: Build, Stage, Package, Test, and Deploy.
  • Running these stages manually required significant engineer intervention and was error-prone, so all five stages were automated and run through a Jenkins pipeline.
  • The automation reduced product release time from 16 weeks to 6–7 weeks, and the same workflow was tested across multiple versions of HP-UX products.

Education

B.E. - ECE

PSG College Of Technology
Coimbatore
04.2021

XII

Rajan Matriculation Higher Secondary School
05.2017

X

Rajan Matriculation Higher Secondary School
05.2015

Skills

  • Windows
  • Linux
  • C/C++
  • Python
  • Shell Scripting
  • MySQL
  • Golang
  • Docker
  • Kubernetes
  • Terraform
  • Git
  • Jenkins
  • Helm
  • Istio
  • Prometheus
  • Argo
  • AWS
  • Server
  • MS Office Suite
  • HPC Workload Managers
  • Slurm
  • Resiliency Testing
  • Robot Framework
  • ML
  • NLP
  • Software testing and implementation
  • JIRA

Certification

  • Certified Kubernetes Application Developer (CKAD) provided by the Linux Foundation
  • Executive Post Graduate Certificate in Machine Learning and Deep Learning Program provided by IIITB
  • Terraform Associate (003) provided by HashiCorp
  • AWS Certified Cloud Practitioner provided by AWS

Disclaimer

I, Haresh Kumar K J, do hereby confirm that the information given above is true to the best of my knowledge.

Madurai, 30/11/24

Projects

VOICE CONTROLLED WEBSITE: Designed a news application that is controlled entirely by voice, built with the Alan AI voice assistant and the React JS framework.

SKILL BUILDING USING ALEXA SKILL KIT: Built an Alexa skill named THIRUKURRAL using the Alexa Skills Kit (ASK).

Credly

https://credly.com/users/haresh-kumar-k-j

Languages

English
First Language
