
Description:
In my role within HPC environments, I specialize in configuring, automating, and managing complex IT infrastructures. Using tools such as Foreman, Ansible, GitLab, Semaphore, and the ELK stack, I optimize and maintain the Slurm workload manager and Weka file systems. My focus includes supporting end users and ensuring seamless operation of critical computational resources.
Description:
Highly skilled HPC (High-Performance Computing) Specialist with extensive experience in configuring, managing, and optimizing HPC environments both on-premises and in the cloud. Proven expertise in deploying and maintaining complex HPC systems, including AI-enhanced clusters, with a strong focus on automation and user support. Proficient in compiling and optimizing scientific applications, managing containerized workloads, and leveraging advanced GPU technologies.
Description:
The National Centre for Medium Range Weather Forecasting (NCMRWF) is a Centre of Excellence in Weather and Climate Modelling under the Ministry of Earth Sciences. Its HPC facility, MIHIR, has an Rpeak of around 3 petaflops. It is a liquid-cooled system that provides a balanced, high-performance computing platform comprising service nodes, login nodes, compute nodes, and I/O nodes connected by the Cray Aries high-speed network (HSN) interconnect. It consists of 13 cabinets; each cabinet has 3 chassis, each chassis has 16 slots, and each slot holds four nodes. The system has 2322 nodes in total.
Description:
HPC System Administrator of the IIT Delhi HPC facility (PADUM). Responsible for managing a cluster with an Rpeak of around 2 petaflops, comprising more than 17,000 CPU cores along with 234 NVIDIA K40 and 40 NVIDIA V100 GPU cards.
HPC
Slurm
PBS
Linux
NVIDIA GPUs
Bright Cluster Manager
OpenHPC
xCAT
Cray Storage CLS300 and E1000
InfiniBand
AWS
Kubernetes
Docker
Ansible
Git
Semaphore
ELK
Zabbix
Nagios
Ganglia
Bash Scripting
Cray XC40, XC50 Cluster
Lustre File System
Weka File System