Hi, I'm Harikesh Shinde

AI Systems Engineer
Pune

Summary

Dynamic HPC-AI application engineer with a strong record of porting and optimizing scientific applications for peak performance on high-performance computing systems. Experienced in matching HPC applications to specific workloads for efficient resource utilization across diverse computing environments, strengthening HPC facilities throughout India. Proven ability to troubleshoot complex problems and deliver innovative solutions that drive operational excellence. Deep knowledge of parallel programming, optimization techniques, and HPC architectures, complemented by proficiency in C/C++ and Python for building effective, scalable solutions.

Overview

8 years of professional experience

Work History

AMD

Senior Software Systems Design Engineer
06.2025 - Current

  • Deployed a 4,000+ GPU cluster under the Metal2Model Foundation on AMD Instinct MI325X accelerators.
  • Automated 40+ operational workflows using SaltStack, Bash, Python, and Ansible, cutting manual intervention by 40% and system downtime by 25%.
  • Integrated features into the internal AI training product, enabling end-to-end deployment from bare metal to model training on Kubernetes.
  • Worked closely with cloud service providers (OCI, Azure, TensorWave, Crusoe) to enable AI infrastructure deployment and readiness.
  • Engaged in continuous system monitoring and debugging, ensuring high availability and performance of critical applications (a minimal health-check sketch follows this list).
  • Collaborated with cross-functional teams to integrate advanced features, resulting in improved user satisfaction and engagement.
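
Below is a minimal sketch of the kind of node health-check automation described above; it assumes passwordless SSH to the nodes and `rocm-smi` installed on each AMD Instinct node, and the hostnames are placeholders, not names from the actual cluster.

```python
#!/usr/bin/env python3
"""Minimal GPU node health sweep (illustrative sketch).

Assumes passwordless SSH to each node and `rocm-smi` on the node's
PATH; the hostnames below are placeholders.
"""
import subprocess

NODES = [f"gpu-node-{i:03d}" for i in range(1, 5)]  # hypothetical hostnames

def node_healthy(node: str) -> bool:
    """Return True if rocm-smi runs cleanly on the node."""
    result = subprocess.run(
        ["ssh", "-o", "ConnectTimeout=5", node, "rocm-smi"],
        capture_output=True, text=True,
    )
    return result.returncode == 0

if __name__ == "__main__":
    for node in NODES:
        print(f"{node}: {'OK' if node_healthy(node) else 'FAIL'}")
```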

8bit.ai

Senior HPC Engineer
02.2025 - 05.2025

  • Design, deploy, and manage HPC clusters, ensuring high availability and performance.
  • Configure and optimize job schedulers (SLURM) and resource management tools.
  • Implement and maintain high-speed networking (InfiniBand, NVLink, RDMA).
  • Benchmark AI/ML applications on HPC infrastructure using frameworks like PyTorch, TensorFlow, and JAX (a minimal throughput sketch follows this list).
  • Optimize GPU-accelerated workloads across various architectures (NVIDIA H200, H100, A100, V100, L40).
  • Work with end customers to assess HPC requirements and design tailored solutions.
  • Evaluate hardware and software stacks for AI, deep learning, and scientific computing workloads.
  • Provide technical guidance on scalability, cost optimization, and workload balancing.
  • Configure and troubleshoot InfiniBand, NVLink, and high-speed interconnects.
  • Integrate high-performance storage solutions (Lustre, GPFS, NFS, Ceph) with AI workloads.
  • Implement monitoring and alerting systems (Prometheus, Grafana, Nagios) for proactive system health checks.
  • Presented technical findings to stakeholders, ensuring a clear understanding of project status and goals.
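
As a flavor of the GPU benchmarking work above, here is a minimal matmul-throughput sketch in PyTorch; the matrix size and iteration count are arbitrary, and this is an illustration rather than an actual harness from the role.

```python
import time
import torch

def matmul_tflops(n: int = 4096, iters: int = 20) -> float:
    """Time an n x n GEMM and return achieved TFLOP/s."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    # Warm-up to exclude allocation and kernel-launch overheads.
    for _ in range(3):
        a @ b
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    flops = 2 * n**3 * iters  # one multiply-add per output element
    return flops / elapsed / 1e12

if __name__ == "__main__":
    print(f"{matmul_tflops():.2f} TFLOP/s")
```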

Centre for Development of Advanced Computing

Senior Project Engineer
01.2023 - 01.2025

  • Optimized and ported scientific applications to run efficiently on high-performance computing systems, including clusters, grids, and clouds (a minimal MPI sketch follows this list).
  • Evaluated diverse hardware architectures for performance optimization, including Intel microarchitectures, GPGPUs, Graphcore IPUs, Cerebras HPC accelerators, MiPhi memory solutions, and IBM Watson machines.
  • Spearheaded benchmarking for system evaluation across GPU architectures, focusing on HPC and AI domain-specific applications under the MLPerf consortium.
  • Collaborated with researchers to understand application requirements and develop optimization strategies.
  • Worked with the HPC operations team to ensure seamless integration of optimized applications with HPC infrastructure.
  • Participated in code reviews and ensured adherence to coding standards and best practices.
  • Mentored junior engineers, providing technical guidance and support.
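
As an illustration of the parallel-programming work above, a minimal mpi4py sketch that splits a reduction across MPI ranks; the problem size is arbitrary and the code is a sketch, not drawn from the projects themselves.

```python
# Minimal MPI data-parallel reduction (illustrative sketch).
# Run with, e.g.: mpirun -np 4 python mpi_sum.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank sums its own strided slice of the problem.
n_total = 1_000_000
local = np.arange(rank, n_total, size, dtype=np.float64)
local_sum = local.sum()

# Combine partial sums across all ranks.
global_sum = comm.allreduce(local_sum, op=MPI.SUM)

if rank == 0:
    print(f"sum over {size} ranks: {global_sum:.0f}")
```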

Centre for Development of Advanced Computing

Project Engineer
09.2019 - 12.2022

  • Worked on projects under the NSM (National Supercomputing Mission) involving HPC-AI technologies.
  • Made major contributions to research, proof-of-concept work, and application benchmarking in cluster environments.
  • Assisted in the setup and configuration of a new HPC cluster, including software setup and system integration.
  • Automated application deployment and testing with tools such as Spack and ReFrame (a minimal automation sketch follows this list).
  • Conducted performance benchmarking and tuning to improve computational efficiency for key research projects.
  • Provided technical support and training to over 1,000 end-users, enhancing their ability to effectively utilize HPC resources for their computational needs.
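
A minimal sketch of deployment automation built around Spack, assuming the `spack` command is on PATH; the package list is illustrative, not the actual site configuration.

```python
#!/usr/bin/env python3
"""Drive Spack installs for a set of HPC packages (illustrative sketch).

Assumes the `spack` command is on PATH; package specs are examples only.
"""
import subprocess
import sys

PACKAGES = ["gromacs", "lammps", "openfoam"]  # example specs

def install(spec: str) -> bool:
    """Install one Spack spec, returning True on success."""
    result = subprocess.run(["spack", "install", spec])
    return result.returncode == 0

if __name__ == "__main__":
    failed = [p for p in PACKAGES if not install(p)]
    if failed:
        sys.exit(f"failed to install: {', '.join(failed)}")
    print("all packages installed")
```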

E-arth Solutions Pvt. Ltd

Intern
12.2017 - 01.2018

  • Researched and identified new technologies and approaches, helping to proactively solve unique problems.
  • Analyzed, designed, developed, and tested software, including embedded devices, for the organization's products and systems.
  • Gained valuable experience working within a specific industry, applying learned concepts directly into relevant work situations.

Key Technical Skills

  • Parallel Programming, Application Porting, Optimization, Benchmarking, SLURM Workload Management, Cluster Management
  • CNN, RNN, Transformers, TensorFlow, PyTorch, DeepSpeed, OpenCV, NLP, LLMs, MLOps, GenAI, MCP, ACP

Projects

Metal2Model Foundation

Project Summary: Metal2Model is a large-scale enablement and deployment program focused on integrating and managing AMD GPU infrastructure across cloud and on-prem environments, scaling from 1 to 10,000 GPUs. The project’s core goal is to deliver a robust, fault-tolerant, and production-grade GPU compute environment capable of supporting advanced AI and HPC workloads with high reliability and observability.

Responsibilities: Ensured maximum uptime and fault tolerance through resilient control plane design and proactive infrastructure management.

Roles: Deployed and managed the control-plane stack: Slurm for job scheduling, SaltStack for automation, and Prometheus with Grafana for real-time monitoring and alerting. Implemented cluster validation suites and executed Megatron-LM training jobs to assess GPU performance and readiness. Drove continuous optimization of GPU utilization, automation reliability, and observability across large-scale deployments.
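
To illustrate the monitoring side of this stack, a minimal custom-exporter sketch using the prometheus_client Python library; the metric name, port, and utilization source are placeholders for whatever the real collectors scrape.

```python
# Minimal Prometheus exporter sketch (metric name and data source are
# illustrative; a real collector would query rocm-smi or similar).
import random
import time

from prometheus_client import Gauge, start_http_server

gpu_util = Gauge(
    "node_gpu_utilization_percent",  # hypothetical metric name
    "GPU utilization per device",
    ["gpu"],
)

def read_utilization(gpu_id: int) -> float:
    """Placeholder: substitute a real query (e.g. via rocm-smi) here."""
    return random.uniform(0, 100)

if __name__ == "__main__":
    start_http_server(9100)  # Prometheus scrapes http://host:9100/metrics
    while True:
        for gpu_id in range(8):
            gpu_util.labels(gpu=str(gpu_id)).set(read_utilization(gpu_id))
        time.sleep(15)
```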

Technical Skills: Salt Automation, Cluster Validation Suite, Ansible AWX Workflow

Recognition: Spotlight Award for enabling deployment of 4,000+ GPUs


Accelerated Cloud Platform 

Project Summary: The Accelerated Cloud Platform is a high-performance, heterogeneous computing environment designed to support AI, HPC, and data-intensive workloads. It integrates multi-vendor GPU, CPU, and accelerator technologies under a unified orchestration and resource management layer to deliver optimized performance, scalability, and efficiency across diverse workloads.

Responsibilities: Led serverless vLLM deployment enabling dynamic scaling of inference workloads across GPU clusters. Conducted GPU workload validation, benchmarking, and performance assessment to ensure optimal utilization and reliability.

Roles: Led vendor evaluation and capability assessment, comparing performance, pricing, and scalability across NVIDIA, AMD, SambaNova, and other ecosystem partners. Oversaw cost budgeting and optimization, analyzing key infrastructure and business factors to ensure long-term efficiency. Deployed and validated serverless vLLM workloads to test GPU utilization and performance consistency. Researched and implemented orchestration tools such as NVIDIA Dynamo, Run:AI, and Ray for dynamic, cloud-scale GPU management. Worked closely with infrastructure, finance, and AI platform teams to align deployment strategies with organizational goals.
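
A minimal offline vLLM inference sketch of the kind of workload validated above; the model name is a small placeholder, and the serverless scaling layer is out of scope here.

```python
# Minimal offline vLLM inference sketch (the model is a small
# placeholder; production workloads ran larger models behind a
# serverless scaling layer, which is out of scope here).
from vllm import LLM, SamplingParams

prompts = [
    "Explain what an HPC cluster is in one sentence.",
    "What does GPU utilization measure?",
]
params = SamplingParams(temperature=0.7, max_tokens=64)

llm = LLM(model="facebook/opt-125m")  # placeholder model
for output in llm.generate(prompts, params):
    print(output.prompt, "->", output.outputs[0].text.strip())
```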

Technical Skills: vLLM Deployment, Dynamo, Run:AI, Ray Cluster, Cost Optimization, GPU Benchmarking and Validation


HPC Projects under National Supercomputing Mission:

Project Summary: The National Supercomputing Mission (NSM) is a government initiative in India aimed at boosting the country's supercomputing capabilities. Launched in 2015, the mission involves setting up a network of high-performance computing (HPC) facilities across India. The goal is to empower researchers and scientists with advanced computing power to tackle complex problems in various fields, such as climate modeling, drug discovery, and materials science.

Responsibilities: Parallelized and optimized scientific codes on CPUs and GPUs, on single-node and multi-node clusters; worked with domain experts to define the objectives and outcomes of parallelization and optimization efforts; mentored teams on parallelization and optimization for ongoing projects; delivered training on parallel computing technologies.

Roles: Porting, profiling, optimizing, debugging, and benchmarking scientific applications, and assessing application scalability. Supporting researchers in interfacing with HPC infrastructure. Implementing and managing job scheduling systems. Maintaining applications such as GROMACS, LAMMPS, NAMD, OpenFOAM, Quantum ESPRESSO, and WRF (Weather Research and Forecasting). Worked across 14+ NSM sites pan-India, with total computing power of 22+ PF.
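
A minimal sketch of generating and submitting a Slurm batch job for one such application, here a hypothetical GROMACS run; the partition, module, and input names are placeholders rather than actual site configuration.

```python
#!/usr/bin/env python3
"""Generate and submit a Slurm batch job (illustrative sketch).

Partition, module, and input names below are placeholders for a
hypothetical GROMACS run.
"""
import subprocess
from pathlib import Path

JOB_SCRIPT = """\
#!/bin/bash
#SBATCH --job-name=gmx-bench
#SBATCH --partition=gpu              # placeholder partition
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=01:00:00

module load gromacs                  # placeholder module name
srun gmx_mpi mdrun -deffnm benchmark # placeholder input prefix
"""

if __name__ == "__main__":
    path = Path("gmx_bench.sbatch")
    path.write_text(JOB_SCRIPT)
    subprocess.run(["sbatch", str(path)], check=True)
```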

Technical Skills: Parallel Programming, HPC, SLURM Workload Management, Optimization, Containerization, Application Benchmarking


Artificial Intelligence Projects under National Supercomputing Mission:
Project Summary: The National Supercomputing Mission (NSM) in India aims to enhance the nation's capabilities in HPC and make supercomputing resources accessible to research and industry. Under this mission, the Indian government has been advancing various artificial intelligence (AI) and machine learning (ML) projects to boost innovation and provide solutions in critical sectors.

Responsibilities: Developed a real-time image-processing application pipeline using TensorFlow with GPU-accelerated CNNs and Transformers for tasks such as computer vision and natural language processing (NLP). Led benchmarking activities to evaluate AI applications across GPU platforms from different OEMs. Created in-house benchmark application kernels for system acceptance testing and evaluation. Worked with cross-functional teams to deliver AI solutions.

Roles: Developed machine learning and deep learning models to solve complex problems, implementing advanced AI techniques (e.g., CNNs, RNNs, and Transformers) for tasks such as computer vision and NLP. Gathered data for image segmentation and classification tasks, with data annotation and preprocessing for model creation. Deployed ML applications on platforms such as Hugging Face, Streamlit, Gradio, and Heroku. Actively supported NSM sites and internal groups with AI/ML workloads. Benchmarked AI applications on open consortium platforms such as MLCommons.
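
A minimal sketch of the kind of CNN classifier described above, in TensorFlow/Keras; the input shape and class count are placeholders, not parameters from the actual projects.

```python
# Minimal CNN image classifier sketch in TensorFlow/Keras
# (input shape and class count are placeholders).
import tensorflow as tf

def build_cnn(input_shape=(224, 224, 3), num_classes=10) -> tf.keras.Model:
    """Small convolutional classifier for illustration."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu",
                               input_shape=input_shape),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

if __name__ == "__main__":
    model = build_cnn()
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()
```
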
Technical Skills: Python Programming, ML, DL, relevant libraries (TensorFlow, PyTorch, etc.), OpenCV, NLP, LLMs, MLOps


PARAM Shavak Product Development:

Project Summary: PARAM Shavak aims to provide computational resources (capacity building) with advanced technologies to perform high-end computations for scientific, engineering, and academic programs, catalyzing research through modelling, simulation, and data analysis. The initiative is also expected to create an HPC-aware skilled workforce (capability building) and to promote research by integrating leading-edge emerging technologies at the grassroots level.

Responsibilities: System setup and configuration according to client requirements; updating product documentation; keeping the product current with the latest technology support; technical support for clients, along with training on PARAM Shavak variants such as HPC, DL-GPU, and VR.

Roles: System solution design, development, and integration; technical evaluation; product documentation; training activities for marketing teams; deployments at client sites.

Technical Skills: Shell & Python Scripting, Product Management


Education

Centre for Development of Advanced Computing
Pune, India

PG Diploma in Big Data Analytics
08.2019

Percentage: 71.75

Sanjivani College of Engineering
Ahmednagar, India

Bachelor of Engineering in Electronics and Telecommunication
07.2018

Percentage: 65.42

Community Engagement

  • Active member of the MLPerf community, an engineering consortium built on a philosophy of open collaboration to improve AI systems through a transparent benchmarking process
  • Member of the High Performance Software Foundation (HPSF), an open-source, vendor-neutral hub for high-performance software projects, working groups, events, and training
  • Contributing to an open research group for the Exascale Computing Project (ECP)
  • Attended conferences and meetups to connect with other HPC professionals and share knowledge

Activities

  • Product development for PARAM Shavak and integration of new tools and services
  • Actively participated in organizing a hackathon in collaboration with NVIDIA
  • Completed online certification in the Deep Learning Specialization from DIAT Pune and FutureSkills Prime