Sanjay Suthar

Bangalore

Summary

Senior System Software Engineer with 7+ years of experience in Linux kernel, BSP, SoC bring-up, CUDA, and low-level firmware development for high-performance embedded and AI computing platforms. Deep expertise in bootloaders, memory/cache management, PCIe-based systems, TrustZone/secure firmware, and distributed system firmware for AI accelerators. Hands-on with CUDA kernel programming, GPU memory hierarchy, and NPU-GPU workload orchestration. Proven record of working across the full system stack – from hardware programming and board bring-up to kernel drivers, RTOS, and AI operator-level scheduling.

Overview

7 years of professional experience
1 Certification

Work History

Lead Embedded Software Engineer

Qualcomm India Pvt Ltd
02.2021 - Current
  • Designed and developed neural network firmware (control/data path) for Qualcomm Cloud AI 100 PCIe card, orchestrating workloads across Linux runtime, ARM cores, and Hexagon DSP.
  • Implemented MultiCard AI firmware to enable tensor-sliced execution across multiple PCIe devices using peer-to-peer (P2P) communication, scaling large language models (GPT-4, Codegen 70 Billion parameters).
  • Implemented a collective communication library (similar to NCCL) for Neural Processing Unit firmware, enabling distributed execution of deep learning workloads across multiple compute nodes.
  • Designed and implemented a command processor to support PyTorch custom operators and efficiently schedule operators across multiple threads using barrier synchronization.
  • Engineered high-performance inter-processor communication (IPC) between ARM cores and Hexagon DSP with cache coherency, achieving multi-GB/s throughput.
  • Optimized neural network pipeline to reduce ResNet50 latency from 45 ms to 7 ms and improved LLM tokens/sec performance.
  • Integrated Secure Boot with ARM TrustZone and OP-TEE, enabling cryptographically signed and encrypted firmware/model and input loading for confidential computing.
  • Performed SoC and board bring-up: DDR calibration, power/clock initialization, device tree setup, and enabling high-speed interfaces (PCIe, SMBus).
  • Developed and integrated Linux kernel drivers for the PCIe DMA bridge, SPI-NOR flash, MTD, EDAC, and error-handling subsystems for RAS features.
  • Conducted crash dump and memory corruption analysis, resolving DMA timeouts and system hangs and improving RAS/BMC module robustness.
  • Discovered hardware bugs in internal IP RTL and the PCIe ATU, and collaborated with silicon teams to validate the fixes.
  • Tools/Technologies: Hexagon DSP, PCIe controller, MMU/TLB, ELF loader, RTOS (QuRT), Linux kernel, TrustZone, OP-TEE, JTAG, Trace32, Yocto, CUDA.

Embedded Software Engineer

Picustech Software (NXP Semiconductors)
06.2018 - 02.2021

Project 1: 3D Audio Framework on i.MX8MM SoC

  • Developed a low-latency audio framework with Dolby/DTS/Fraunhofer codec support, HDMI EDID detection, and voice services integration on Little Kernel (LK).
  • Worked on the Jailhouse static hypervisor to partition the NXP i.MX8MM so that the RTOS (M4) runs alongside Linux on A72, providing real-time audio processing.
  • Developed a virtual sound driver layer for Linux to use audio services from the RTOS.
  • Debugged and optimized synchronization issues, reducing audio glitches.

Project 2: ARM-based i.MX SoC Board Bring-Up & Driver Development

  • Ported Linux and QNX BSPs (U-Boot, kernel, rootfs), configured device trees, and enabled Ethernet, USB, SPI, and I2C support.
  • Developed drivers for MIPI-CSI cameras, LVDS displays, and audio codecs; validated video pipelines with V4L2 and GStreamer-based applications.
  • Reduced boot time by 30% with kernel tuning and peripheral parallelization.
  • Built custom Yocto layers, automated BSP builds, and authored DMA scripts for NXP SDMA.
  • Worked on linker scripts and MCUXpresso SDK driver porting for Cortex-M4.

Project 3: OP-TEE OS Integration on Custom i.MX Board

  • Integrated OP-TEE secure OS with Linux kernel and developed secure TEE applications.
  • Implemented ECDSA with SHA-256, fixed crypto acceleration bugs, and resolved kernel hangs caused by misconfigured clocks.
  • Implemented HAB/CAAM secure boot with signed and encrypted firmware.

Education

Bachelor of Engineering - Electronics & Communication

Gujarat Technological University

Skills

    Core Expertise:

  • C/C++, ARM (ARMv7/v8), Linux kernel/BSP, PCIe Architecture, Bootloaders, Linux Memory Management, Cache Coherency
  • Board Bring-up, Device Drivers (UART, SPI, I2C, PCIe, MIPI, LVDS), Secure Boot, TrustZone/OP-TEE
  • AI System Firmware: Distributed communication libraries, PyTorch custom operator integration, operator scheduling on NPU/ASIC, collective synchronization primitives
  • GPU/NPU Acceleration: Hands-on with CUDA programming, CUDA kernels, GPU memory hierarchy, warp execution model, GPU/NPU architectural trade-offs (tensor cores, scheduling, memory bandwidth optimization)
  • OS Fundamentals: Scheduling, interrupts, IPC, MMU, RTOS concepts
  • Debugging: Crash dump analysis, JTAG (TRACE32), GDB, Valgrind

    Tools & Technologies:

  • OS: Linux, QNX, FreeRTOS, OP-TEE, QuRT, Little Kernel, Android, QEMU
  • Build tools: Yocto, Buildroot, GNU Toolchain, Makefiles, CMake
  • Processors: i.MX6/8, Cortex-A72/A53/M4, Hexagon DSP, NPU, NVIDIA RTX GPU, Raspberry Pi
  • Peripherals: PCIe, NVMe, QSPI, Ethernet, SMBus, MIPI-CSI/DSI
  • Debug Tools: Oscilloscope, Trace32, DMM
  • Accelerators: CUDA, NVIDIA Nsight Systems/Compute, heterogeneous compute optimization (GPU/NPU)

Certification

  • Build Your Own RTOS from Ground Up (ARM)
  • FreeRTOS from Ground Up (ARM)
  • Linux System Programming (Advanced)
  • PCIe Architecture and Protocol
  • ARM TrustZone Security & Virtualization Concepts
  • CUDA Optimization and GPU Computing Fundamentals (self-learning & projects)
