nvbandwidth

nvbandwidth is a tool developed by NVIDIA for measuring bandwidth and latency on NVIDIA GPUs.

Installation

Follow the instructions at https://github.com/NVIDIA/nvbandwidth.

Benchmarking setup

Category                Details
Topology                2 nodes named A and B
CPU                     AMD EPYC 9015 8-Core Processor
Motherboard             Supermicro H13SSL-N
Memory                  12x 16 GB DDR5 DIMMs at 6400 MT/s
Operating system        Ubuntu 24.04
Kernel                  Linux 6.8
Host adapter cards      MXH530 PCIe 5.0 NTB Host Adapter
Switch box              none
Cables                  PCIe 5.0 / SFF-8614 cables
Driver version          5.24
NVIDIA GPUs             4x NVIDIA RTX PRO 4500 Blackwell (2x in each node)
NVIDIA driver version   570
CUDA toolkit version    12.8
NCCL version            2.27.7
nccl-tests version      2.16.5

For this experiment, we use only 2 of the 4 GPUs.
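As a rough reference point for the measured numbers, the theoretical per-direction bandwidth of a PCIe link follows from the per-lane transfer rate, the lane count, and the line encoding. The sketch below assumes a PCIe 5.0 x16 link end to end (32 GT/s per lane, 128b/130b encoding); the link width actually negotiated by the MXH530 adapter and the GPUs is an assumption here and should be checked on the real system. The function name is hypothetical.

```python
def pcie_raw_bandwidth_gb_s(gt_per_s: float, lanes: int,
                            payload_bits: int = 128, line_bits: int = 130) -> float:
    """Per-direction PCIe bandwidth in GB/s after line encoding.

    gt_per_s: transfer rate per lane in GT/s (32 for PCIe 5.0).
    lanes:    link width (assumed x16 in the example below).
    128b/130b encoding applies to PCIe 3.0 through 5.0.
    Protocol overhead (TLP headers, DLLPs, flow control) is NOT modeled,
    so real memcpy throughput will be lower than this ceiling.
    """
    bits_per_s = gt_per_s * 1e9 * lanes * payload_bits / line_bits
    return bits_per_s / 8 / 1e9  # bits -> bytes, then bytes -> GB (10^9 bytes)


if __name__ == "__main__":
    # Assumption: a PCIe 5.0 x16 link between each GPU and its host / NTB adapter.
    ceiling = pcie_raw_bandwidth_gb_s(gt_per_s=32, lanes=16)
    print(f"PCIe 5.0 x16 ceiling: {ceiling:.2f} GB/s per direction")
```

Under these assumptions the script prints 63.02 GB/s per direction. This is a raw line-rate ceiling only, so nvbandwidth results on a healthy PCIe 5.0 x16 path are expected to sit below it.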

Running nvbandwidth

The device-to-device test cases perform peer-to-peer (P2P) transfers between NVIDIA GPUs, so P2P must be enabled between any borrowed GPU and the other GPUs included in the test. For how to enable P2P with borrowed GPUs, see PCIe peer-to-peer.

We run the device_to_host and host_to_device test cases using a topology with 1 local GPU (Local) and 1 remote GPU (Remote), borrowed using Device lending. The results are summarized in the plot below.

We run the device_to_device test cases using 2 different topologies: one with 2 local GPUs (Local -> Local), and one with 1 local GPU and 1 remote GPU borrowed using Device lending (Local -> Remote, Remote -> Local). The results are summarized in the plot below.
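A minimal sketch of scripting these runs from Python, assuming the nvbandwidth binary was built in the current directory and that its `-t` option selects a single test case by name. The test case names are the copy-engine variants as we understand them; confirm them with `nvbandwidth -l` on your build, since names can change between releases. Restricting the run to 2 GPUs via `CUDA_VISIBLE_DEVICES` (with `CUDA_DEVICE_ORDER=PCI_BUS_ID` for stable indices) is our assumption about how to select the local and borrowed GPU, and the helper names are hypothetical.

```python
import os
import subprocess

# Hypothetical path to the nvbandwidth binary built from the GitHub repo.
NVBANDWIDTH = "./nvbandwidth"

# Assumed copy-engine test case names; verify with `nvbandwidth -l`.
HOST_TESTS = ["host_to_device_memcpy_ce", "device_to_host_memcpy_ce"]
PEER_TESTS = ["device_to_device_memcpy_read_ce", "device_to_device_memcpy_write_ce"]


def build_command(testcase: str, binary: str = NVBANDWIDTH) -> list[str]:
    """Return the argv for running a single nvbandwidth test case."""
    return [binary, "-t", testcase]


def build_env(visible_gpus: list[int]) -> dict[str, str]:
    """Copy the environment, restricting CUDA to the given device indices."""
    env = dict(os.environ)
    # Enumerate devices by PCI bus ID so indices do not depend on GPU speed.
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in visible_gpus)
    return env


def run_tests(testcases: list[str], visible_gpus: list[int]) -> dict[str, str]:
    """Run each test case and collect its raw stdout, keyed by test name."""
    env = build_env(visible_gpus)
    results = {}
    for tc in testcases:
        proc = subprocess.run(build_command(tc), env=env,
                              capture_output=True, text=True, check=True)
        results[tc] = proc.stdout
    return results
```

For example, `run_tests(HOST_TESTS + PEER_TESTS, [0, 1])` runs all four test cases on devices 0 and 1 (assumed here to be the local and the borrowed GPU) and returns the raw output for later plotting. Running one test case per process keeps a failure in one case, such as an IOMMU fault, from hiding the results of the others.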

Known issues

On Ubuntu 24.04 (Linux kernel 6.8), we have observed the nvbandwidth P2P test cases triggering IOMMU faults, even when running the benchmark between two local GPUs:

nvidia 0000:c1:00.0: Using 47-bit DMA addresses
nvidia 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfffef000 flags=0x0020]
...
nvidia 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfffee068 flags=0x0020]
NVRM: iovaspaceDestruct_IMPL: 5 left-over mappings in IOVAS 0xc100
NVRM: nvAssertFailedNoLog: Assertion failed: pIOVAS != NULL @ io_vaspace.c:592
...
NVRM: nvAssertFailedNoLog: Assertion failed: pIOVAS != NULL @ io_vaspace.c:601
NVRM: nvAssertFailedNoLog: Assertion failed: Sysmemdesc outlived its attached pGpu @ mem_desc.c:1514
NVRM: nvAssertFailedNoLog: Assertion failed: pIOVAS != NULL @ io_vaspace.c:592
NVRM: nvAssertFailedNoLog: Assertion failed: pIOVAS != NULL @ io_vaspace.c:601

This issue is unrelated to device lending, as it also occurs when P2P transfers run between two local GPUs with the IOMMU enabled.
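To check whether a benchmark run hit this issue, the kernel log can be scanned for the AMD-Vi `IO_PAGE_FAULT` events shown above. The sketch below is a hypothetical helper that counts these events per PCI device; it only recognizes the line format in the log above, and other kernels or IOMMU drivers may format the event differently.

```python
import re
from collections import Counter

# Matches AMD-Vi IO_PAGE_FAULT lines in the format shown in the kernel log above.
IO_PAGE_FAULT = re.compile(
    r"(?P<driver>\S+) (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"AMD-Vi: Event logged \[IO_PAGE_FAULT "
    r"domain=(?P<domain>0x[0-9a-f]+) address=(?P<address>0x[0-9a-f]+) "
    r"flags=(?P<flags>0x[0-9a-f]+)\]",
    re.IGNORECASE,
)


def count_io_page_faults(log_text: str) -> Counter:
    """Count IO_PAGE_FAULT events per PCI device (domain:bus:device.function)."""
    counts = Counter()
    for line in log_text.splitlines():
        # search() rather than match() so dmesg timestamp prefixes are tolerated.
        m = IO_PAGE_FAULT.search(line)
        if m:
            counts[m.group("bdf")] += 1
    return counts
```

The input can be the text output of `dmesg` or `journalctl -k` captured after a run; a non-empty result for a GPU's PCI address indicates that the P2P test cases triggered IOMMU faults on that device.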


References