nvbandwidth

nvbandwidth is a tool developed by NVIDIA to measure bandwidth and latency on NVIDIA GPUs.

Installation

Follow the build and installation instructions at https://github.com/NVIDIA/nvbandwidth.

Running nvbandwidth

The device_to_device test cases attempt peer-to-peer (P2P) transfers between NVIDIA GPUs. P2P must therefore be enabled between any borrowed GPU and every other GPU included in the test. For how to enable P2P with borrowed GPUs, see PCIe peer-to-peer.

We run the device_to_host, host_to_device and device_to_device workloads on a 2-node cluster using MXH530 and a direct x16 link, configured with a 128 GB BAR2 size. Each node has a single NVIDIA RTX PRO 4500 Blackwell GPU and an AMD EPYC 9015 8-core processor, with 192 GB of memory, PCIe Generation 5, and the IOMMU enabled. Each node runs Ubuntu 24.04.3 LTS with kernel 6.8.0-79-generic, NVIDIA driver version 570.172.08 (open), and CUDA version 12.8. All test cases use the default buffer size of 512 MB.
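As an illustration of how the three workloads could be launched, the following minimal Python sketch builds one nvbandwidth command line per workload. It rests on several assumptions that are not stated in this section: that the binary is at `./nvbandwidth`, that `-p` selects test cases by name prefix, and that `-b` sets the buffer size (in MiB, according to the tool's help text). Check `./nvbandwidth --help` for your build before relying on these flags. The helper names `build_command` and `build_all_commands` are hypothetical.

```python
import os
import subprocess

# Workload name prefixes mentioned in this section.
WORKLOAD_PREFIXES = ["device_to_host", "host_to_device", "device_to_device"]


def build_command(prefix, binary="./nvbandwidth", buffer_mib=512):
    """Return the argv list for one nvbandwidth run.

    Assumption: "-p" selects test cases by prefix and "-b" sets the
    buffer size; verify both against "./nvbandwidth --help".
    """
    return [binary, "-b", str(buffer_mib), "-p", prefix]


def build_all_commands(binary="./nvbandwidth", buffer_mib=512):
    """Return one command per workload prefix, in a fixed order."""
    return [build_command(p, binary, buffer_mib) for p in WORKLOAD_PREFIXES]


if __name__ == "__main__":
    for cmd in build_all_commands():
        print(" ".join(cmd))
        # Only execute when the binary is actually present at this path.
        if os.path.isfile(cmd[0]):
            subprocess.run(cmd, check=True)
```

Running one invocation per prefix keeps the output of each workload separate, which makes it easier to match results to a topology.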

We run the device_to_host and host_to_device test cases using a topology with 1 local GPU (Local) and 1 remote GPU (Remote), borrowed using Device lending. The results are summarized in the plot below.

We run the device_to_device test cases using 2 different topologies: one with 2 local GPUs (Local -> Local), and one with 1 local GPU and 1 remote GPU, borrowed using Device lending (Local -> Remote, Remote -> Local). The results are summarized in the plot below.

Known issues

On Ubuntu 24.04 (Linux kernel v6.8), we have run into issues with nvbandwidth where the P2P test cases trigger IOMMU faults, even when running the benchmark between two local GPUs:

nvidia 0000:c1:00.0: Using 47-bit DMA addresses
nvidia 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfffef000 flags=0x0020]
...
nvidia 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfffee068 flags=0x0020]
NVRM: iovaspaceDestruct_IMPL: 5 left-over mappings in IOVAS 0xc100
NVRM: nvAssertFailedNoLog: Assertion failed: pIOVAS != NULL @ io_vaspace.c:592
...
NVRM: nvAssertFailedNoLog: Assertion failed: pIOVAS != NULL @ io_vaspace.c:601
NVRM: nvAssertFailedNoLog: Assertion failed: Sysmemdesc outlived its attached pGpu @ mem_desc.c:1514
NVRM: nvAssertFailedNoLog: Assertion failed: pIOVAS != NULL @ io_vaspace.c:592
NVRM: nvAssertFailedNoLog: Assertion failed: pIOVAS != NULL @ io_vaspace.c:601

This is unrelated to device lending, as it happens even when P2P is done between two local GPUs with IOMMU enabled.
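To check whether a run hit this issue, the kernel log can be scanned for the AMD-Vi fault events shown above. The following is a minimal sketch, assuming the log lines have exactly the format in the excerpt; the function name `find_io_page_faults` and the returned dictionary layout are hypothetical, and other kernel versions or IOMMU drivers may format these events differently.

```python
import re

# Matches AMD-Vi IO_PAGE_FAULT lines in the format shown in the excerpt above.
# Assumption: other kernels or IOMMU drivers may format the event differently.
FAULT_RE = re.compile(
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"AMD-Vi: Event logged \[IO_PAGE_FAULT "
    r"domain=(?P<domain>0x[0-9a-f]+) "
    r"address=(?P<address>0x[0-9a-f]+) "
    r"flags=(?P<flags>0x[0-9a-f]+)\]",
    re.IGNORECASE,
)


def find_io_page_faults(log_text):
    """Return one dict per IO_PAGE_FAULT event found in log_text."""
    faults = []
    for line in log_text.splitlines():
        match = FAULT_RE.search(line)
        if match:
            faults.append({
                "device": match.group("bdf"),
                "domain": int(match.group("domain"), 16),
                "address": int(match.group("address"), 16),
                "flags": int(match.group("flags"), 16),
            })
    return faults


if __name__ == "__main__":
    # Two lines copied from the excerpt above; only the second is a fault.
    sample = (
        "nvidia 0000:c1:00.0: Using 47-bit DMA addresses\n"
        "nvidia 0000:c1:00.0: AMD-Vi: Event logged "
        "[IO_PAGE_FAULT domain=0x000c address=0xfffef000 flags=0x0020]\n"
    )
    for fault in find_io_page_faults(sample):
        print(fault["device"], hex(fault["address"]))
```

In practice the input would come from the kernel log, for example the saved output of `dmesg`, rather than an inline string.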

References

https://github.com/NVIDIA/nvbandwidth