nvbandwidth¶
nvbandwidth is a tool developed by NVIDIA to measure bandwidth and latency on
NVIDIA GPUs.
Installation¶
Follow the instructions at https://github.com/NVIDIA/nvbandwidth
Benchmarking setup¶
| Category | Details |
|---|---|
| Topology | 2 nodes named A and B |
| CPU | AMD EPYC 9015 8-Core Processor |
| Motherboard | Supermicro H13SSL-N |
| Memory | 12x 16 DDR5 DIMMs at 6.4 GHz |
| Operating system | Ubuntu 24.04 |
| Kernel | Linux 6.8 |
| Host adapter cards | |
| Switch box | none |
| Cables | |
| Driver version | |
| NVIDIA GPUs | 4x NVIDIA RTX PRO 4500 Blackwell (2x in each node) |
| NVIDIA driver version | 570 |
| CUDA toolkit version | 12.8 |
| NCCL version | 2.27.7 |
| nccl-tests version | 2.16.5 |
For this experiment we use only 2 GPUs in total.
Running nvbandwidth¶
The device-to-device test cases attempt peer-to-peer (P2P) transfers
between NVIDIA GPUs. P2P must therefore be enabled
between any borrowed GPU and the other GPUs included in the test. For
how to enable P2P with borrowed GPUs, see PCIe peer-to-peer.
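As a minimal sketch of how the test cases in this section could be launched from a script, the Python snippet below builds and runs an nvbandwidth command line for a single test case. The binary path `./nvbandwidth` and the helper names `build_command` and `run_testcase` are assumptions made for illustration. The exact test case names should be taken from the output of `./nvbandwidth --list` for your build.

```python
import subprocess

# Assumed location of the built binary; adjust to your build directory.
NVBANDWIDTH = "./nvbandwidth"


def build_command(testcase, binary=NVBANDWIDTH):
    """Return the argv list that runs a single nvbandwidth test case."""
    if not testcase:
        raise ValueError("a test case name is required")
    # -t selects the test case to run by name.
    return [binary, "-t", testcase]


def run_testcase(testcase, binary=NVBANDWIDTH):
    """Run one nvbandwidth test case and return its standard output as text."""
    result = subprocess.run(
        build_command(testcase, binary),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For example, `run_testcase("device_to_device_memcpy_read_ce")` would run one of the device-to-device test cases, assuming that name appears in the `--list` output. Running one test case per invocation keeps the output of each case separate.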
We run the device_to_host and host_to_device test cases using a topology with 1 local GPU (Local) and 1 remote GPU (Remote) borrowed using Device lending. The results are summarized in the plot below.
We run the device_to_device test cases using two topologies: one with 2 local GPUs (Local -> Local), and one with 1 local GPU and 1 remote GPU borrowed using Device lending (Local -> Remote, Remote -> Local). The results are summarized in the plot below.
Known issues¶
On Ubuntu 24.04 (Linux kernel v6.8), the nvbandwidth P2P test cases have
triggered IOMMU faults, even when the benchmark runs
between two local GPUs:
```
nvidia 0000:c1:00.0: Using 47-bit DMA addresses
nvidia 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfffef000 flags=0x0020]
...
nvidia 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfffee068 flags=0x0020]
NVRM: iovaspaceDestruct_IMPL: 5 left-over mappings in IOVAS 0xc100
NVRM: nvAssertFailedNoLog: Assertion failed: pIOVAS != NULL @ io_vaspace.c:592
...
NVRM: nvAssertFailedNoLog: Assertion failed: pIOVAS != NULL @ io_vaspace.c:601
NVRM: nvAssertFailedNoLog: Assertion failed: Sysmemdesc outlived its attached pGpu @ mem_desc.c:1514
NVRM: nvAssertFailedNoLog: Assertion failed: pIOVAS != NULL @ io_vaspace.c:592
NVRM: nvAssertFailedNoLog: Assertion failed: pIOVAS != NULL @ io_vaspace.c:601
```
This issue is unrelated to device lending: it occurs even when P2P is performed between two local GPUs with the IOMMU enabled.
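To check whether a run hit this issue, the kernel log can be scanned for `IO_PAGE_FAULT` events. The following Python sketch parses lines in the format shown above. The regular expression and the helper name `parse_io_page_faults` are illustrative assumptions that match only this log format, not a general AMD-Vi event parser.

```python
import re

# Matches AMD-Vi IO_PAGE_FAULT events in the format shown in the log above.
FAULT_RE = re.compile(
    r"(?P<device>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): AMD-Vi: "
    r"Event logged \[IO_PAGE_FAULT domain=(?P<domain>0x[0-9a-f]+) "
    r"address=(?P<address>0x[0-9a-f]+) flags=(?P<flags>0x[0-9a-f]+)\]"
)


def parse_io_page_faults(log_text):
    """Return one dict per IO_PAGE_FAULT event found in log_text."""
    faults = []
    for match in FAULT_RE.finditer(log_text):
        faults.append(
            {
                "device": match["device"],  # PCI address of the faulting device
                "domain": int(match["domain"], 16),
                "address": int(match["address"], 16),
                "flags": int(match["flags"], 16),
            }
        )
    return faults


# Sample lines copied from the kernel log excerpt above.
SAMPLE = """\
nvidia 0000:c1:00.0: Using 47-bit DMA addresses
nvidia 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfffef000 flags=0x0020]
nvidia 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfffee068 flags=0x0020]
"""

for fault in parse_io_page_faults(SAMPLE):
    print(fault["device"], hex(fault["address"]))
```

The same function can be applied to the text of the kernel log (for example, as captured from `dmesg`) after a benchmark run. An empty result means no fault in this format was logged.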