System Requirements¶
This page describes the system requirements of SmartIO. The exact requirements depend on the SmartIO functionality used on a specific node. We recommend that you also refer to the “System Requirements” sections in the Quick Start Guides, Adding Devices to the Pool, or Borrowing Pooled Devices.
Platform Requirements¶
SmartIO is supported on most x86_64 systems as well as selected ARM64 platforms [1].
Warning
Lending out local devices is not yet supported on Intel Xeon Ice Lake and newer. Please contact support for advice.
| Platform | Lending | Borrowing | SISCI |
|---|---|---|---|
| AMD Ryzen | Supported | Supported | Supported |
| AMD EPYC | Limited, see No irq handler for vector in console | Supported | Supported |
| AMD Threadripper | Supported | Supported | Supported |
| Intel Core series | Peer-to-peer support limited, see PCIe peer-to-peer (P2P) support | Supported | Supported |
| Intel Xeon Cooper Lake and older | Supported | Supported | Supported |
| Intel Xeon Ice Lake | To be supported in a future release | Supported | Supported |
| Intel Xeon Rocket Lake, Sapphire Rapids, Emerald Rapids and Granite Rapids | To be supported in a future release | Supported | Supported |
| NVIDIA Xavier | Peer-to-peer support limited, see PCIe peer-to-peer (P2P) support | Supported (in 5.24) | Supported |
Hint
Server platforms like Intel Xeon and AMD EPYC have more available PCIe lanes than their desktop counterparts, which is important for achieving the full performance of PCIe devices.
IOMMU / VT-d¶
It’s generally recommended to enable the IOMMU / VT-d on all nodes when using SmartIO. Enabling the IOMMU eases the requirements for Host NTB Adapter Prefetchable Size by allowing a smaller DMA window. The IOMMU isolates the devices from the rest of the system and from the other nodes in the fabric. This can protect against bugs in the target device driver, and it can also be very useful when a device is used with the SISCI API.
On the lending side, the target device is only granted access to designated DMA regions on the borrowing side. On the borrowing side, these DMA regions are mapped as requested by the target device driver, while the rest of the region is protected. If the target device misbehaves and tries to DMA to a protected region, the system kernel will typically log this.
When the IOMMU is enabled, all traffic is forced to go through the CPU. This can re-route peer-to-peer traffic between the NTB adapter and a lent device. When the NTB adapter and the lent device are directly connected to CPU-provided PCIe lanes, this causes only a minor performance penalty. Systems with on-board PCIe switches or a PCIe expansion chassis can see a substantial performance penalty. In these cases, disabling the IOMMU on the lender may be advisable.
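Whether an IOMMU is currently active on a node can be checked from Linux. The sketch below uses standard kernel interfaces only; exact device names, log messages, and boot-loader configuration vary between platforms and distributions.

```sh
# List active IOMMUs (the directory is typically empty when no IOMMU is in use)
ls /sys/class/iommu/

# Kernel messages also show IOMMU / VT-d / AMD-Vi initialization
dmesg | grep -iE 'iommu|dmar|amd-vi'

# The IOMMU is typically enabled with the kernel boot parameters
# intel_iommu=on (Intel VT-d) or amd_iommu=on (AMD-Vi), added to the
# boot loader configuration (for example GRUB), followed by a reboot.
```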
ACS and IOMMU groups¶
ACS (Access Control Services) is a PCIe feature used to enforce isolation when the IOMMU is enabled. The Linux kernel groups devices that cannot be isolated from each other into “IOMMU groups”. To lend a device in an IOMMU group, all of the other devices in the same group must also be added and made available. This prevents those other devices from being used locally while one or more of the devices in the group is borrowed. This is normally not a problem on most server and workstation systems, but systems with on-board PCIe switches or expansion chassis, or devices attached to the chipset, are more likely to have IOMMU group issues.
Hint
On desktop platforms like Intel Core and AMD Ryzen it’s common for some of the PCIe slots to have lanes attached to the chipset. These slots are more likely to have ACS related issues.
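The IOMMU groups assigned by the kernel can be inspected through sysfs. A minimal sketch; the BDF 0000:41:00.0 is a hypothetical example, and the iommu_group link only exists while the IOMMU is enabled.

```sh
# List all IOMMU groups and the devices in each group
find /sys/kernel/iommu_groups/ -type l | sort

# Show which devices share a group with a specific device
# (hypothetical BDF 0000:41:00.0 used as an example)
ls /sys/bus/pci/devices/0000:41:00.0/iommu_group/devices/
```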
NUMA and Multi-Socket Systems¶
Performance can be significantly impacted when peer-to-peer traffic between the NTB adapter and the lent device needs to cross between CPU sockets. The same effect can be seen on some single-socket systems such as AMD EPYC [2]. To achieve optimal performance, it’s recommended that the NTB adapter and the devices to be lent are on the same NUMA node or I/O Die quadrant. Please refer to your system or motherboard manual for more details.
Hint
On NUMA systems, the NUMA node of a PCIe device can be discovered with lspci.
On AMD EPYC systems the NPS=4 BIOS setting exposes each CPU as 4 NUMA nodes. This can make it easier to discover which PCIe slots are associated with the same I/O Die / NUMA node. This setting can affect performance, including memory bandwidth, so you may want to set NPS back to the default value after determining the optimal PCIe slots. Refer to your system manual for more details.
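A minimal sketch of discovering the NUMA node of a PCIe device; the BDF 0000:41:00.0 is a hypothetical example.

```sh
# Show the NUMA node of a device with lspci
lspci -s 0000:41:00.0 -vv | grep -i 'numa node'

# The same information is exposed in sysfs (-1 means no NUMA affinity is reported)
cat /sys/bus/pci/devices/0000:41:00.0/numa_node
```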
PCIe peer-to-peer (P2P) support¶
The lending side must support PCIe peer-to-peer transactions between the slot where the Dolphin PCIe adapter is installed and the slot where the target device is installed. Peer-to-peer support is not needed to borrow remote devices. In our experience, most AMD and Intel Xeon systems support peer-to-peer. We recommend that you ask your system vendor if your system supports peer-to-peer. If your system has PCIe switches, or you have a topology with transparent switches, peer-to-peer traffic can take the shortest path between two devices, avoiding the CPU/root complex. This shortest-path routing only happens if the IOMMU is disabled.
On platforms where peer-to-peer is not supported, but there is a direct path between the NTB and the target device via one or more PCIe switches, lending out that device can still work. This requires that the IOMMU is disabled.
Hint
In some cases, a system may have only partial support for p2p, for instance systems with an internal PCIe switch or when using an expansion box. Note that enabling the IOMMU forces p2p transactions to go through the CPU, so with the IOMMU enabled the CPU itself must support p2p. As a result, p2p may work only when the IOMMU is off.
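To get an overview of how devices are connected on the lending side, the PCIe topology can be displayed as a tree. A minimal sketch; interpreting the output requires knowing which entries correspond to the NTB adapter and the target device.

```sh
# Display the PCIe topology as a tree to see whether the NTB adapter and the
# target device sit behind a common PCIe switch or only meet at the root complex
lspci -tv
```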
NTB Cluster Requirements¶
Supported Host Adapters¶
| Dolphin PXH Host Adapter | Status | Notes |
|---|---|---|
| PXH840 | Supported | |
| PXH830 | Supported | |
| PXH810 | Partially supported | Fabric Attached Devices are not supported. |
| PXH820 / PXH824 | Partially supported | Fabric Attached Devices are not supported. |
Host NTB Adapter Prefetchable Size¶
It’s recommended to configure a large NTB prefetchable memory size on the NTB host
adapter when using SmartIO. The exact size required depends on the device and
system configuration. While the default size of 256MiB may be enough in some
cases, it’s recommended to increase the prefetchable memory if possible and to
configure all nodes with at least 32GiB of prefetchable memory.
Warning
Make sure 4G decoding is enabled in the BIOS / Firmware on your system before trying to increase the prefetchable size. Increasing the prefetchable size with 4G decoding disabled can cause the system to fail to boot.
When borrowing a device, the borrower’s NTB adapter must map all the BARs of the
device. Taking into account alignment requirements and additional space
required for general communication, the NTB prefetch size must be larger than
the combined size of all the BARs of all the devices that will be borrowed
simultaneously. For instance, borrowing a device with one 8GiB BAR2 requires at
least 8GiB mapping space, but due to the additional mapping space used for
communication as well as alignment requirements, a prefetchable memory size of
8GiB is insufficient. Since the NTB prefetchable size can only be set in
powers of two, the next step up is 16GiB which is enough for one such device,
but not enough for 2 devices with 8GiB BARs.
NTB prefetchable memory space is also used on the lender side when a local device is borrowed by another node. The exact size depends on the DMA window size used by the borrower.
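To estimate how much prefetchable memory a device will need, its BAR sizes can be inspected on the lending side. A minimal sketch; the BDF 0000:41:00.0 and the output line are hypothetical examples, and the exact output format depends on the lspci version.

```sh
# List the memory BARs (regions) of a device and their sizes
lspci -s 0000:41:00.0 -vv | grep -i 'region'
# Example output line:
#   Region 2: Memory at 38000000000 (64-bit, prefetchable) [size=8G]
```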
Setting the Adapter Prefetchable Size¶
The prefetchable size is set with physical DIP switches on the MXH adapter. Please refer to the User Guide of your NTB Adapter for specific instructions. For MXS switches with D-Switch topology the prefetch size is set in the switch web interface. Please refer to the MXS User Guide for instructions.
The prefetchable size can be set using the dis_config tool. Refer to
Configuring the adapter card for instructions.
Operating System Requirements¶
| Operating system | Lending | Borrowing | SISCI | Notes |
|---|---|---|---|---|
| RHEL 10 (AlmaLinux 10, Rocky Linux 10, CentOS 10 Stream) | Supported | Supported | Supported | |
| RHEL 9 (AlmaLinux 9, Rocky Linux 9, CentOS 9 Stream) | Supported | Supported | Supported | MXH500-series supported since kernel-5.14.0-406.el9 [3] |
| RHEL 8 (AlmaLinux 8, Rocky Linux 8, CentOS 8 Stream) | Supported | Supported | Supported | MXH500-series not supported [3] |
| Ubuntu 24.04 LTS | Supported | Supported | Supported | |
| Ubuntu 22.04 LTS with Hardware Enablement (HWE) stack | Supported | Supported | Supported | |
| Ubuntu 22.04 LTS | Supported | Supported | Supported | MXH500-series not supported [3] |
| Windows | Preview [4] | Not supported | Supported | |
Linux Kernel¶
eXpressWare and SmartIO are designed to work with a wide range of kernel versions, but eXpressWare is only tested and qualified with the kernels shipped by the distributions we support. Please refer to the table above. Contact support for additional information or assistance.
Supported Devices¶
SmartIO is designed to work with all PCIe-compliant devices and device drivers by implementing support for all the required PCIe features. Some legacy PCI features are not supported, but this does not impact most (if any) PCIe devices. Device types verified to work include:

- NVMe drives from multiple vendors
- NVIDIA GPUs from multiple generations
- Intel Ethernet adapters
- Mellanox/NVIDIA ConnectX network adapters
- Various FPGAs
In general, almost all PCIe devices are supported with SmartIO and device lending. More specifically, SmartIO supports the following features:
| Feature | Support Status |
|---|---|
| Memory Space Device Registers (BARs), prefetchable and non-prefetchable | Supported |
| DMA to/from Device to RAM (“Zero-Copy”) | Supported |
| MSI Interrupts | Supported |
| MSI-X Interrupts | Supported |
| Peer-to-peer | Supported. See PCIe peer-to-peer |
| SR-IOV | Supported. See Lending Virtual Function of an SR-IOV device |
| Configuration Space | Supported |
| Legacy Pin-based interrupts (INTx) | Not supported [5] |
| IO Space Device Registers (BARs) | Not supported [6] |
NVIDIA GPU¶
NVIDIA GPUs are fully functional, including support for CUDA, Unified Memory, peer-to-peer and graphical applications. The following table lists the verified NVIDIA GPU architectures:
| NVIDIA GPU Architecture | Tested on | Notes |
|---|---|---|
| Blackwell | GeForce RTX 5070, RTX 4050 PRO | |
| Ada Lovelace | NVIDIA RTX 2000 Ada, GeForce RTX 4070 | |
| Hopper | Untested | |
| Ampere | RTX A4500 | |
| Volta | Tesla V100 | |
| Turing | NVIDIA T600 | |
| Pascal | NVIDIA P400 | |
Addressing Limitations on Older NVIDIA GPUs¶
NVIDIA GPUs before the Ada generation have DMA addressing limitations that can cause issues during lending. These GPUs require one of the following workarounds:

- Turn on the IOMMU on the Lender. The IOMMU is used to remap the high addresses to lower virtual addresses the GPU can address. See IOMMU / VT-d.
- Set the BIOS setting MMIOH to 1TB or lower on the Lender. This forces the NTB’s BAR address to be lower, allowing the GPU to address it. This may limit the maximum system memory. Please refer to your system manual.
- Disable 4G decoding in the BIOS of the Lender. This greatly limits the BAR sizes supported by the system and is not recommended.
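To see where the NTB adapter’s BARs have been placed in the lender’s physical address map, and hence whether they may fall outside the range an older GPU can address, the BAR addresses can be read with lspci. A minimal sketch; the BDF 0000:c1:00.0 is a hypothetical example for the NTB adapter.

```sh
# Show the physical addresses of the NTB adapter's BARs
lspci -s 0000:c1:00.0 -vv | grep -i 'memory at'
# Addresses placed very high in the address map (for example above 1TB,
# 0x10000000000) may be outside the range older GPUs can reach;
# see the workarounds above.
```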
SR-IOV¶
SmartIO supports lending both SR-IOV Physical Functions and individual Virtual Functions. The Physical Function cannot be lent while SR-IOV is enabled, i.e. while virtual functions have been instantiated; a sketch of how virtual functions are typically instantiated and removed follows the table below. Some SR-IOV Virtual Functions may not work when borrowed because the VF driver expects to run in a Virtual Machine.
| Device | Lending Physical Function | Lending Virtual Function |
|---|---|---|
| Mellanox / NVIDIA ConnectX-5 | Supported | Supported |
| Samsung PM1725a | Supported | Supported |
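Virtual functions are created and removed through the standard Linux sysfs interface for SR-IOV; this is not SmartIO-specific, and the BDF 0000:81:00.0 below is a hypothetical example. Writing to sysfs requires root.

```sh
# How many virtual functions does the device support?
cat /sys/bus/pci/devices/0000:81:00.0/sriov_totalvfs

# Instantiate four virtual functions
echo 4 > /sys/bus/pci/devices/0000:81:00.0/sriov_numvfs

# Remove the virtual functions again; this is required before the
# physical function itself can be lent.
echo 0 > /sys/bus/pci/devices/0000:81:00.0/sriov_numvfs
```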