System Requirements¶
This page describes the system requirements of SmartIO. The exact requirements depend on the SmartIO functionality used on a specific node. We recommend that you also refer to the “System Requirements” sections in the Quick Start Guides, Adding Devices to the Pool, or Borrowing Pooled Devices.
Platform Requirements¶
SmartIO is supported on most x86_64 systems as well as selected ARM64 platforms [1].
Warning
Lending out local devices is not yet supported on Intel Xeon Ice Lake and newer. Please contact support for advice.
| Platform | Lending | Borrowing | SISCI |
|---|---|---|---|
| AMD Ryzen | Supported | Supported | Supported |
| AMD EPYC | Limited, see No irq handler for vector in console | Supported | Supported |
| AMD Threadripper | Supported | Supported | Supported |
| Intel Core series | Peer-to-peer support limited, see PCIe peer-to-peer (P2P) support | Supported | Supported |
| Intel Xeon Cooper Lake and older | Supported | Supported | Supported |
| Intel Xeon Ice Lake | To be supported in a future release | Supported | Supported |
| Intel Xeon Rocket Lake, Sapphire Rapids, Emerald Rapids and Granite Rapids | To be supported in a future release | Supported | Supported |
| NVIDIA Xavier | Peer-to-peer support limited, see PCIe peer-to-peer (P2P) support | Supported (in 5.24) | Supported |
Hint
Server platforms like Intel Xeon and AMD EPYC have more available PCIe lanes than their desktop counterparts, which is important for achieving the full performance of PCIe devices.
IOMMU / VT-d¶
It’s generally recommended to enable the IOMMU / VT-d on all nodes when using SmartIO. Enabling the IOMMU eases the requirements for Host NTB Adapter Prefetchable Size by allowing a smaller DMA window. The IOMMU isolates the devices from the rest of the system and from the other nodes in the fabric. This can protect against bugs in the target device driver, and it can also be very useful when a device is used with the SISCI API.
On the lending side, the target device is only granted access to designated DMA regions on the borrowing side. On the borrowing side, these DMA regions are mapped as requested by the target device driver, while the rest of the region is protected. If the target device misbehaves and tries to DMA to a protected region, the system kernel will typically log this.
When the IOMMU is enabled, all traffic is forced to go through the CPU. This can re-route peer-to-peer traffic between the NTB adapter and a lent device. When the NTB adapter and the lent device are directly connected to CPU-provided PCIe lanes, this causes only a minor performance penalty. Systems with on-board PCIe switches or a PCIe expansion chassis can see a substantial performance penalty. In these cases, disabling the IOMMU on the lender may be advisable.
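Whether an IOMMU is currently active on a node can be checked from Linux. The sketch below uses standard kernel interfaces only; exact device names, log messages, and boot-loader configuration vary between platforms and distributions.

```sh
# List active IOMMUs (the directory is typically empty when no IOMMU is in use)
ls /sys/class/iommu/

# Kernel messages also show IOMMU / VT-d / AMD-Vi initialization
dmesg | grep -iE 'iommu|dmar|amd-vi'

# The IOMMU is typically enabled with the kernel boot parameters
# intel_iommu=on (Intel VT-d) or amd_iommu=on (AMD-Vi), added to the
# boot loader configuration (for example GRUB), followed by a reboot.
```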
ACS and IOMMU groups¶
ACS (Access Control Services) is a PCIe feature used to enforce isolation when the IOMMU is enabled. The Linux kernel groups devices that cannot be isolated from each other into “IOMMU groups”. To lend a device in an IOMMU group, all of the other devices in the same group must also be added and made available. This prevents those other devices from being used locally while one or more of the devices in the group is borrowed. This is normally not a problem on most server and workstation systems, but systems with on-board PCIe switches or expansion chassis, or devices attached to the chipset, are more likely to have IOMMU group issues.
Hint
On desktop platforms like Intel Core and AMD Ryzen it’s common for some of the PCIe slots to have lanes attached to the chipset. These slots are more likely to have ACS related issues.
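The IOMMU groups assigned by the kernel can be inspected through sysfs. A minimal sketch; the BDF 0000:41:00.0 is a hypothetical example, and the iommu_group link only exists while the IOMMU is enabled.

```sh
# List all IOMMU groups and the devices in each group
find /sys/kernel/iommu_groups/ -type l | sort

# Show which devices share a group with a specific device
# (hypothetical BDF 0000:41:00.0 used as an example)
ls /sys/bus/pci/devices/0000:41:00.0/iommu_group/devices/
```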
NUMA and Multi-Socket Systems¶
Performance can be significantly impacted when peer-to-peer traffic between the NTB adapter and the lent device needs to cross between CPU sockets. The same effect can be seen on some single-socket systems such as AMD EPYC [2]. To achieve optimal performance, it’s recommended that the NTB adapter and the devices to be lent are on the same NUMA node or I/O Die quadrant. Please refer to your system or motherboard manual for more details.
Hint
On NUMA systems, the NUMA node of a PCIe device can be discovered with lspci.
On AMD EPYC systems the NPS=4 BIOS setting exposes each CPU as 4 NUMA nodes. This can make it easier to discover which PCIe slots are associated with the same I/O Die / NUMA node. This setting can affect performance, including memory bandwidth, so you may want to set NPS back to the default value after determining the optimal PCIe slots. Refer to your system manual for more details.
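A minimal sketch of discovering the NUMA node of a PCIe device; the BDF 0000:41:00.0 is a hypothetical example.

```sh
# Show the NUMA node of a device with lspci
lspci -s 0000:41:00.0 -vv | grep -i 'numa node'

# The same information is exposed in sysfs (-1 means no NUMA affinity is reported)
cat /sys/bus/pci/devices/0000:41:00.0/numa_node
```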
PCIe peer-to-peer (P2P) support¶
The lending side must support PCIe peer-to-peer transactions between the slot where the Dolphin PCIe adapter is installed and the slot where the target device is installed. Peer-to-peer support is not needed to borrow remote devices. In our experience, most AMD and Intel Xeon systems support peer-to-peer. We recommend that you ask your system vendor if your system supports peer-to-peer. If your system has PCIe switches, or you have a topology with transparent switches, peer-to-peer traffic can take the shortest path between two devices, avoiding the CPU/root complex. This shortest-path routing only happens if the IOMMU is disabled.
On platforms where peer-to-peer is not supported, but there is a direct path between the NTB and the target device via one or more PCIe switches, lending out that device can still work. This requires that the IOMMU is disabled.
Hint
In some cases, a system may have only partial support for p2p, for instance systems with an internal PCIe switch or when using an expansion box. Note that enabling the IOMMU forces p2p transactions to go through the CPU, so with the IOMMU enabled the CPU itself must support p2p. As a result, p2p may work only when the IOMMU is off.
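To get an overview of how devices are connected on the lending side, the PCIe topology can be displayed as a tree. A minimal sketch; interpreting the output requires knowing which entries correspond to the NTB adapter and the target device.

```sh
# Display the PCIe topology as a tree to see whether the NTB adapter and the
# target device sit behind a common PCIe switch or only meet at the root complex
lspci -tv
```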
NTB Cluster Requirements¶
Supported Host Adapters¶
| Dolphin PXH Host Adapter | Status | Notes |
|---|---|---|
| PXH840 | Supported | |
| PXH830 | Supported | |
| PXH810 | Partially supported | Fabric Attached Devices are not supported. |
| PXH820 / PXH824 | Partially supported | Fabric Attached Devices are not supported. |
Host NTB Adapter Prefetchable Size¶
It’s recommended to configure a large NTB prefetchable memory size on the NTB host
adapter when using SmartIO. The exact size required depends on the device and
system configuration. While the default size of 256MiB may be enough in some
cases, it’s recommended to increase the prefetchable memory if possible and to
configure all nodes with at least 32GiB of prefetchable memory.
Warning
Make sure 4G decoding is enabled in the BIOS / Firmware on your system before trying to increase the prefetchable size. Increasing the prefetchable size with 4G decoding disabled can cause the system to fail to boot.
When borrowing a device, the borrower’s NTB adapter must map all the BARs of the
device. Taking into account alignment requirements and additional space
required for general communication, the NTB prefetch size must be larger than
the combined size of all the BARs of all the devices that will be borrowed
simultaneously. For instance, borrowing a device with one 8GiB BAR2 requires at
least 8GiB mapping space, but due to the additional mapping space used for
communication as well as alignment requirements, a prefetchable memory size of
8GiB is insufficient. Since the NTB prefetchable size can only be set in
powers of two, the next step up is 16GiB which is enough for one such device,
but not enough for 2 devices with 8GiB BARs.
NTB prefetchable memory space is also used on the lender side when a local device is borrowed by another node. The exact size depends on the DMA window size used by the borrower.
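To estimate how much prefetchable memory a device will need, its BAR sizes can be inspected on the lending side. A minimal sketch; the BDF 0000:41:00.0 and the output line are hypothetical examples, and the exact output format depends on the lspci version.

```sh
# List the memory BARs (regions) of a device and their sizes
lspci -s 0000:41:00.0 -vv | grep -i 'region'
# Example output line:
#   Region 2: Memory at 38000000000 (64-bit, prefetchable) [size=8G]
```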
Setting the Adapter Prefetchable Size¶
The prefetchable size is set with physical DIP switches on the MXH adapter. Please refer to the User Guide of your NTB Adapter for specific instructions. For MXS switches with D-Switch topology the prefetch size is set in the switch web interface. Please refer to the MXS User Guide for instructions.
The prefetchable size can be set using the dis_config tool. Refer to
Configuring the adapter card for instructions.
Operating System Requirements¶
| Operating system | Lending | Borrowing | SISCI | Notes |
|---|---|---|---|---|
| RHEL 10 (AlmaLinux 10, Rocky Linux 10, CentOS 10 Stream) | Supported | Supported | Supported | |
| RHEL 9 (AlmaLinux 9, Rocky Linux 9, CentOS 9 Stream) | Supported | Supported | Supported | MXH500-series supported since kernel-5.14.0-406.el9 [3] |
| RHEL 8 (AlmaLinux 8, Rocky Linux 8, CentOS 8 Stream) | Supported | Supported | Supported | MXH500-series not supported [3] |
| Ubuntu 24.04 LTS | Supported | Supported | Supported | |
| Ubuntu 22.04 LTS with Hardware Enablement (HWE) stack | Supported | Supported | Supported | |
| Ubuntu 22.04 LTS | Supported | Supported | Supported | MXH500-series not supported [3] |
| Windows | Preview [4] | Not supported | Supported | |
Linux Kernel¶
eXpressWare and SmartIO are designed to work with a wide range of kernel versions, but eXpressWare is only tested and qualified with the kernels shipped by the distributions we support. Please refer to the table above. Contact support for additional information or assistance.
Supported Devices¶
SmartIO is designed to work with all PCIe-compliant devices and device drivers by implementing support for all the required PCIe features. Some legacy PCI features are not supported, but this does not impact most (if any) PCIe devices. Device types verified to work include:

- NVMe drives from multiple vendors
- NVIDIA GPUs from multiple generations
- Intel Ethernet adapters
- Mellanox/NVIDIA ConnectX network adapters
- Various FPGAs
In general, almost all PCIe devices are supported with SmartIO and device lending. More specifically, SmartIO supports the following features:
| Feature | Support Status |
|---|---|
| Memory Space Device Registers (BARs), prefetchable and non-prefetchable | Supported |
| DMA to/from Device to RAM (“Zero-Copy”) | Supported |
| MSI Interrupts | Supported |
| MSI-X Interrupts | Supported |
| Peer-to-peer | Supported. See PCIe peer-to-peer |
| SR-IOV | Supported. See Lending Virtual Function of an SR-IOV device |
| Configuration Space | Supported |
| Legacy Pin-based interrupts (INTx) | Not supported [5] |
| IO Space Device Registers (BARs) | Not supported [6] |
NVIDIA GPU¶
NVIDIA GPUs are fully functional, including support for CUDA, Unified Memory, peer-to-peer and graphical applications. The following table lists the verified NVIDIA GPU architectures:
| NVIDIA GPU Architecture | Tested on | Notes |
|---|---|---|
| Blackwell | GeForce RTX 5070, RTX 4050 PRO | |
| Ada Lovelace | NVIDIA RTX 2000 Ada, GeForce RTX 4070 | |
| Hopper | Untested | |
| Ampere | RTX A4500 | |
| Volta | Tesla V100 | |
| Turing | NVIDIA T600 | |
| Pascal | NVIDIA P400 | |
Addressing Limitations on Older NVIDIA GPUs¶
NVIDIA GPUs before the Ada generation have DMA addressing limitations that can cause issues during lending. These GPUs require one of the following workarounds:

- Turn on the IOMMU on the Lender. The IOMMU is used to remap the high addresses to lower virtual addresses the GPU can address. See IOMMU / VT-d.
- Set the BIOS setting MMIOH to 1TB or lower on the Lender. This forces the NTB’s BAR address to be lower, allowing the GPU to address it. This may limit the maximum system memory. Please refer to your system manual.
- Disable 4G decoding in the BIOS of the Lender. This greatly limits the BAR sizes supported by the system and is not recommended.
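To see where the NTB adapter’s BARs have been placed in the lender’s physical address map, and hence whether they may fall outside the range an older GPU can address, the BAR addresses can be read with lspci. A minimal sketch; the BDF 0000:c1:00.0 is a hypothetical example for the NTB adapter.

```sh
# Show the physical addresses of the NTB adapter's BARs
lspci -s 0000:c1:00.0 -vv | grep -i 'memory at'
# Addresses placed very high in the address map (for example above 1TB,
# 0x10000000000) may be outside the range older GPUs can reach;
# see the workarounds above.
```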
SR-IOV¶
SmartIO supports lending both SR-IOV Physical Functions and individual Virtual Functions. The Physical Function cannot be lent while SR-IOV is enabled, i.e. while virtual functions have been instantiated; a sketch of how virtual functions are typically instantiated and removed follows the table below. Some SR-IOV Virtual Functions may not work when borrowed because the VF driver expects to run in a Virtual Machine.
| Device | Lending Physical Function | Lending Virtual Function |
|---|---|---|
| Mellanox / NVIDIA ConnectX-5 | Supported | Supported |
| Samsung PM1725a | Supported | Supported |
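Virtual functions are created and removed through the standard Linux sysfs interface for SR-IOV; this is not SmartIO-specific, and the BDF 0000:81:00.0 below is a hypothetical example. Writing to sysfs requires root.

```sh
# How many virtual functions does the device support?
cat /sys/bus/pci/devices/0000:81:00.0/sriov_totalvfs

# Instantiate four virtual functions
echo 4 > /sys/bus/pci/devices/0000:81:00.0/sriov_numvfs

# Remove the virtual functions again; this is required before the
# physical function itself can be lent.
echo 0 > /sys/bus/pci/devices/0000:81:00.0/sriov_numvfs
```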