System Requirements¶
The system requirements and optimal configurations differ between the lender side and the borrower side. They also depend on the target device and on the required functionality. Nodes that act as both lenders and borrowers may need to make certain trade-offs. Each section below details the requirements for the different use cases.
Supported Dolphin Hardware and Software¶
All nodes that use SmartIO must run the same version of the Dolphin eXpressWare driver and have the SmartIO module installed. SmartIO is supported on the following adapter cards:
PXH810
PXH830
PXH840
MXH830
General System Requirements¶
The lender's CPU and memory are used only during initialization; they are not involved while the borrowing system is using the PCIe devices. All PCIe transactions and system interrupts are forwarded to the borrowing side by the PCIe hardware. Keep in mind that the system requirements of the target device apply to all borrowers.
Linux Kernel and Distribution¶
SmartIO is validated on RHEL 7 / CentOS 7 and Ubuntu 18.04 LTS. Other distributions are likely to work as long as the kernel is 3.10 or newer.
Some features depend on the kernel version of the borrowing side:
| Kernel version | Features |
|---|---|
| 3.10+ | Base functionality |
| 4.9+ | Peer-to-peer transfers |
| 4.10+ | VFIO / virtualization support (upcoming release) |
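To verify which of these features a node's kernel can provide, check the running kernel version on the borrowing side. A minimal check using standard Linux tooling, with the version threshold taken from the table above:

```sh
# Print the running kernel version, e.g. "4.15.0-112-generic"
uname -r

# Example: check whether the kernel is new enough for peer-to-peer transfers.
# sort -V compares version strings; if 4.9 sorts first, the kernel is >= 4.9.
kver=$(uname -r | cut -d- -f1)
if [ "$(printf '%s\n' 4.9 "$kver" | sort -V | head -n1)" = "4.9" ]; then
    echo "kernel $kver: new enough for peer-to-peer transfers"
else
    echo "kernel $kver: too old for peer-to-peer transfers"
fi
```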
Adapter Prefetchable Memory Size¶
SmartIO requires a non-trivial amount of prefetchable memory configured for the NTB adapter. While the default size of 256MiB may be enough for some cases, it's recommended to increase the prefetchable memory if possible. Using a large prefetchable memory has no inherent downsides (and multiple upsides), but some computers may fail to allocate the configured size. It's strongly recommended to configure all nodes with at least 32GiB of prefetchable memory. Please see Adapter Prefetchable Memory Size.
| Role | Recommended Size | Minimum Size |
|---|---|---|
| Lending NVMe drives | ≥ 32GiB | 256MiB |
| Borrowing NVMe drives | ≥ 32GiB | 256MiB |
| Borrowing 1 GPU | ≥ 32GiB | 512MiB |
| Lending 1 GPU | ≥ 32GiB | 1024MiB |
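To verify what was actually allocated on a given node, one approach is to inspect the resource windows of the bridge above the NTB adapter with lspci. This is a sketch; the address 0000:17:00.0 is a hypothetical placeholder for the bridge in front of your adapter:

```sh
# Find candidate bridges (Dolphin NTB adapters are built around PCIe switches)
lspci | grep -i "PCI bridge"

# Show the prefetchable window configured behind the bridge above the adapter.
# Replace 0000:17:00.0 (hypothetical) with the actual bridge address.
sudo lspci -vv -s 0000:17:00.0 | grep -i "prefetchable memory behind bridge"
```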
Lender¶
The lender must be able to map the entirety of the DMA window size specified by the user, or the automatically selected default. The DMA window limits the amount of RAM the target driver can expose to the target device at any given moment.
Warning
An insufficiently sized DMA window may cause errors or performance issues. In the event that the DMA window fills up, SmartIO will log a warning to dmesg:
No room in IOMMU range for map size
The DMA window size is limited by the lender's prefetchable memory. These mapping resources are consumed until the device is returned. If the borrower does not have an IOMMU enabled, the lender must be able to map all of the borrower's RAM. See System IOMMU / VT-d.
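To check whether a lender has already run into this limit, search the kernel log for the warning quoted above:

```sh
# Look for SmartIO's DMA-window exhaustion warning on the lender
dmesg | grep "No room in IOMMU range"
```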
Borrower¶
The borrower must be able to map all BARs of the target device. This mapping is subject to alignment restrictions and may require more than the sum of the sizes of all the target device's BARs. It is limited by the borrower's prefetchable memory. These mapping resources are consumed until the device is returned. When multiple devices are borrowed at the same time, all of their BARs must be mappable at the same time.
Warning
Failure to map all BARs causes the borrow command to fail.
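To estimate how much mapping space a target device needs, list its BARs and their sizes before lending it out. The device address 0000:65:00.0 is a hypothetical placeholder, and remember that alignment restrictions may push the real requirement above the plain sum of the sizes:

```sh
# List the memory BARs of the target device with their sizes, e.g.
# "Region 1: Memory at f8000000 (64-bit, prefetchable) [size=256M]".
# Replace 0000:65:00.0 (hypothetical) with the target device's address.
sudo lspci -vv -s 0000:65:00.0 | grep "Region"
```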
System IOMMU / VT-d¶
For device lending it's recommended that the IOMMU is enabled on all nodes. However, since the IOMMU forces all P2P transactions to go through the CPU, disabling the IOMMU on the lending side may improve performance. It's recommended to keep the IOMMU on initially and only disable it after device lending with the IOMMU enabled is confirmed to be working. See Enabling and Disabling IOMMU / VT-d.
| Role | Recommended IOMMU setting |
|---|---|
| Lender | Turn on. Consider turning off after reading PCIe topology considerations |
| Borrower | Turn on. |
| Both | Turn on. |
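One common way to check whether the IOMMU is actually active on a node is to look for populated IOMMU groups in sysfs and for the relevant kernel parameters:

```sh
# A populated directory indicates an active IOMMU
ls /sys/kernel/iommu_groups/

# Check whether the IOMMU was requested on the kernel command line
grep -oE "(intel|amd)_iommu=[^ ]*" /proc/cmdline
```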
Hint
The IOMMU isolates the devices from the rest of the system and from the other nodes in the fabric. This can protect against bugs in the target device driver. It can also be very useful when a device is used with the SISCI SmartIO API. On the lending side, the target device will only be granted access to designated DMA regions on the borrowing side. On the borrowing side, these DMA regions are mapped as requested by the target device driver while the rest of the region is protected. If the target device misbehaves and tries to DMA to a protected region, the system kernel will typically log this.
Warning
The AMD IOMMU is currently not supported for device lending on the borrowing side and must be disabled.
Borrower¶
It's recommended to enable the IOMMU on the borrowers. It is possible to borrow devices with the IOMMU turned off, but that requires the lender to map all of the borrower's RAM. See Lender.
Lender¶
It's recommended to enable the IOMMU on the lender. Enabling it can, however, have adverse effects on IO performance in some topologies, as all PCIe traffic must then be routed through the CPU. As a performance optimization, the user may consider turning the IOMMU off on the lender. See PCIe topology considerations.
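As a sketch of how this is typically toggled on an Intel-based RHEL 7 / CentOS 7 or Ubuntu 18.04 system (the authoritative steps are in Enabling and Disabling IOMMU / VT-d):

```sh
# In /etc/default/grub, add intel_iommu=on (or intel_iommu=off for the
# performance experiment described above) to GRUB_CMDLINE_LINUX, e.g.:
#   GRUB_CMDLINE_LINUX="... intel_iommu=on"

# Then regenerate the GRUB configuration and reboot:
sudo grub2-mkconfig -o /boot/grub2/grub.cfg   # RHEL 7 / CentOS 7
sudo update-grub                              # Ubuntu
sudo reboot
```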
PCIe peer-to-peer support¶
| Role | P2P support required |
|---|---|
| Lender | Required |
| Borrower | Not required |
The lending side must support PCIe peer-to-peer transactions between the slot where the Dolphin PCIe adapter is installed and the slot where the target device is installed. On most systems this is a feature controlled by the motherboard or BIOS vendor.
Warning
It is strongly recommended to ask your system vendor to confirm that this is supported before ordering a new system. There is currently no known way to determine whether a computer supports PCIe peer-to-peer transactions, except by testing. If this test fails, you have to find another computer.
In some cases, a system may have only partial support for P2P. This can for instance be the case in systems with an internal PCIe switch or when using an expansion box. In these cases, note that enabling the IOMMU forces P2P transactions to go through the CPU, so enabling the IOMMU requires the CPU itself to support P2P. This may cause P2P to work only while the IOMMU is off.
See PCIe topology considerations for a more in-depth explanation of how the PCIe topology affects P2P.
Target Device Requirements¶
In general, almost all PCIe devices are supported by SmartIO and device lending. More specifically, SmartIO supports the following features:
Configuration Space
Memory Space BARs (prefetchable and non-prefetchable)
MSI interrupts
MSI-X interrupts
SR-IOV
These legacy PCI features are not supported:
IO port BARs
INTx interrupts (may be implemented later)
Legacy VGA
The supported features cover almost all modern devices. This includes:
NVMe drives
NVIDIA GPUs (GeForce, Quadro, Tesla)
Intel Ethernet adapters
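To check a candidate device against these lists, inspect its regions and capabilities with lspci. The device address 0000:65:00.0 is a hypothetical placeholder:

```sh
# Memory BARs are supported; any "I/O ports at ..." regions are not
sudo lspci -vv -s 0000:65:00.0 | grep -E "Memory at|I/O ports"

# Look for the supported interrupt mechanisms (MSI and/or MSI-X)
sudo lspci -vv -s 0000:65:00.0 | grep -E "MSI:|MSI-X:"
```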
NVIDIA GPUs¶
NVIDIA GPUs have limited DMA addressing capabilities (37 or 40 bits, while Pascal has 47 bits). This causes issues when the IOMMU is disabled and above-4G decoding is enabled, as the GPU may be unable to address the DMA windows mapped through the NTB. This issue can be avoided either by disabling above-4G decoding or by enabling the IOMMU on the lender side.
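A quick, non-authoritative way to see whether PCI memory windows have been placed above the 4GiB boundary (where a GPU with a 37- or 40-bit DMA mask may not be able to reach them) is to scan /proc/iomem:

```sh
# List PCI bus windows; start addresses above 00000000ffffffff
# lie above the 4GiB boundary
sudo grep -i "pci bus" /proc/iomem
```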
PCIe topology considerations¶
The PCIe topology in a system is an important consideration when doing P2P, such as with SmartIO. Unless the IOMMU is enabled, traffic between two devices in a PCIe fabric takes the shortest path. Topology must therefore be considered both when a system is built and when deciding whether to enable the IOMMU. Latency for a given path is mostly determined by the number of PCIe switches, or 'hops', in the path. Note that the NTB adapter itself contains a PCIe switch.
For the lending side, the most important consideration is the path between the target device and the NTB adapter.
For the borrowing side, the path to consider is the one between the NTB and the CPU/RAM.
When the target device is in an expansion box, it might be prudent to also have the NTB adapter in the expansion box. This would allow the traffic to go directly from the target device to the NTB adapter, only crossing the internal PCIe switch in the expansion box itself. If the adapter is in the host computer, the traffic would also need to go across the transparent uplink to the expansion box which might include additional PCIe switch chips. Note however that putting the NTB adapter in the expansion box with the IOMMU enabled would be counter-productive as the IOMMU would redirect all traffic through the CPU.
Figure: target devices attached both directly to the CPU and behind PCIe switches. Borrowing the GPU in System A (to B) will be more efficient than borrowing the GPU in System B (to A).¶
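To inspect the actual paths on a running system, render the PCIe hierarchy as a tree; each nesting level between the target device and the NTB adapter (or between the NTB and the root port) roughly corresponds to one switch hop:

```sh
# Show the PCIe topology as a tree of buses, bridges and devices
lspci -tv
```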