Device lending

Device lending builds on top of SmartIO and allows devices to be borrowed and inserted into the local kernel device tree. This will load the (transparent) device driver for the device and signal a hot-add event.

Device lending can, for example, give a computer additional GPU power by borrowing GPUs from other nodes in the cluster.

The Dolphin Device Lending software enables this process to be controlled using a set of command line tools and options. These tools can be used directly or integrated into a higher-level resource management system. The Device Lending software is flexible and does not require any particular boot order or power-on sequencing. PCIe devices borrowed from a remote system can be used as if they were local devices until they are given back. The Device Lending software does not require any changes to transparent devices or to the Linux kernel.

Devices can be made available to systems on the network and can be temporarily borrowed by any system for as long as required. When a system has finished using a device, the device can be borrowed by other systems on the network or returned to local use on the system where it is physically located.

Borrowing and returning remote devices

This section describes how to use device lending to borrow remote devices from other nodes. Borrowed devices appear to the borrowing system as local, hot-plugged devices. The device must be unused for the borrow to succeed (‘available’ in the list). A device that is borrowed for device lending cannot be borrowed by any other node or SISCI application at the same time.

Remote devices are borrowed using smartio_tool borrow. This command needs the ID of the remote device, which can be found using smartio_tool list (see previous section). The borrow command also takes an optional second parameter that specifies the DMA window size. The DMA window size controls the amount of memory a device driver can expose to a borrowed device at any given time. The DMA window consumes the lending side’s mapping resources, which are limited by the lending node’s prefetch space (BAR2). If the window size is not specified, the default value is used. The default may be too small to work properly with some devices, such as GPUs. It’s recommended that the lender’s prefetch size is increased to multiple GB, for example 32 GB, so that the window size can be set to a sufficiently large value. If the window runs out of space at any point, the warning message ‘No room in IOMMU range’ is printed to the kernel logs. The following example borrows device 80000 with a window size argument of 512:

$ smartio_tool borrow 80000 512
Name: Non-Volatile memory controller Intel Corporation Device f1a5
Local users: 1
Local virtual device: 0000:04:05.0
Bound to driver: nvme
NVMe namespace: nvme0
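
If a borrowed device misbehaves, one thing to check is whether the DMA window has been exhausted. The following is a minimal sketch, not part of the Device Lending tools: the helper name dma_window_exhausted is hypothetical, and it only searches log text supplied on standard input for the warning string quoted above.

```shell
# Sketch: succeed (exit status 0) if the log text on stdin contains the
# DMA window warning. The function name is hypothetical.
dma_window_exhausted() {
    # -F matches the string literally, -q suppresses output.
    grep -qF 'No room in IOMMU range'
}

# Example use (reading the kernel log may require root privileges):
#   dmesg | dma_window_exhausted && echo "DMA window too small"
```

If the warning is present, returning the device and borrowing it again with a larger window size may help, provided the lender’s prefetch space is large enough.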

The command returns once the node has been granted temporary ownership of the device, but depending on the driver, there may be some additional time before the device is ready for use. If the command is successful, it prints some information about the newly borrowed device, for instance its corresponding local virtual device and the driver that has taken ownership of the device. Once the driver is ready, the device can be used like a local device:

$ ls /dev/nvme0*
/dev/nvme0  /dev/nvme0n1  /dev/nvme0n1p1
$ mount /dev/nvme0n1p1 /mnt
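
Because the driver may need some time after the borrow command returns, a script can wait for the device node before using it. The following is a minimal sketch under stated assumptions: wait_for_device is a hypothetical helper, the one-second polling interval and the 30-second default timeout are arbitrary, and the device path depends on the driver.

```shell
# Sketch: wait until a path (for example a device node) exists.
# wait_for_device is a hypothetical helper, not part of smartio_tool.
wait_for_device() {
    path=$1
    timeout=${2:-30}   # seconds; the default is an arbitrary choice
    elapsed=0
    while [ ! -e "$path" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "timed out waiting for $path" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example use, with the device names from the transcript above:
#   wait_for_device /dev/nvme0n1p1 60 && mount /dev/nvme0n1p1 /mnt
```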

Before returning a device, it’s recommended that any local use of the device is stopped in a clean manner. For disk drives, unmount any mounted partitions on the drive to be returned. This mirrors the preparation that must be made before a device is set as available:

$ umount /mnt
$ smartio_tool return 80000
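
When integrating these commands into a script, it can be useful to make sure a borrowed device is always returned, even if the work done with it fails. The following is a minimal sketch: with_borrowed_device is a hypothetical wrapper, and it uses only the borrow and return subcommands shown in this section.

```shell
# Sketch: borrow a device, run a command, then return the device.
# with_borrowed_device is a hypothetical wrapper around smartio_tool.
with_borrowed_device() {
    device_id=$1
    shift
    smartio_tool borrow "$device_id" || return 1
    # Run the given command and remember its exit status.
    status=0
    "$@" || status=$?
    # Return the device even if the command failed.
    smartio_tool return "$device_id" || return 1
    return "$status"
}

# Example use, assuming the device ID from the transcript above:
#   with_borrowed_device 80000 ls /dev/nvme0
```

The command passed to the wrapper is responsible for stopping its own use of the device cleanly, for example by unmounting any partitions it mounted, before it exits.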