Multi-host sharing of an Intel Arc Pro GPU with SR-IOV

SR-IOV splits a physical device into multiple virtual functions (VFs). Device Lending allows these virtual functions to be lent out and borrowed individually, so that, for example, a single GPU can be shared among all nodes in a cluster. This page shows how to share an Intel Arc Pro GPU using SR-IOV.


System Requirements

Ensure your Intel Arc GPU is running SR-IOV-enabled firmware by looking for the SR-IOV capability in the lspci output. If the capability is not listed, the GPU likely needs a firmware upgrade.

# lspci -s 4a: -v
4a:00.0 VGA compatible controller: Intel Corporation Device e212 (prog-if 00 [VGA controller])
        Subsystem: Intel Corporation Device 1114
        Flags: bus master, fast devsel, latency 0, IRQ 264, IOMMU group 55
        Memory at 20a0c000000 (64-bit, prefetchable) [size=16M]
        Memory at 20000000000 (64-bit, prefetchable) [size=16G]
        Expansion ROM at 84a00000 [disabled] [size=2M]
        Capabilities: [40] Vendor Specific Information: Len=0c <?>
        Capabilities: [70] Express Endpoint, MSI 00
        Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
        Capabilities: [d0] Power Management version 3
        Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [110] Null
        Capabilities: [200] Address Translation Service (ATS)
        Capabilities: [420] Physical Resizable BAR
        Capabilities: [220] Virtual Resizable BAR
        Capabilities: [320] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [400] Latency Tolerance Reporting
        Kernel driver in use: xe
        Kernel modules: xe
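
The check above can also be scripted. The following is a minimal sketch, assuming the capability line reads exactly as in the listing above; has_sriov_capability is a hypothetical helper name and not part of eXpressWare. It reads lspci -v output on stdin. lspci normally needs to run as root to list capabilities.

```shell
# Hypothetical helper: succeeds if the `lspci -v` output on stdin
# lists the SR-IOV capability, fails otherwise.
has_sriov_capability() {
    grep -q 'Single Root I/O Virtualization (SR-IOV)'
}

# Usage (4a:00.0 is the example address used on this page):
#   lspci -s 4a:00.0 -v | has_sriov_capability && echo "SR-IOV capable"
```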

eXpressWare Installation

When installing eXpressWare, make sure to request installation of SmartIO, either interactively or by passing the --enable-smartio argument. Please refer to the installation guide for more details.

Creating the Virtual Functions

The virtual functions are created at runtime by writing the desired number of VFs to the sriov_numvfs file of the physical function (PF) in sysfs. Once created, the VFs show up in lspci as additional functions on the same bus as the PF.

# lspci -s 4a:
4a:00.0 VGA compatible controller: Intel Corporation Device e212

# echo 5 > /sys/bus/pci/devices/0000\:4a\:00.0/sriov_numvfs

# lspci -s 4a:
4a:00.0 VGA compatible controller: Intel Corporation Device e212
4a:00.1 VGA compatible controller: Intel Corporation Device e212
4a:00.2 VGA compatible controller: Intel Corporation Device e212
4a:00.3 VGA compatible controller: Intel Corporation Device e212
4a:00.4 VGA compatible controller: Intel Corporation Device e212
4a:00.5 VGA compatible controller: Intel Corporation Device e212

Hint

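Prefixing the echo command with sudo is not enough to write to this file, because the output redirection is performed by your unprivileged shell rather than by sudo. Either use sudo -i to get a full root shell, or run: echo 5 | sudo tee /sys/bus/pci/devices/0000\:4a\:00.0/sriov_numvfs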
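
When scripting this step, it can help to compare the requested count against the sriov_totalvfs file that the kernel exposes next to sriov_numvfs. The following is a minimal sketch, not an eXpressWare tool: create_vfs is a hypothetical helper, and its optional third argument (an alternative sysfs root) exists only so the function can be tried without real hardware. It must run as root on a real system.

```shell
# Hypothetical helper: create_vfs <pci-address> <count> [sysfs-root]
# Refuses counts larger than the device reports in sriov_totalvfs.
create_vfs() {
    dev_dir="${3:-/sys/bus/pci/devices}/$1"
    wanted="$2"
    total="$(cat "$dev_dir/sriov_totalvfs")" || return 1
    if [ "$wanted" -gt "$total" ]; then
        echo "requested $wanted VFs, but the device supports at most $total" >&2
        return 1
    fi
    # Same write as the echo command shown above.
    echo "$wanted" > "$dev_dir/sriov_numvfs"
}

# Usage (as root): create_vfs 0000:4a:00.0 5
```

Note that the kernel does not allow changing one non-zero VF count directly to another; write 0 first to remove the existing VFs.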

Lending the Virtual Functions to the Pool

The virtual functions can be made available to other nodes in the cluster like any other PCIe device. Note that you cannot lend the physical function (PF) while SR-IOV is enabled. The devices that are going to be shared must be added and made available with smartio_tool add and smartio_tool available. The lender must also be connected to all the borrowers with smartio_tool connect. See Lending Local Devices for more details.

# smartio_tool add 4a:00.1
# smartio_tool add 4a:00.2
# smartio_tool add 4a:00.3
# smartio_tool add 4a:00.4
# smartio_tool add 4a:00.5

# smartio_tool available --unbind 4a:00.1
# smartio_tool available --unbind 4a:00.2
# smartio_tool available --unbind 4a:00.3
# smartio_tool available --unbind 4a:00.4
# smartio_tool available --unbind 4a:00.5
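
The repeated commands above can be wrapped in a loop. The following is a minimal sketch that runs only the two smartio_tool invocations shown above; lend_vfs is a hypothetical helper name. It assumes the VFs are numbered from function 1 upwards on the same bus and device as the PF, as in the lspci listing on this page; check lspci on your own system, since the numbering can differ for larger VF counts.

```shell
# Hypothetical helper: lend_vfs <bus:device> <count>
# Adds VFs <bus:device>.1 .. <bus:device>.<count> and makes them available.
lend_vfs() {
    bus_dev="$1"    # bus:device of the PF, for example 4a:00
    count="$2"      # number of VFs created via sriov_numvfs
    for fn in $(seq 1 "$count"); do
        smartio_tool add "$bus_dev.$fn" || return 1
    done
    for fn in $(seq 1 "$count"); do
        smartio_tool available --unbind "$bus_dev.$fn" || return 1
    done
}

# Usage (as root): lend_vfs 4a:00 5
```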

Borrowing Devices from the Pool

Devices in the pool can be borrowed by nodes to be used like a local device. You can list the available devices with smartio_tool list and then borrow a device with smartio_tool borrow:

# smartio_tool list
44a01: VGA compatible controller Intel Corporation Device e212 [available]
44a02: VGA compatible controller Intel Corporation Device e212 [available]
44a03: VGA compatible controller Intel Corporation Device e212 [available]
44a04: VGA compatible controller Intel Corporation Device e212 [available]
44a05: VGA compatible controller Intel Corporation Device e212 [available]

# smartio_tool borrow 44a01
Name: VGA compatible controller Intel Corporation Device e212
Available: in use
Location: remote
Adapter: 0
NodeId: 4
Remote BDF: 0000:4a:00.1
Physical slot: N/A
Serial Number: 00-00-00-00-00-00-00-00
UUID: 0710779e-9564-46c0-9954-3677413dee79
Vendor ID: 8086
Device ID: e212
Subsystem Vendor ID: 8086
Subsystem Device ID: 1114
Local users: 1
Local virtual device: 0000:05:02.3
Bound to driver: xe

See Using Native Device Drivers for more details.
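
On a borrower node, picking a device from the list can be scripted. The following is a minimal sketch, assuming the smartio_tool list output has the "ID: description [status]" layout shown above; first_available is a hypothetical helper name, and the output format may differ between eXpressWare versions.

```shell
# Hypothetical helper: prints the ID of the first device marked
# [available] in `smartio_tool list` output read from stdin.
first_available() {
    awk -F': ' '/\[available\]/ { print $1; exit }'
}

# Usage (as root):
#   id="$(smartio_tool list | first_available)"
#   [ -n "$id" ] && smartio_tool borrow "$id"
```

The helper prints nothing if no device is available, which is why the usage example checks for an empty ID before borrowing.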