RocksDB

This page compares RocksDB performance on a local PCIe-attached NVMe device with the same device accessed remotely through SmartIO Device Lending. Driven by the YCSB workload generator, RocksDB on the remote NVMe device achieves throughput within 3% of the local NVMe device on every workload tested (YCSB A through F).

Installation

Follow the instructions on https://github.com/facebook/rocksdb/ and https://github.com/brianfrankcooper/YCSB/.

We use RocksDB v10.10.1 and YCSB v0.17.0.

Benchmarking setup

We use our 4-node server-grade experimental setup. The lender (node B) hosts the NVMe device, and the borrower (node A) accesses it remotely.


[Figure: milanq-nvme-topology.svg]

| Group   | Category              | Details                                     |
|---------|-----------------------|---------------------------------------------|
| System  | Topology              | 4 nodes named A, B, C, D                    |
| System  | CPU                   | Dual-socket AMD EPYC 7763 64-Core           |
| System  | Motherboard           | Supermicro H12DSU-iN                        |
| System  | Model name            | Supermicro AS -2024US-TRT                   |
| System  | Memory                | 16x 128GiB DDR4 DIMMs at 3.2 GHz            |
| System  | Operating system      | Ubuntu 22.04-hwe                            |
| System  | Kernel                | Linux 6.8                                   |
| PCIe    | Host adapter cards    | MXH930 PCIe 4.0 NTB Host Adapter            |
| PCIe    | Switch                | MXS924 PCIe 4.0 Switch                      |
| PCIe    | PCIe cables           | PCIe 4.0 SFF-8644 Cables                    |
| PCIe    | Driver version        | 5.26 (estimated release later in 2026)      |
| GPUs    | NVIDIA GPUs           | 2x A100 40GB (node A and B)                 |
| GPUs    | AMD GPUs              | 2x AMD Instinct MI210 (node C and D)        |
| GPUs    | NVIDIA driver version | 590                                         |
| GPUs    | CUDA toolkit version  | 11.8                                        |
| GPUs    | NCCL version          | 2.28.9-1                                    |
| GPUs    | nccl-tests version    | 2.16.5                                      |
| Storage | Storage               | PM1733 Enterprise NVMe PCIe SSD (1.92 TB)   |
| Storage | fio version           | 3.41                                        |
| Storage | SPDK version          | 26.01                                       |
| RDMA    | RDMA NIC              | Mellanox ConnectX-6 200Gbps using InfiniBand |
| RDMA    | RDMA switch           | NVIDIA QM8700                               |
| RDMA    | OFED version          | 24.10                                       |

Loading RocksDB with a base DB image

First, we set up XFS on the NVMe and mount it.

# mkfs.xfs -f -m reflink=0 -l size=256m ${PARTITION}
# mount -t xfs -o noatime,nodiratime,logbufs=8,logbsize=256k ${PARTITION} ${MOUNTPOINT}

Then we load a database of 50 million records onto the NVMe, which corresponds to a database size of about 512 GB. We restore this base image before every workload execution, so each execution starts from an identical database.

# numactl --cpunodebind=1 --membind=1 ycsb/bin/ycsb.sh load rocksdb \
    -s \
    -P load \
    -p rocksdb.dir=/mnt/nvme1/db.bak \
    -p rocksdb.max_background_compactions=4 \
    -p rocksdb.max_background_flushes=2 \
    -threads 32

The YCSB workload file, named load, contains:

recordcount=50000000
operationcount=50000000
fieldcount=10
fieldlength=1000

insertproportion=1.0
readproportion=0
updateproportion=0
scanproportion=0

requestdistribution=uniform
threadcount=32

workload=site.ycsb.workloads.CoreWorkload
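
As a quick sanity check on the image size, the raw payload implied by these load parameters can be worked out directly. The sketch below only multiplies the values from the load file; the variable names are ours. It yields 500 GB of raw field data, and the gap to the on-disk size of about 512 GB is assumed (not measured here) to come from keys, field names, and RocksDB's storage format.

```python
# Values copied from the load workload file above.
record_count = 50_000_000   # recordcount
field_count = 10            # fieldcount
field_length = 1000         # fieldlength, in bytes per field

# Raw field payload only: excludes keys, field names, and RocksDB overhead.
bytes_per_record = field_count * field_length
raw_bytes = record_count * bytes_per_record

print(f"payload per record: {bytes_per_record} bytes")
print(f"raw payload: {raw_bytes / 1e9:.0f} GB ({raw_bytes / 2**30:.1f} GiB)")
```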

Running the YCSB workloads with RocksDB

We run YCSB workloads A through F for 180 seconds each, repeat each execution 10 times, and report the median throughput in ops/sec. Before every execution, we flush the page cache and start from the same fresh database image. We run the experiment first with the NVMe attached locally and then on a different node that accesses the same NVMe through SmartIO. Before running the experiment, we disable hyperthreading (SMT), swap space, and irqbalance, and set the CPUs to performance mode.

We execute each workload as follows, shown here for workload A:

# numactl --cpunodebind=1 --membind=1 ycsb/bin/ycsb.sh run rocksdb \
    -s \
    -P a \
    -p rocksdb.dir=/mnt/nvme6/db \
    -p rocksdb.max_background_compactions=4 \
    -p rocksdb.max_background_flushes=2 \
    -threads 32 \
    -p maxexecutiontime=180
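
The median over repeated runs can be extracted from the saved YCSB output with a few lines of Python. This is a minimal sketch, not the script used for the results on this page: it assumes each run's output was captured as text and relies on YCSB's summary line of the form `[OVERALL], Throughput(ops/sec), <value>`. The function names and the sample strings are hypothetical.

```python
import re
import statistics

# YCSB prints one summary line like "[OVERALL], Throughput(ops/sec), 17525.3".
THROUGHPUT_RE = re.compile(
    r"^\[OVERALL\], Throughput\(ops/sec\), ([0-9.eE+-]+)\s*$", re.MULTILINE
)


def parse_throughput(ycsb_output: str) -> float:
    """Return the overall ops/sec reported in the output of one YCSB run."""
    match = THROUGHPUT_RE.search(ycsb_output)
    if match is None:
        raise ValueError("no [OVERALL] throughput line found")
    return float(match.group(1))


def median_throughput(outputs: list[str]) -> float:
    """Median ops/sec across the outputs of repeated runs of one workload."""
    return statistics.median(parse_throughput(text) for text in outputs)


if __name__ == "__main__":
    # Made-up snippets for illustration only; these are not measured results.
    samples = [
        "[OVERALL], RunTime(ms), 180000\n[OVERALL], Throughput(ops/sec), 100.0\n",
        "[OVERALL], RunTime(ms), 180000\n[OVERALL], Throughput(ops/sec), 300.0\n",
        "[OVERALL], RunTime(ms), 180000\n[OVERALL], Throughput(ops/sec), 200.0\n",
    ]
    print(median_throughput(samples))
```

With an even number of runs, such as the 10 repetitions used here, `statistics.median` returns the mean of the two middle values.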

Results

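| YCSB workload                         | Local NVMe (ops/sec) | Remote NVMe (ops/sec) | Ratio (remote/local) |
|---------------------------------------|----------------------|-----------------------|----------------------|
| A (50% reads, 50% updates)            | 17,525               | 17,116                | 97.66%               |
| B (95% reads, 5% updates)             | 133,132              | 131,791               | 98.99%               |
| C (100% reads)                        | 445,952              | 432,597               | 97.01%               |
| D (95% reads, 5% inserts)             | 512,321              | 510,471               | 99.63%               |
| E (95% scans, 5% inserts)             | 16,327               | 15,964                | 97.78%               |
| F (50% reads, 50% read-modify-writes) | 16,470               | 16,466                | 99.98%               |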
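
The ratio column can be recomputed from the two throughput columns. The sketch below copies the median values from the results table and divides remote by local; the dictionary layout and variable names are ours. Recomputing from these rounded medians can differ from the listed ratios in the last digit (workloads A and D), presumably because the listed ratios were derived from unrounded values; that explanation is an assumption on our part.

```python
# Median ops/sec copied from the results table: (local, remote).
results = {
    "A": (17_525, 17_116),
    "B": (133_132, 131_791),
    "C": (445_952, 432_597),
    "D": (512_321, 510_471),
    "E": (16_327, 15_964),
    "F": (16_470, 16_466),
}

# Remote throughput as a fraction of local throughput, per workload.
ratios = {name: remote / local for name, (local, remote) in results.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2%}")

# The workload with the lowest ratio bounds the overall slowdown.
worst = min(ratios, key=ratios.get)
print(f"lowest ratio: workload {worst}, slowdown {1 - ratios[worst]:.2%}")
```

The lowest ratio belongs to the read-only workload C, which is what the "within 3%" figure at the top of this page refers to.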

YCSB workload files

A

recordcount=50000000
operationcount=1000000000
workload=site.ycsb.workloads.CoreWorkload

readallfields=true

readproportion=0.5
updateproportion=0.5
scanproportion=0
insertproportion=0

requestdistribution=zipfian

B

recordcount=50000000
operationcount=1000000000
workload=site.ycsb.workloads.CoreWorkload

readallfields=true

readproportion=0.95
updateproportion=0.05
scanproportion=0
insertproportion=0

requestdistribution=zipfian

C

recordcount=50000000
operationcount=1000000000
workload=site.ycsb.workloads.CoreWorkload

readallfields=true

readproportion=1
updateproportion=0
scanproportion=0
insertproportion=0

requestdistribution=zipfian

D

recordcount=50000000
operationcount=1000000000
workload=site.ycsb.workloads.CoreWorkload

readallfields=true

readproportion=0.95
updateproportion=0
scanproportion=0
insertproportion=0.05

requestdistribution=latest

E

recordcount=50000000
operationcount=1000000000
workload=site.ycsb.workloads.CoreWorkload

readallfields=true

readproportion=0
updateproportion=0
scanproportion=0.95
insertproportion=0.05

requestdistribution=zipfian

maxscanlength=100

scanlengthdistribution=uniform

F

recordcount=50000000
operationcount=1000000000
workload=site.ycsb.workloads.CoreWorkload

readallfields=true

readproportion=0.5
updateproportion=0
scanproportion=0
insertproportion=0
readmodifywriteproportion=0.5

requestdistribution=zipfian
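
When editing workload files like the ones above, it is easy to leave the operation mix inconsistent. The following sketch parses a workload file and checks that all `*proportion` properties sum to 1, as they do in the six files on this page. It is an illustrative helper with hypothetical function names, not part of YCSB, and it assumes the intent is a mix that sums to exactly 1.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def proportion_sum(props: dict[str, str]) -> float:
    """Sum every property whose name ends in 'proportion'."""
    return sum(float(v) for k, v in props.items() if k.endswith("proportion"))


# Workload F as listed above.
workload_f = """
recordcount=50000000
operationcount=1000000000
workload=site.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.5
updateproportion=0
scanproportion=0
insertproportion=0
readmodifywriteproportion=0.5
requestdistribution=zipfian
"""

total = proportion_sum(parse_properties(workload_f))
assert abs(total - 1.0) < 1e-9, f"proportions sum to {total}, expected 1"
print(f"operation mix sums to {total}")
```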
