RocksDB
This page compares RocksDB performance on a local PCIe-attached NVMe device with the same workload on a remote NVMe device accessed through SmartIO Device Lending. Our results show that RocksDB, driven by the YCSB workload generator, achieves throughput on a remote NVMe device within 3% of the throughput on a local NVMe device.
Installation
Follow the instructions on https://github.com/facebook/rocksdb/ and https://github.com/brianfrankcooper/YCSB/.
We use RocksDB v10.10.1 and YCSB v0.17.0.
Benchmarking setup
We use our 4-node server-grade experimental setup with the borrower on node A and the lender on node B.
| Group | Category | Details |
|---|---|---|
| System | Topology | 4 nodes named A, B, C, D |
| | CPU | Dual-socket AMD EPYC 7763 64-Core |
| | Motherboard | Supermicro H12DSU-iN |
| | Model name | Supermicro AS -2024US-TRT |
| | Memory | 16x 128GiB DDR4 DIMMs at 3.2 GHz |
| | Operating system | Ubuntu 22.04-hwe |
| | Kernel | Linux 6.8 |
| PCIe | Host adapter cards | |
| | Switch | |
| | PCIe cables | |
| | Driver version | 5.26 (estimated release later in 2026) |
| GPUs | NVIDIA GPUs | 2x A100 40GB (node A and B) |
| | AMD GPUs | 2x AMD Instinct MI210 (node C and D) |
| | NVIDIA driver version | 590 |
| | CUDA toolkit version | 11.8 |
| | NCCL version | 2.28.9-1 |
| | nccl-tests version | 2.16.5 |
| Storage | Storage | PM1733 Enterprise NVMe PCIe SSD (1.92 TB) |
| | fio version | 3.41 |
| | SPDK version | 26.01 |
| RDMA | RDMA NIC | Mellanox ConnectX-6 200Gbps using InfiniBand |
| | RDMA switch | NVIDIA QM8700 |
| | OFED version | 24.10 |
Loading RocksDB with a base DB image
First, we set up XFS on the NVMe and mount it.
# mkfs.xfs -f -m reflink=0 -l size=256m ${PARTITION}
# mount -t xfs -o noatime,nodiratime,logbufs=8,logbsize=256k ${PARTITION} ${MOUNTPOINT}
Then we load a database of 50 million records onto the NVMe. This corresponds to a database size of about 512 GB. We load this DB image before every workload execution, so each execution starts from the same image.
# numactl --cpunodebind=1 --membind=1 ycsb/bin/ycsb.sh load rocksdb \
-s \
-P load \
-p rocksdb.dir=/mnt/nvme1/db.bak \
-p rocksdb.max_background_compactions=4 \
-p rocksdb.max_background_flushes=2 \
-threads 32
The YCSB workload file, named load, is:
recordcount=50000000
operationcount=50000000
fieldcount=10
fieldlength=1000
insertproportion=1.0
readproportion=0
updateproportion=0
scanproportion=0
requestdistribution=uniform
threadcount=32
workload=site.ycsb.workloads.CoreWorkload
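As a rough sanity check on the database size quoted above, the raw field payload follows from the load parameters as recordcount × fieldcount × fieldlength. The sketch below is only an estimate: the function name is made up for illustration, and the calculation ignores key bytes, RocksDB metadata, and any compression, which presumably account for the gap between the 500 GB payload and the roughly 512 GB on disk.

```shell
#!/usr/bin/env bash
# Estimate the raw YCSB field payload in bytes from the load parameters.
# Keys, RocksDB metadata, and compression are NOT accounted for.
raw_payload_bytes() {
    local recordcount=$1 fieldcount=$2 fieldlength=$3
    echo $(( recordcount * fieldcount * fieldlength ))
}

# Values taken from the load workload file above.
bytes=$(raw_payload_bytes 50000000 10 1000)
echo "raw payload: ${bytes} bytes ($(( bytes / 1000000000 )) GB)"
# prints: raw payload: 500000000000 bytes (500 GB)
```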
Running the YCSB workloads with RocksDB
We run the YCSB workloads A through F for 180 seconds each, repeat each execution 10 times, and report the median ops/sec. We flush the page cache before every execution, and each execution starts from the same fresh database image. We run the experiment first on a local NVMe and then from a different node that accesses the same NVMe through SmartIO. Before running the experiment, we disable hyperthreading, swap space, and irqbalance, and set the CPUs to performance mode.
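The exact commands used for this preparation are not listed here, so the following is a sketch built on common Linux mechanisms rather than the procedure actually used. It assumes the SMT control file in sysfs, the cpupower tool with the performance frequency governor, and an irqbalance systemd service. The helper names and the DRY_RUN switch are made up for illustration. By default the script only prints the commands; set DRY_RUN=0 and run it as root to apply them.

```shell
#!/usr/bin/env bash
# Hypothetical host preparation. DRY_RUN=1 (default) only prints commands.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

prepare_host() {
    run swapoff -a                                              # disable swap space
    run systemctl stop irqbalance                               # stop IRQ rebalancing
    run sh -c 'echo off > /sys/devices/system/cpu/smt/control'  # disable SMT (hyperthreading)
    run cpupower frequency-set -g performance                   # assumed meaning of "performance mode"
}

flush_page_cache() {
    run sync                                        # write back dirty pages first
    run sh -c 'echo 3 > /proc/sys/vm/drop_caches'   # drop page cache, dentries, inodes
}

prepare_host
flush_page_cache
```

Calling flush_page_cache before each execution would correspond to the page cache flush described above, while prepare_host only needs to run once per boot.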
We execute each workload like this:
# numactl --cpunodebind=1 --membind=1 ycsb/bin/ycsb.sh run rocksdb \
-s \
-P a \
-p rocksdb.dir=/mnt/nvme6/db \
-p rocksdb.max_background_compactions=4 \
-p rocksdb.max_background_flushes=2 \
-threads 32 \
-p maxexecutiontime=180
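The repeat-and-take-the-median procedure can be sketched as a few shell helpers. This is an illustration under assumptions, not the harness that produced the results: the function names are hypothetical, the throughput is read from the `[OVERALL], Throughput(ops/sec), <value>` line that YCSB prints at the end of a run, and for an even number of runs the median is taken as the mean of the two middle values. Restoring the database image and flushing the page cache between runs is site-specific and only marked with a comment.

```shell
#!/usr/bin/env bash
# Extract the overall throughput from a YCSB log read on stdin.
extract_throughput() {
    awk -F', ' '$1 == "[OVERALL]" && $2 == "Throughput(ops/sec)" { print $3 }'
}

# Median of the numbers given as arguments, printed with two decimals.
median() {
    [ "$#" -gt 0 ] || return 1
    printf '%s\n' "$@" | sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR % 2) printf "%.2f\n", v[(NR + 1) / 2]
            else        printf "%.2f\n", (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}

# Run one workload once, using the command line shown above.
run_workload() {
    local workload=$1 dbdir=$2
    numactl --cpunodebind=1 --membind=1 ycsb/bin/ycsb.sh run rocksdb \
        -s \
        -P "$workload" \
        -p rocksdb.dir="$dbdir" \
        -p rocksdb.max_background_compactions=4 \
        -p rocksdb.max_background_flushes=2 \
        -threads 32 \
        -p maxexecutiontime=180
}

# Run a workload several times (default 10) and print the median ops/sec.
median_for_workload() {
    local workload=$1 dbdir=$2 reps=${3:-10} i
    local -a runs=()
    for (( i = 0; i < reps; i++ )); do
        # (restore the base DB image and flush the page cache here)
        runs+=("$(run_workload "$workload" "$dbdir" | extract_throughput)")
    done
    median "${runs[@]}"
}
```

With these helpers, `median_for_workload a /mnt/nvme6/db` would run workload A ten times and print the median. A run that fails leaves an empty entry in the list and skews the result, so a production script should check each value before using it.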
Results
| YCSB workload | Local NVMe (ops/sec) | Remote NVMe (ops/sec) | Ratio (remote/local) |
|---|---|---|---|
| A (50% reads, 50% updates) | 17,525 | 17,116 | 97.66% |
| B (95% reads, 5% updates) | 133,132 | 131,791 | 98.99% |
| C (100% reads) | 445,952 | 432,597 | 97.01% |
| D (95% reads, 5% inserts) | 512,321 | 510,471 | 99.63% |
| E (95% scans, 5% inserts) | 16,327 | 15,964 | 97.78% |
| F (50% reads, 50% read-modify-writes) | 16,470 | 16,466 | 99.98% |
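The ratio column is the remote throughput divided by the local throughput, expressed as a percentage. A minimal sketch of that calculation follows; the function name is made up for illustration. Because it starts from the rounded ops/sec values in the table, the last digit can differ from the published ratio for some rows, presumably because those ratios were derived from unrounded medians.

```shell
#!/usr/bin/env bash
# Remote/local throughput ratio in percent, with two decimals.
# Usage: ratio_percent <local_ops_per_sec> <remote_ops_per_sec>
ratio_percent() {
    awk -v l="$1" -v r="$2" 'BEGIN { printf "%.2f\n", 100 * r / l }'
}

# Workload B values from the results table.
echo "B: $(ratio_percent 133132 131791)%"
# prints: B: 98.99%
```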
YCSB workload files
A
recordcount=50000000
operationcount=1000000000
workload=site.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.5
updateproportion=0.5
scanproportion=0
insertproportion=0
requestdistribution=zipfian
B
recordcount=50000000
operationcount=1000000000
workload=site.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.95
updateproportion=0.05
scanproportion=0
insertproportion=0
requestdistribution=zipfian
C
recordcount=50000000
operationcount=1000000000
workload=site.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=1
updateproportion=0
scanproportion=0
insertproportion=0
requestdistribution=zipfian
D
recordcount=50000000
operationcount=1000000000
workload=site.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.95
updateproportion=0
scanproportion=0
insertproportion=0.05
requestdistribution=latest
E
recordcount=50000000
operationcount=1000000000
workload=site.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0
updateproportion=0
scanproportion=0.95
insertproportion=0.05
requestdistribution=zipfian
maxscanlength=100
scanlengthdistribution=uniform
F
recordcount=50000000
operationcount=1000000000
workload=site.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.5
updateproportion=0
scanproportion=0
insertproportion=0
readmodifywriteproportion=0.5
requestdistribution=zipfian
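In each of the workload files above, the operation proportions add up to 1. When editing these files, a quick check of that sum can catch typos. The sketch below assumes the key=value layout shown above and sums every key ending in "proportion"; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sum all *proportion keys of a YCSB workload file (file argument or stdin).
proportion_sum() {
    awk -F= '$1 ~ /proportion$/ { sum += $2 } END { printf "%.2f\n", sum + 0 }' "$@"
}

# Workload F from above, fed on stdin.
proportion_sum <<'EOF'
recordcount=50000000
operationcount=1000000000
workload=site.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.5
updateproportion=0
scanproportion=0
insertproportion=0
readmodifywriteproportion=0.5
requestdistribution=zipfian
EOF
# prints: 1.00
```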