Benchmarks

Benchmark	Host	QEMU	Cartesi Machine	Benchmark command
CPU registers	1	1.18 ± 0.20	5.30 ± 0.13	stress-ng --regs 1 --regs-ops 10000
Zlib compression	1	4.16 ± 0.10	7.83 ± 0.10	stress-ng --zlib 1 --zlib-ops 20
Forking	1	8.28 ± 0.22	9.36 ± 0.23	stress-ng --fork 1 --fork-ops 1000
Naive loop	1	4.15 ± 0.15	11.26 ± 0.39	stress-ng --cpu 1 --cpu-method loop --cpu-ops 400
Memory read/write	1	5.36 ± 0.38	12.35 ± 0.86	stress-ng --memrate 1 --memrate-bytes 2M --memrate-ops 200
Heapsort	1	7.14 ± 0.11	12.62 ± 0.31	stress-ng --heapsort 1 --heapsort-ops 3
Fibonacci	1	4.97 ± 0.07	15.32 ± 0.17	stress-ng --cpu 1 --cpu-method fibonacci --cpu-ops 400
Linux syscalls	1	1.59 ± 0.15	15.43 ± 1.41	stress-ng --syscall 1 --syscall-ops 4000
Checksum hashes	1	7.21 ± 0.17	15.81 ± 0.40	stress-ng --hash 1 --hash-ops 40000
Cache thrashing	1	10.36 ± 2.20	15.83 ± 1.06	stress-ng --cache 1 --cache-ops 100000
Disk writes	1	8.23 ± 0.44	17.36 ± 0.95	stress-ng --hdd 1 --hdd-ops 2000
Quicksort	1	10.74 ± 0.22	17.78 ± 0.36	stress-ng --qsort 1 --qsort-ops 5
TLB shootdowns	1	11.16 ± 0.90	18.12 ± 1.08	stress-ng --tlb-shootdown 1 --tlb-shootdown-ops 2000
Memory allocation	1	15.65 ± 0.89	19.65 ± 1.12	stress-ng --malloc 1 --malloc-ops 40000
Integer arithmetic	1	13.85 ± 0.47	25.60 ± 0.86	stress-ng --cpu 1 --cpu-method int64 --cpu-ops 400
Memory copy	1	11.57 ± 0.40	32.60 ± 5.21	stress-ng --memcpy 1 --memcpy-ops 50
Instruction cache thrashing	1	26.68 ± 0.72	36.64 ± 1.00	stress-ng --icache 1 --icache-ops 200
SHA-256	1	19.23 ± 1.21	37.95 ± 2.22	stress-ng --crypt 1 --crypt-method SHA-256 --crypt-ops 400000
Floating-point math	1	24.18 ± 0.87	41.85 ± 1.59	stress-ng --fp 1 --fp-method floatadd --fp-ops 1000
Floating-point vector math	1	25.47 ± 2.11	53.08 ± 4.37	stress-ng --vecfp 1 --vecfp-ops 100
Floating-point matrix multiplication	1	33.66 ± 3.21	62.17 ± 5.92	stress-ng --matrix 1 --matrix-method mult --matrix-ops 20000
Floating-point fused multiply add	1	30.16 ± 2.16	63.73 ± 4.54	stress-ng --fma 1 --fma-ops 40000
Floating-point trigonometric math	1	38.37 ± 4.28	81.17 ± 9.05	stress-ng --trig 1 --trig-ops 50
Integer vector arithmetic	1	18.95 ± 1.51	112.91 ± 8.81	stress-ng --vecmath 1 --vecmath-ops 100
Floating-point square root	1	62.32 ± 9.64	179.88 ± 27.6	stress-ng --cpu 1 --cpu-method sqrt --cpu-ops 20

How to read: all numbers are slowdowns relative to the same benchmark run on the host. For example, 5.30 ± 0.13 means the benchmark ran 5.30 times faster on the host than in the guest virtual machine, with a standard deviation of 0.13.
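
As an illustration of how a figure such as 5.30 ± 0.13 could be derived, here is a minimal Python sketch. It assumes the slowdown is the ratio of mean run times and that the deviation comes from first-order error propagation over independent runs; the page does not state which method was actually used, and the function name and timings below are made up for the example.

```python
import math
import statistics


def relative_slowdown(host_times, guest_times):
    """Return (ratio, std) of guest run time relative to host run time.

    Assumption: ratio of means, with the standard deviation obtained by
    first-order error propagation for independent measurements.
    """
    host_mean = statistics.mean(host_times)
    guest_mean = statistics.mean(guest_times)
    host_std = statistics.stdev(host_times)
    guest_std = statistics.stdev(guest_times)

    ratio = guest_mean / host_mean
    # Relative errors of a quotient add in quadrature.
    relative_error = math.sqrt(
        (host_std / host_mean) ** 2 + (guest_std / guest_mean) ** 2
    )
    return ratio, ratio * relative_error


# Made-up wall-clock timings in seconds, for illustration only.
host = [1.00, 1.02, 0.98]
guest = [5.25, 5.35, 5.30]

ratio, std = relative_slowdown(host, guest)
print(f"{ratio:.2f} ± {std:.2f}")
```

A ratio of 1 means the guest is as fast as the host, which is why the Host column is always 1.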

Benchmark Notes

Both QEMU and Cartesi Machine used the same guest kernel and guest rootfs.
QEMU is faster because it uses a JIT (just-in-time) compiler.
Floating-point benchmarks are slow because floating point is emulated in software.
The vector math benchmark is slow because the guest CPU has no SIMD instructions, while the host does.
The square root benchmark is the slowest because square root is the most expensive instruction in the Cartesi Machine.

Conclusions

The Cartesi Machine is between 5.30x and 179.88x slower than the host, with a median of about 18x, depending on the workload.
The Cartesi Machine is between 1.13x and 9.70x slower than QEMU, with a median of about 2x, which is pretty good considering it has no JIT.
The CPU registers benchmark is the fastest, meaning reads and writes of RISC-V general-purpose registers are fast.
Floating-point square root is the slowest benchmark, because it is the only instruction in the RISC-V interpreter that performs a loop.
SHA-256 and integer vector arithmetic are noticeably slower because there is no support for SIMD instructions.
Floating-point benchmarks are noticeably slower because of the deterministic software emulation in the RISC-V interpreter.
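
The ranges and medians quoted above can be recomputed from the table. The following Python sketch copies the mean values from the QEMU and Cartesi Machine columns and divides them directly; it ignores the ± deviations, which is a simplification made for this example, and the dictionary keys are just short labels for the table rows.

```python
import statistics

# (QEMU slowdown, Cartesi Machine slowdown), both relative to the host,
# copied from the benchmark table. Deviations are deliberately left out.
results = {
    "regs": (1.18, 5.30),
    "zlib": (4.16, 7.83),
    "fork": (8.28, 9.36),
    "loop": (4.15, 11.26),
    "memrate": (5.36, 12.35),
    "heapsort": (7.14, 12.62),
    "fibonacci": (4.97, 15.32),
    "syscall": (1.59, 15.43),
    "hash": (7.21, 15.81),
    "cache": (10.36, 15.83),
    "hdd": (8.23, 17.36),
    "qsort": (10.74, 17.78),
    "tlb-shootdown": (11.16, 18.12),
    "malloc": (15.65, 19.65),
    "int64": (13.85, 25.60),
    "memcpy": (11.57, 32.60),
    "icache": (26.68, 36.64),
    "sha-256": (19.23, 37.95),
    "floatadd": (24.18, 41.85),
    "vecfp": (25.47, 53.08),
    "matrix": (33.66, 62.17),
    "fma": (30.16, 63.73),
    "trig": (38.37, 81.17),
    "vecmath": (18.95, 112.91),
    "sqrt": (62.32, 179.88),
}

# Slowdown of the Cartesi Machine relative to the host.
vs_host = [cm for _, cm in results.values()]
# Slowdown of the Cartesi Machine relative to QEMU.
vs_qemu = [cm / qemu for qemu, cm in results.values()]

print(f"vs host: {min(vs_host):.2f}x to {max(vs_host):.2f}x, "
      f"median {statistics.median(vs_host):.2f}x")
print(f"vs QEMU: {min(vs_qemu):.2f}x to {max(vs_qemu):.2f}x, "
      f"median {statistics.median(vs_qemu):.2f}x")
```

The smallest gap to QEMU comes from the forking benchmark and the largest from Linux syscalls.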

Benchmark Environment

QEMU 9.1.2
Host CPU x86_64 Intel Core i9-14900K
Host Linux 6.6.65-1-lts
Guest Linux 6.5.13-ctsi-1
GCC 14.2.1 20240910
stress-ng 0.17.06
Cartesi Machine Emulator 0.19.0
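
To reproduce a row in a comparable environment, one of the stress-ng commands from the table could be timed over several runs. The Python sketch below is one possible harness, not the one used for these results: the `time_command` helper and the number of runs are assumptions, and a trivial stand-in command is timed so the example runs on machines without stress-ng installed.

```python
import shlex
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run argv `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            argv,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


# A command from the table, split into an argument list.
regs_argv = shlex.split("stress-ng --regs 1 --regs-ops 10000")

# Stand-in so the sketch runs anywhere; pass regs_argv instead when
# stress-ng is available on the machine being measured.
stand_in_argv = [sys.executable, "-c", "pass"]

durations = time_command(stand_in_argv, runs=3)
print(f"mean {statistics.mean(durations):.3f} s, "
      f"stdev {statistics.stdev(durations):.3f} s")
```

Running the same harness on the host and inside each guest gives the timings needed to form the relative slowdowns.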
