
Benchmarks

Eduardo Bart edited this page Dec 19, 2024 · 22 revisions
| Benchmark | Host | QEMU | Cartesi Machine | Benchmark command |
|---|---|---|---|---|
| CPU registers | 1 | 1.18 ± 0.20 | 5.30 ± 0.13 | stress-ng --regs 1 --regs-ops 10000 |
| Zlib compression | 1 | 4.16 ± 0.10 | 7.83 ± 0.10 | stress-ng --zlib 1 --zlib-ops 20 |
| Forking | 1 | 8.28 ± 0.22 | 9.36 ± 0.23 | stress-ng --fork 1 --fork-ops 1000 |
| Naive loop | 1 | 4.15 ± 0.15 | 11.26 ± 0.39 | stress-ng --cpu 1 --cpu-method loop --cpu-ops 400 |
| Memory read/write | 1 | 5.36 ± 0.38 | 12.35 ± 0.86 | stress-ng --memrate 1 --memrate-bytes 2M --memrate-ops 200 |
| Heapsort | 1 | 7.14 ± 0.11 | 12.62 ± 0.31 | stress-ng --heapsort 1 --heapsort-ops 3 |
| Fibonacci | 1 | 4.97 ± 0.07 | 15.32 ± 0.17 | stress-ng --cpu 1 --cpu-method fibonacci --cpu-ops 400 |
| Linux syscalls | 1 | 1.59 ± 0.15 | 15.43 ± 1.41 | stress-ng --syscall 1 --syscall-ops 4000 |
| Checksum hashes | 1 | 7.21 ± 0.17 | 15.81 ± 0.40 | stress-ng --hash 1 --hash-ops 40000 |
| Cache thrashing | 1 | 10.36 ± 2.20 | 15.83 ± 1.06 | stress-ng --cache 1 --cache-ops 100000 |
| Disk writes | 1 | 8.23 ± 0.44 | 17.36 ± 0.95 | stress-ng --hdd 1 --hdd-ops 2000 |
| Quicksort | 1 | 10.74 ± 0.22 | 17.78 ± 0.36 | stress-ng --qsort 1 --qsort-ops 5 |
| TLB shootdowns | 1 | 11.16 ± 0.90 | 18.12 ± 1.08 | stress-ng --tlb-shootdown 1 --tlb-shootdown-ops 2000 |
| Memory allocation | 1 | 15.65 ± 0.89 | 19.65 ± 1.12 | stress-ng --malloc 1 --malloc-ops 40000 |
| Integer arithmetic | 1 | 13.85 ± 0.47 | 25.60 ± 0.86 | stress-ng --cpu 1 --cpu-method int64 --cpu-ops 400 |
| Memory copy | 1 | 11.57 ± 0.40 | 32.60 ± 5.21 | stress-ng --memcpy 1 --memcpy-ops 50 |
| Instruction cache thrashing | 1 | 26.68 ± 0.72 | 36.64 ± 1.00 | stress-ng --icache 1 --icache-ops 200 |
| SHA-256 | 1 | 19.23 ± 1.21 | 37.95 ± 2.22 | stress-ng --crypt 1 --crypt-method SHA-256 --crypt-ops 400000 |
| Floating-point math | 1 | 24.18 ± 0.87 | 41.85 ± 1.59 | stress-ng --fp 1 --fp-method floatadd --fp-ops 1000 |
| Floating-point vector math | 1 | 25.47 ± 2.11 | 53.08 ± 4.37 | stress-ng --vecfp 1 --vecfp-ops 100 |
| Floating-point matrix multiplication | 1 | 33.66 ± 3.21 | 62.17 ± 5.92 | stress-ng --matrix 1 --matrix-method mult --matrix-ops 20000 |
| Floating-point fused multiply add | 1 | 30.16 ± 2.16 | 63.73 ± 4.54 | stress-ng --fma 1 --fma-ops 40000 |
| Floating-point trigonometric math | 1 | 38.37 ± 4.28 | 81.17 ± 9.05 | stress-ng --trig 1 --trig-ops 50 |
| Integer vector arithmetic | 1 | 18.95 ± 1.51 | 112.91 ± 8.81 | stress-ng --vecmath 1 --vecmath-ops 100 |
| Floating-point square root | 1 | 62.32 ± 9.64 | 179.88 ± 27.6 | stress-ng --cpu 1 --cpu-method sqrt --cpu-ops 20 |

How to read: All numbers are slowdown factors relative to the same benchmark run on the host. For example, 5.30 ± 0.13 means the benchmark ran 5.30 times faster on the host than in the guest virtual machine, with a standard deviation of 0.13.
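
As a rough illustration of how to use these factors, the sketch below scales a host runtime by a slowdown factor and its standard deviation. The function name and the 2-second host runtime are hypothetical; only the 5.30 ± 0.13 figure comes from the table, and treating the deviation as a simple symmetric band is an assumption.

```python
def estimate_guest_seconds(host_seconds, slowdown, stddev):
    """Scale a host runtime by a slowdown factor.

    Returns (low, expected, high), where low/high are one standard
    deviation of the slowdown factor below/above the expected value.
    """
    expected = host_seconds * slowdown
    spread = host_seconds * stddev
    return expected - spread, expected, expected + spread


# Hypothetical workload: 2.0 s on the host, behaving like the
# "CPU registers" benchmark (5.30 ± 0.13 in the Cartesi Machine).
low, expected, high = estimate_guest_seconds(2.0, 5.30, 0.13)
print(f"{low:.2f} s .. {expected:.2f} s .. {high:.2f} s")
```

Real workloads mix several of the benchmarked behaviours, so an estimate like this is only a starting point.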

Benchmark Notes

  • Both QEMU and Cartesi Machine used the same guest kernel and guest rootfs.
  • QEMU is faster because it has a JIT (just-in-time) compiler.
  • Floating-point benchmarks are slow because floating point is emulated in software.
  • The vector math benchmark is slow because the guest CPU has no SIMD instructions, while the host does.
  • The square root benchmark is the slowest because square root is the heaviest instruction in the Cartesi Machine.
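
To give a feel for why a software square root is heavy, here is a minimal sketch of a classic bit-by-bit integer square root. It is not the Cartesi Machine's implementation, and the function name is made up for illustration; it only shows that computing a square root in software typically takes one loop iteration per pair of input bits, whereas operations such as addition need no loop.

```python
def isqrt_bitwise(n: int) -> int:
    """Return floor(sqrt(n)) using the digit-by-digit (base 4) method."""
    if n < 0:
        raise ValueError("square root of a negative number")
    result = 0
    # Find the highest power of four that does not exceed n.
    bit = 1
    while bit * 4 <= n:
        bit *= 4
    # Decide one result bit per iteration, from most to least significant.
    while bit:
        if n >= result + bit:
            n -= result + bit
            result = (result >> 1) + bit
        else:
            result >>= 1
        bit >>= 2
    return result


print(isqrt_bitwise(179))  # floor(sqrt(179))
```

A floating-point square root applies a similar iteration to the mantissa, which fits the note that this is the heaviest instruction to emulate.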

Conclusions

  • The Cartesi Machine is between 5.30x and 179.88x slower than the host, with a median of 18x, depending on the workload.
  • The Cartesi Machine is between 1.13x and 9.70x slower than QEMU, with a median of 2x, which is pretty good considering it has no JIT.
  • The CPU registers benchmark is the fastest, meaning reads and writes of RISC-V general-purpose registers are fast.
  • Square root of floating-point numbers is the slowest benchmark, because it is the only instruction in the RISC-V interpreter that performs a loop.
  • SHA-256 and integer vector arithmetic are noticeably slower because there is no support for SIMD instructions.
  • Floating-point benchmarks are noticeably slower because of the deterministic software emulation in the RISC-V interpreter.
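
The ranges and medians quoted above can be rechecked from the table. The sketch below copies the mean slowdown factors (ignoring the standard deviations) and computes the Cartesi Machine range against the host and against QEMU. The variable names are illustrative only.

```python
from statistics import median

# (QEMU slowdown, Cartesi Machine slowdown) relative to the host,
# copied from the benchmark table.
SLOWDOWNS = {
    "CPU registers": (1.18, 5.30),
    "Zlib compression": (4.16, 7.83),
    "Forking": (8.28, 9.36),
    "Naive loop": (4.15, 11.26),
    "Memory read/write": (5.36, 12.35),
    "Heapsort": (7.14, 12.62),
    "Fibonacci": (4.97, 15.32),
    "Linux syscalls": (1.59, 15.43),
    "Checksum hashes": (7.21, 15.81),
    "Cache thrashing": (10.36, 15.83),
    "Disk writes": (8.23, 17.36),
    "Quicksort": (10.74, 17.78),
    "TLB shootdowns": (11.16, 18.12),
    "Memory allocation": (15.65, 19.65),
    "Integer arithmetic": (13.85, 25.60),
    "Memory copy": (11.57, 32.60),
    "Instruction cache thrashing": (26.68, 36.64),
    "SHA-256": (19.23, 37.95),
    "Floating-point math": (24.18, 41.85),
    "Floating-point vector math": (25.47, 53.08),
    "Floating-point matrix multiplication": (33.66, 62.17),
    "Floating-point fused multiply add": (30.16, 63.73),
    "Floating-point trigonometric math": (38.37, 81.17),
    "Integer vector arithmetic": (18.95, 112.91),
    "Floating-point square root": (62.32, 179.88),
}

cartesi = [cm for _, cm in SLOWDOWNS.values()]
vs_qemu = [cm / qemu for qemu, cm in SLOWDOWNS.values()]

print(f"vs host: {min(cartesi):.2f}x to {max(cartesi):.2f}x, "
      f"median {median(cartesi):.2f}x")
print(f"vs QEMU: {min(vs_qemu):.2f}x to {max(vs_qemu):.2f}x, "
      f"median {median(vs_qemu):.2f}x")
# vs host: 5.30x to 179.88x, median 18.12x
# vs QEMU: 1.13x to 9.70x, median 2.08x
```

The smallest gap to QEMU is in the forking benchmark and the largest is in Linux syscalls.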

Benchmark Environment

  • QEMU 9.1.2
  • Host CPU x86_64 Intel Core i9-14900K
  • Host Linux 6.6.65-1-lts
  • Guest Linux 6.5.13-ctsi-1
  • GCC 14.2.1 20240910
  • stress-ng 0.17.06
  • Cartesi Machine Emulator 0.19.0
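
This page does not describe how the timings were collected. As one possible way to reproduce a row, the sketch below times a command over several runs and reports the mean and standard deviation of the wall-clock time. The helper name, the choice of five runs, and the use of wall-clock time are all assumptions. Dividing the guest mean by the host mean would then give a slowdown factor like those in the table.

```python
import shutil
import statistics
import subprocess
import time


def time_command(argv, runs=5):
    """Run argv `runs` times; return (mean, stdev) of wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)


# Only runs when stress-ng is actually installed on this machine.
if __name__ == "__main__" and shutil.which("stress-ng"):
    mean, stdev = time_command(
        ["stress-ng", "--regs", "1", "--regs-ops", "10000"])
    print(f"CPU registers: {mean:.3f} s ± {stdev:.3f} s")
```

Running the same script on the host and inside each guest keeps the measurement method identical across environments.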