Running into a stack overflow/segfault when running RocksDB in `multiplayer::orb_can_render_peers_in_the_sphere_address_book`, discovered in #655.
To summarize, introducing RocksDB causes a segfault in one of our stress tests, consistently reproducible locally and in CI, and sensitive to things we wouldn't expect to matter, like the shape of some structs. In #655, we store a `Storage` instance in `SphereDb`; that change alone causes the segfault. Also in #655, changing `RocksDbStore`'s `name` property to be an `Arc` "fixes" the segfault.

While `Arc` is more appropriate here anyway, it shouldn't have an effect on this segfault. Using an even more appropriate `Cow` instead still fails. That is to say, there is some spooky issue lurking here regardless of #655 and our use of RocksDB.
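For context, the shape difference in question is roughly the following (an illustrative sketch, not the actual code from #655; the exact inner type behind the `Arc` is an assumption):

```rust
#![allow(dead_code)]

mod before {
    // Sketch only: the store owns its name directly.
    // This is the shape that segfaults in the stress test.
    pub struct RocksDbStore {
        pub name: String,
        // ...database handle and other fields elided...
    }
}

mod after {
    use std::sync::Arc;

    // Sketch only: the same field behind an Arc, which "fixes" the segfault
    // even though the change shouldn't matter for memory safety.
    pub struct RocksDbStore {
        pub name: Arc<String>,
        // ...
    }
}

fn main() {}
```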
Things we've tried:
Are we overflowing the stack?
No: running with unlimited stack size (`ulimit -s unlimited`) had no effect, and we only use about 2MB of the 8MB stack space, according to address pointers measured in `gdb`.
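As an extra sanity check beyond `ulimit`, one could also drive the suspect flow from a thread with an explicitly oversized stack; a minimal sketch (the thread name, stack size, and `run_repro` stand-in are placeholders, not project code):

```rust
use std::thread;

fn run_repro() {
    // Hypothetical stand-in for the failing orb_can_render_* flow.
}

fn main() {
    // Spawn the workload on a 64 MiB stack; if the crash still reproduces,
    // plain stack exhaustion becomes an unlikely explanation.
    let handle = thread::Builder::new()
        .name("big-stack-repro".into())
        .stack_size(64 * 1024 * 1024)
        .spawn(run_repro)
        .expect("failed to spawn repro thread");

    handle.join().expect("repro thread panicked");
}
```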
Does the multi-threaded impl of RocksDB work?
No: switching between our single-threaded and multi-threaded implementations seemingly has no effect.
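For reference, assuming the two implementations correspond to rust-rocksdb's single- and multi-threaded column family modes (an assumption; the project's own wrappers may differ, and I believe the crate's `multi-threaded-cf` feature only switches which mode the `DB` alias points at), the toggle looks roughly like this:

```rust
use rocksdb::{DBWithThreadMode, MultiThreaded, Options, SingleThreaded};

fn open_in_both_modes(base: &str) -> Result<(), rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);

    // Single-threaded mode: column family creation/drop requires &mut self.
    let single = DBWithThreadMode::<SingleThreaded>::open(&opts, format!("{base}/single"))?;

    // Multi-threaded mode: column families can be created/dropped through &self.
    let multi = DBWithThreadMode::<MultiThreaded>::open(&opts, format!("{base}/multi"))?;

    drop(single);
    drop(multi);
    Ok(())
}

fn main() {
    open_in_both_modes("/tmp/rocksdb-thread-mode-demo").expect("failed to open RocksDB");
}
```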
What about the latest rocksdb, built from source?
No effect.
Any hints from valgrind?
No invalid reads or writes, though there are some suspicious "definitely lost" blocks only in the failing scenario, which could be attributed to cleanups not occurring.
We allocate 750MB of memory in this single test flow, which could be relevant.
Any RocksDB flags we can set that would avoid this, or give us more info?
Looked through the logs and enabled paranoid checks; no new useful information.
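For anyone retracing this step, toggling those flags looks roughly like this against the stock rust-rocksdb `Options` API (a sketch under the assumption that the API is used directly; the project routes configuration through its own storage layer):

```rust
use rocksdb::{DB, LogLevel, Options};

fn open_with_extra_checks(path: &str) -> Result<DB, rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);

    // Aggressively verify checksums and fail fast on detected corruption.
    opts.set_paranoid_checks(true);

    // Chattier LOG output in the database directory.
    opts.set_log_level(LogLevel::Debug);

    DB::open(&opts, path)
}

fn main() {
    let _db = open_with_extra_checks("/tmp/rocksdb-paranoid-demo").expect("failed to open RocksDB");
}
```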
What about nightly?
buzzer sounds
Any insight from running sanitizers?
RUSTFLAGS="-Z sanitizer=memory" cargo +nightly test --target x86_64-unknown-linux-gnu --features rocksdb,test-kubo orb_can_render
Leak sanitizer: segfaults much earlier in the test flow: Thread 4 "tokio-runtime-w" received signal SIGSEGV, Segmentation fault. when setting a key and serializing a multihash (libp2p/ns)