Skip to content
This repository has been archived by the owner on Sep 21, 2024. It is now read-only.

feat: Introduce EphemeralStorage #655

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

feat: Introduce EphemeralStorage #655

wants to merge 1 commit into from

Conversation

jsantell
Copy link
Contributor

@jsantell jsantell commented Sep 27, 2023

  • Introduce EphemeralStorage trait, allowing storages to shard off usage for an EphemeralStore.
  • Sled/Rocks/IndexedDb implementations of EphemeralStorage with tests. Sled uses ephemeral trees, and rocks/indexeddb shard a shared ephemeral store. Ephemeral stores are cleared from the provider on Drop, and again on startup for shared ephemeral stores in the event cleanups are missed (e.g. force quit)
  • A few friends from feat: Introduce Importable/Exportable Storage traits. #633:
    • OpenStorage for unified paths over other trait operators
    • NonPersistentStorage for tests
    • IterableStore for testing internal details of store ensuring ephemeral storage is cleared (now also implemented for Rocks, with tests)
  • IndexedDb migration tests (for the new ephemeral store).

/// Spawn a future by scheduling it with the local executor. The
/// future will immediately start processing.
pub fn spawn_immediate<F>(future: F)
where
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may not be necessary, just use spawn with a wasm32 version

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the async fn spawn() still alerts to use .await hm

rust/noosphere-storage/src/tap.rs Outdated Show resolved Hide resolved
rust/noosphere-storage/Cargo.toml Outdated Show resolved Hide resolved
rust/noosphere-storage/Cargo.toml Outdated Show resolved Hide resolved
#[cfg_attr(not(target_arch = "wasm32"), async_trait)]
#[cfg_attr(target_arch = "wasm32", async_trait(?Send))]
pub trait EphemeralStorage: ConditionalSync {
type EphemeralStoreType: KeyValueStore + Disposable;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need better name for EphemeralStoreType, as the return value is EphemeralStore<Self::EphemeralStoreType> -- EphemeralStoreInner?

rust/noosphere-common/src/helpers/wait.rs Outdated Show resolved Hide resolved
/// Spawn a future by scheduling it with the local executor. The
/// future will immediately start processing.
pub fn spawn_immediate<F>(future: F)
where
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the async fn spawn() still alerts to use .await hm

@jsantell jsantell requested a review from cdata September 27, 2023 18:45
@jsantell jsantell marked this pull request as ready for review September 27, 2023 18:45
@jsantell jsantell changed the title WIP: Introduce EphemeralStorage feat: Introduce EphemeralStorage Sep 27, 2023
key.starts_with(self.partition_key.as_slice())
}

pub fn get_key_range(&self) -> (&Vec<u8>, &Vec<u8>) {
Copy link
Contributor Author

@jsantell jsantell Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The symbol for an "end" of the prefix range could be improved possibly with the partition_key with the last character incremented. Both Rocks and IDB use a partitioned implementation as well as have the same interface for key ranges. Rocks has a more robust solution for generating a key range from a prefix which we could adopt: https://docs.rs/rocksdb/0.21.0/rocksdb/struct.PrefixRange.html

@jsantell
Copy link
Contributor Author

  • This stack overflow in multiplayer::orb_can_render_peers_in_the_sphere_address_book may not be an intermittent in the rocksdb build
  • #[ignore] doesn't not run the IndexedDb migration test, timing out

@jsantell jsantell force-pushed the scratch branch 2 times, most recently from a7fe73c to e65892e Compare September 29, 2023 19:42
@jsantell
Copy link
Contributor Author

jsantell commented Sep 29, 2023

  • This stack overflow in multiplayer::orb_can_render_peers_in_the_sphere_address_book may not be an intermittent in the rocksdb build

    • #[ignore] doesn't not run the IndexedDb migration test, timing out

The RocksDB build would consistently stack overflow on orb sphere render in multiplayer::orb_can_render_peers_in_the_sphere_address_book, introduced by SphereDb<S> owning a Storage:

--- a/rust/noosphere-storage/src/db.rs
+++ b/rust/noosphere-storage/src/db.rs
@@ -40,6 +40,7 @@ where
     link_store: S::KeyValueStore,
     version_store: S::KeyValueStore,
     metadata_store: S::KeyValueStore,
+    storage: S,
 }
 
 impl<S> SphereDb<S>
@@ -52,6 +53,7 @@ where
             link_store: storage.get_key_value_store(LINK_STORE).await?,
             version_store: storage.get_key_value_store(VERSION_STORE).await?,
             metadata_store: storage.get_key_value_store(METADATA_STORE).await?,
+            storage: storage.to_owned(),
         })
     }

Changing RocksDbStore to hold an Arc<String> instead of String no longer causes a stack overflow:

--- a/rust/noosphere-storage/src/implementation/rocks_db.rs
+++ b/rust/noosphere-storage/src/implementation/rocks_db.rs
@@ -85,13 +85,13 @@ impl Storage for RocksDbStorage {
 
 #[derive(Clone)]
 pub struct RocksDbStore {
-    name: String,
+    name: Arc<String>,
     db: Arc<DbInner>,
 }
 
 impl RocksDbStore {
     pub fn new(db: Arc<DbInner>, name: String) -> Result<Self> {
-        Ok(RocksDbStore { db, name })
+        Ok(RocksDbStore { db, name: Arc::new(name) })
     }

RocksDB's !Sync cf_handle would be preferable here. Adding any other field to RocksDbStore causes the overflow again. While we make heavy use of cloning stores in the workflows, we "only" have less than 100 stores at once. Stack size at overflow is a quarter of the system ulimit -s. Appears as a SIGSEGV in gdb.

@jsantell
Copy link
Contributor Author

@jsantell jsantell force-pushed the scratch branch 2 times, most recently from 5d1ef80 to 4630288 Compare October 19, 2023 23:28
@jsantell
Copy link
Contributor Author

This is consistently causing a stack overflow on Windows:

2023-10-19T23:42:07.2354734Z thread 'multiplayer::orb_can_render_peers_in_the_sphere_address_book' has overflowed its stack
2023-10-19T23:42:09.8418064Z error: test failed, to rerun pass -p noosphere --test distributed_stress
2023-10-19T23:42:09.8419281Z
2023-10-19T23:42:09.8419673Z Caused by:
2023-10-19T23:42:09.8421990Z process didn't exit successfully: D:\a\noosphere\noosphere\target\debug\deps\distributed_stress-6752626ddf09eb78.exe (exit code: 0xc00000fd, STATUS_STACK_OVERFLOW)
2023-10-19T23:42:09.8424887Z note: test exited abnormally; to see the full output pass --nocapture to the harness.
2023-10-19T23:42:11.2052981Z ##[error]Process completed with exit code 1.

@jsantell jsantell force-pushed the scratch branch 3 times, most recently from 57d2290 to 869abef Compare October 20, 2023 19:25
# threads themselves.
#
# While our main thread isn't under fire here, notating this for future use:
# https://users.rust-lang.org/t/stack-overflow-when-compiling-on-windows-10/50818/8
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, this would be applied when running locally outside of CI.

The immediate solutions to achieve this would be to increase the stack size of only the offending test, but the test harness thread itself is overflowing it seems. Alternatively, a Cargo.toml test-only flag such that would apply in both CI and local runs, but no leads yet on how to configure something like that.

@jsantell
Copy link
Contributor Author

This is consistently causing a stack overflow on Windows:

2023-10-19T23:42:07.2354734Z thread 'multiplayer::orb_can_render_peers_in_the_sphere_address_book' has overflowed its stack
2023-10-19T23:42:09.8418064Z error: test failed, to rerun pass -p noosphere --test distributed_stress
2023-10-19T23:42:09.8419281Z
2023-10-19T23:42:09.8419673Z Caused by:
2023-10-19T23:42:09.8421990Z process didn't exit successfully: D:\a\noosphere\noosphere\target\debug\deps\distributed_stress-6752626ddf09eb78.exe (exit code: 0xc00000fd, STATUS_STACK_OVERFLOW)
2023-10-19T23:42:09.8424887Z note: test exited abnormally; to see the full output pass --nocapture to the harness.
2023-10-19T23:42:11.2052981Z ##[error]Process completed with exit code 1.

Fixed by increasing thread stack size on Windows.

Separately, rerunning the failed builds resulted in a successful rocksdb test run

@jsantell
Copy link
Contributor Author

rebased on #685

@jsantell jsantell force-pushed the scratch branch 2 times, most recently from 0dc1563 to 221bdaa Compare October 31, 2023 20:10
Copy link
Contributor

github-actions bot commented Oct 31, 2023

Test flake analysis

status platform features toolchain
🟢 macos-13 test-kubo,headers,rocksdb stable
🟢 ubuntu-latest test-kubo,headers nightly
🟢 windows-latest test-kubo,headers stable
🟡 macos-13 test-kubo,headers stable
🟢 ubuntu-latest test-kubo,headers stable
🟢 ubuntu-latest test-kubo,headers,rocksdb stable

Flake summary for macos-13, test-kubo,headers, stable

     Summary [  85.082s] 180 tests run: 180 passed (1 slow, 1 flaky), 1 skipped
   FLAKY 2/6 [   1.785s] noosphere-ns dht::node::test::test_dhtnode_bootstrap

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant