Skip to content
This repository has been archived by the owner on Sep 21, 2024. It is now read-only.

feat: Initial MVP implementation of RocksDbStorage #623

Merged
merged 1 commit into from
Sep 13, 2023
Merged

feat: Initial MVP implementation of RocksDbStorage #623

merged 1 commit into from
Sep 13, 2023

Conversation

jsantell
Copy link
Contributor

@jsantell jsantell commented Sep 6, 2023

First pass on a RocksDB backend.

Built on #622 with a few more changes to make storage agnostic in the remaining parts (e.g. helpers and noosphere::platform). See the patch on top of #622.

Feature Flags

There's an additive flag for using RocksDB in multithreaded mode (rational: target features using base feature, cleaner especially given a default assumption of "sled" in a world where we could also have more backends).

Flags for noosphere and noosphere_cli just propagate to noosphere_storage. A different name (experimental-rocksdb vs rocksdb) used at the top level to denote experimentality, and ensure Cargo feature flags propagate correctly when using a workspace rather than applying everywhere applicable. With more RocksDB experience, we can most likely just use one (single or multi threaded).

  • noosphere:experimental-rocksdb
  • noosphere:experimental-rocksdb-mt
  • noosphere_cli:experimental-rocksdb
  • noosphere_cli:experimental-rocksdb-mt
  • noosphere_storage:rocksdb
  • noosphere_storage:rocksdb-mt

There are other ways of managing these backends, no strong preference.

Performance

Definitely room for improvement here, but currently the RocksDB implementation is half as slow as sled. No noticeable difference between single and multithreaded RocksDB implementations.

Running full test suite (with inconsistent doc tests)

  • 55s-70s RocksDB
  • 28s-40s Sled

Running noosphere-cli tests only

  • 5.9s RocksDB
  • 2.5s Sled

@jsantell
Copy link
Contributor Author

jsantell commented Sep 8, 2023

Added PerformanceStorage<S: Storage> for tracking perf/space (could be rolled into TrackingStorage), a Space trait (with Space::get_space_usage) implemented for storage providers.

The benchmarks are in noosphere_storage/examples/bench. This work is gated behind performance flag, though even with features defined in the example manifest, it still needs passed in the command line. Updated CI to add these flags, but there's gotta be a better solution.

@jsantell
Copy link
Contributor Author

jsantell commented Sep 8, 2023

Initial takeaways of running some high-level comparisons in these benches:

edit: updated with crude impl of sqlite

DB Sm Read Lg Read Sm Write Lg Write Space Amp
Sled 10us 30us 30us 140us 2
Rocks 4us 10us 30us 100us 1.1
Sqlite 86us 84us 19000us 17000us 1.1
IDB (Firefox) 700us 1200us 900us 1300us 1.2
IDB (Chrome) 1200us 1200us 7700us 9700us 1.2
  • Not calling RocksDB flush results in similar read/write performance between Rocks and Sled (with Rocks slightly faster). The WAL should mean we do not need to manually call flush.
  • Firefox write performance is faster than Chrome by an order of magnitude (1-1.5ms vs 8-10ms)
  • RocksDB and IndexedDb implementations have pretty good space amplification (~1.25x). Sled is closer to 2x.
  • IndexedDb and Sqlite have worse space amplification for small databases, not particularly relevant IMO
  • Suspicious that small reads/writes take longer than large read/writes for Sqlite

@jsantell
Copy link
Contributor Author

jsantell commented Sep 8, 2023

Adding --experimental-rocksdb to apple builds to test the generated artifact

@jsantell
Copy link
Contributor Author

Merged in SqliteStorage work from #549 (stats updated in #623 (comment)), fixed up glue to hopefully generate a usable iOS simulator artifact

Copy link
Collaborator

@cdata cdata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work on this.

rust/noosphere-cli/Cargo.toml Outdated Show resolved Hide resolved
.github/workflows/noosphere_apple_build.yaml Outdated Show resolved Hide resolved
rust/noosphere-cli/tests/helpers/random.rs Outdated Show resolved Hide resolved
rust/noosphere-core/src/bench.rs Outdated Show resolved Hide resolved
rust/noosphere-core/src/bench.rs Outdated Show resolved Hide resolved
rust/noosphere-storage/src/implementation/rocks_db.rs Outdated Show resolved Hide resolved
rust/noosphere-storage/src/implementation/rocks_db.rs Outdated Show resolved Hide resolved
rust/noosphere-storage/src/implementation/rocks_db.rs Outdated Show resolved Hide resolved
rust/noosphere-storage/src/implementation/sqlite.rs Outdated Show resolved Hide resolved
@jsantell
Copy link
Contributor Author

Updated from feedback (not yet accounting for rebase shuffling of some helpers)

@cdata
Copy link
Collaborator

cdata commented Sep 12, 2023

LMK when you are at a good point and I'll give the rebase a go now that #624 is landed.

@jsantell jsantell force-pushed the rocksdb branch 3 times, most recently from 27b26df to 7edf0b0 Compare September 12, 2023 20:57
@@ -72,7 +72,7 @@ jobs:
with:
brew: protobuf cmake
- name: 'Cargo Build'
run: cargo build --package noosphere --release --locked --target ${{ matrix.target }}
run: cargo build --package noosphere --release --locked --target ${{ matrix.target }} --features rocksdb
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only thing that should be backed out before landing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want to back this out? We want this to be the default for iOS, right?

Thinking about this a little more: we probably want to make this part of the matrix that gets generated in this step. We should use rocksdb for iOS and the default (presumably sled) everywhere else so that we don't have to migrate orb-created desktop replicas just yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just decoupling the implementation from the "on" switch -- we could make it a part of the build matrix, or configure it in noosphere::platform

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDK if it's possible to distinguish e.g., M2/iPad from M2/MacBook Air in this way. We may need to talk it over a little bit to figure out the safe path forward (or face coming up with a migration playbook for orb users).

/// Represents a request to be processed in `MessageProcessor`,
/// sent from the associated `MessageClient`.
/// Represents a request to be processed in [MessageProcessor],
/// sent from the associated [MessageClient].
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A doc example might be a nice stand-in for tests, just to demonstrate intended usage.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nevermind, I see there is a test!

@@ -94,6 +94,9 @@ impl Drop for NsNoosphere {
/// You can also initialize the ns_noosphere_t with an optional third
/// argument: a URL string that refers to a Noosphere Gateway API somewhere on
/// the network that one or more local spheres may have access to.
///
/// Note that storages (`global_storage_path`, `sphere_storage_path`) can only
/// be opened by a single `ns_noosphere` at a time.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

}

impl RocksDbStorage {
pub fn new(path: PathBuf) -> Result<Self> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the storage is initialized multiple times over the same location? Idempotent/safe to do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same restrictions as sled e.g. only one new() per path allowed at a time, e.g. this fails:

    async fn it_can_open_concurrent_dbs() -> Result<()> {
        let tempdir = tempfile::TempDir::new()?;
        let storage_path = tempdir.path().to_owned();

        let storage_1 = RocksDbStorage::new(storage_path.clone())?;
        let mut store_1 = storage_1.get_key_value_store("links").await?;
        let storage_2 = RocksDbStorage::new(storage_path.clone())?;
        let mut store_2 = storage_2.get_key_value_store("links").await?;

        let bytes_1 = vec![0, 1, 2, 3];
        let bytes_2 = vec![4, 5, 6, 7];
        store_1.write(b"from 1", &bytes_1).await?;
        store_2.write(b"from 2", &bytes_2).await?;
        assert_eq!(store_1.read(b"from 2").await?.unwrap(), bytes_2);
        assert_eq!(store_2.read(b"from 1").await?.unwrap(), bytes_1);
        Ok(())
    }

* Introduce `RocksDbStorage`
* Rename NativeStorage -> SledStorage, WebStorage -> IndexedDbStorage. Make storage generic over gateway.
* Added `PerformanceStorage<S: Storage>`, `trait noosphere_storage::Space` for benchmarking storage
  backends.
Copy link
Collaborator

@cdata cdata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, excellent change!

@cdata cdata merged commit 7155f86 into main Sep 13, 2023
@cdata cdata deleted the rocksdb branch September 13, 2023 18:15
gordonbrander pushed a commit to subconsciousnetwork/subconscious that referenced this pull request Sep 13, 2023
This test started failing while testing
subconsciousnetwork/noosphere#623, leading us to
question why it ever passed in the first place. Due to Swift compiler
trivia the deinit for `Noosphere` is not guaranteed to be called at the
time we overwrite the variable, meaning two instances of `Noosphere`
co-exist pointing at the same data. We should not ever be trying to open
multiple instances of `Noosphere` pointing at the same database.

After team discussion we decided to drop this test case and make this an
explicit failure case on the Noosphere side.
@github-actions github-actions bot mentioned this pull request Sep 19, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants