Skip to content

Add insertion methods with fallible allocation to cranelift_bforest::{Map,Set}#12685

Open
fitzgen wants to merge 1 commit intobytecodealliance:mainfrom
fitzgen:bforest-fallible-insertion-methods
Open

Add insertion methods with fallible allocation to cranelift_bforest::{Map,Set}#12685
fitzgen wants to merge 1 commit intobytecodealliance:mainfrom
fitzgen:bforest-fallible-insertion-methods

Conversation

@fitzgen
Copy link
Member

@fitzgen fitzgen commented Feb 27, 2026

No description provided.

@fitzgen fitzgen requested review from a team as code owners February 27, 2026 00:00
@fitzgen fitzgen requested review from alexcrichton and removed request for a team February 27, 2026 00:00
@alexcrichton
Copy link
Member

Before going too too far down this route (this PR is fine in isolation though), could you detail a bit more the end state here? I'm assuming this is eventually going to make its way into the ModuleRegistry within a Store in wasmtime to replace the BTreeMaps there currently. I'm not familiar with the data structures in this crate, but currently K and V are both bounded by Copy which doesn't satisfy the current usage within ModuleRegistry.

So overall I'm not quite sure I see the vision of where this is going to end up, hence the question -- could you detail a bit more about how this is going to make its way into Wasmtime? We have a surprisingly large number of BTreeMaps used in the runtime, and is this aimed at all of them or just a subset?

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator fuzzing Issues related to our fuzzing infrastructure labels Feb 27, 2026
@github-actions
Copy link

Subscribe to Label Action

cc @fitzgen

Details This issue or pull request has been labeled: "cranelift", "fuzzing"

Thus the following users have been cc'd because of the following labels:

  • fitzgen: fuzzing

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

@fitzgen
Copy link
Member Author

fitzgen commented Feb 27, 2026

Before going too too far down this route (this PR is fine in isolation though), could you detail a bit more the end state here? I'm assuming this is eventually going to make its way into the ModuleRegistry within a Store in wasmtime to replace the BTreeMaps there currently. I'm not familiar with the data structures in this crate, but currently K and V are both bounded by Copy which doesn't satisfy the current usage within ModuleRegistry.

So overall I'm not quite sure I see the vision of where this is going to end up, hence the question -- could you detail a bit more about how this is going to make its way into Wasmtime? We have a surprisingly large number of BTreeMaps used in the runtime, and is this aimed at all of them or just a subset?

Yeah, we do use BTreeMaps in a bunch of core places in the core runtime, and we use them in ways that make it hard to replace with other data structures.

My plan is to create a wasmtime_environ::collections::BTreeMap that looks something like this:

#[derive(...)]
struct BTreeMapValueIndex(u32);
entity_impl!(BTreeMapValueIndex);


pub struct BTreeMap<K, V>
where
    K: Copy,
{
    values: PrimaryMap<BTreeMapValueIndex, V>,
    forest: cranelift_bforest::MapForest<K, BTreeMapValueIndex>,
    map: cranelift_bforest::Map<K, BTreeMapValueIndex>,
}

impl<K, V> BTreeMap<K, V>
where
    K: Copy,
{
    pub fn get(&self, key: K) -> Option<&V>
    where
        K: Ord,
    {
        let idx = self.map.get(key, &self.forest, &()?;
        Some(&self.values[idx])
    }

    // etc...
}

I haven't done a full audit, but everything I've encountered so far has Copy keys (which, as you note, cranelift_bforest::Map requires), so we wouldn't need another indirection for keys in the map, like we have with values here. But if we have to, we can add that as well. It would unfortunately have the extra wrinkle of requiring some kind of hash consing or something tho, but that is not insurmountable.

If you are concerned about the performance implications this extra indirection has, we can put it behind a cargo feature. When the feature is not enabled, we would wrap std::collections::BTreeMap instead of doing the above, exposing the same OOM-handling methods/interface/API to users of the collection, but not actually attempting to handle OOM internally in this scenario.

@alexcrichton
Copy link
Member

Ah ok that makes sense yeah. And to confirm, I'm not familiar with the implementation of bforest, but given the documented caveats you're confident that this is a suitable replacement for our usages of BTreeMap? I'm not sure if that BTreeMap would, for example, end up being many times larger in the by-value size than std::collections::BTreeMap with the various collections it contains.

Also, that design may be a bit tricky insofar as it won't easily support removal without further maps/etc for tracking that. I don't think we need to remove from Store-based maps in ModuleRegistry, but we will need a solution for removing from the global map that has all loaded modules.

Perf-wise I'm not familiar enough with bforest to have much of an opinion on that. I don't think we need a hyper-optimal data structure here and just something that isn't O(n) in the worst case. I do think this is a reasonable enough time to rethink our data structures, though, perhaps?

For example BTreeMap feels rightly necessary for the global map (or some sort of sorted map thingy), but for ModuleRegistry in theory BTreeMap is way overkill for our use case. Almost all stores have just a single module or component in them and it's quite rare to have a very large number, so we might actually be able to get away with just a plain old sorted Vec<T>. I realize that I just also said we shouldn't do O(n) things which runs counter to that, but we could in theory also dynamically switch between a Vec<T> and a map after crossing some threshold (e.g. 100+ modules).

It's been a bit though since I looked at ModuleRegistry too closely, but WDYT about maybe first refactoring to remove some BTreeMaps to reduce the number of constraints on our future TryBTreeMap?

@fitzgen
Copy link
Member Author

fitzgen commented Feb 27, 2026

Ah ok that makes sense yeah. And to confirm, I'm not familiar with the implementation of bforest, but given the documented caveats you're confident that this is a suitable replacement for our usages of BTreeMap? I'm not sure if that BTreeMap would, for example, end up being many times larger in the by-value size than std::collections::BTreeMap with the various collections it contains.

Those documented caveats seem workable, AFAICT.

I haven't investigated exact size differences, but I think it should be within a constant factor of the same size.

Also, that design may be a bit tricky insofar as it won't easily support removal without further maps/etc for tracking that. I don't think we need to remove from Store-based maps in ModuleRegistry, but we will need a solution for removing from the global map that has all loaded modules.

I guess my initial sketch had values: PrimaryMap<BTreeMapValueIndex, V>, but we probably actually want values: Slab<V>, so we have a free list for reusing entries and don't require a value for every entry in the arena. Then remove is simply something like this:

pub fn remove(&mut self, key: K) -> Option<V> {
    let id = self.map.remove(key, &mut self.forest, &())?;
    Some(self.values.dealloc(id))
}

but for ModuleRegistry in theory BTreeMap is way overkill for our use case. Almost all stores have just a single module or component in them and it's quite rare to have a very large number, so we might actually be able to get away with just a plain old sorted Vec<T>. I realize that I just also said we shouldn't do O(n) things which runs counter to that, but we could in theory also dynamically switch between a Vec<T> and a map after crossing some threshold (e.g. 100+ modules).

It's been a bit though since I looked at ModuleRegistry too closely, but WDYT about maybe first refactoring to remove some BTreeMaps to reduce the number of constraints on our future TryBTreeMap?

Generally a fan of re-thinking data structures where it makes sense as we go through and touch various things.

However, in the particular case of ModuleRegistry, and doing something like

enum ModuleRegistryData {
    Few(Vec<Module>),
    Many(BTreeMap<RegisteredModuleId, Module>),
}

I'm fairly meh. We need a BTreeMap implementation either way, so it isn't saving implementation work. Also, a single BTree node is roughly equivalent to a Vec of a small size if you squint, so I am not sure it would make a meaningful perf difference.

What might be worth it would be something like this tho:

enum ModuleRegistryData {
    One(Module),
    Many(BTreeMap<RegisteredModuleId, Module>),
}

This would let us avoid any allocation at all in the extremely common case of a single module in the registry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cranelift Issues related to the Cranelift code generator fuzzing Issues related to our fuzzing infrastructure

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants