Support non-str values in a Rodeo (yet another attempt) #52

Open: anchpop wants to merge 1 commit into Kixiron:master from anchpop:generic-internable

anchpop commented Jan 25, 2026

This attempts the same thing as #32: the goal is the same, but the implementation is different. For example, #32 is actually unsound, while this PR is not (to my knowledge). The critical difference is that Spur must be parameterized by the type being stored. This prevents you from using a Spur from a Rodeo storing Strings with a Rodeo storing something else.
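To illustrate why a typed key rules out cross-Rodeo mixups, here is a minimal sketch. It is not the code in this PR: `TypedKey` and `TinyInterner` are hypothetical stand-ins for `Spur` and `Rodeo`, and the interner is deliberately naive (linear lookup, no arena). The only point is that the key carries the stored type as a parameter, so the compiler rejects a key from the wrong interner.

```rust
use std::marker::PhantomData;

/// Hypothetical typed key (a stand-in for a typed `Spur`, not lasso's type).
/// `T` records which kind of value the key refers to.
struct TypedKey<T> {
    index: usize,
    _stored: PhantomData<fn() -> T>,
}

/// Hypothetical, deliberately naive interner (a stand-in for `Rodeo`).
struct TinyInterner<T> {
    values: Vec<T>,
}

impl<T: PartialEq> TinyInterner<T> {
    fn new() -> Self {
        Self { values: Vec::new() }
    }

    /// Return the key of an equal stored value, inserting it if absent.
    fn get_or_intern(&mut self, value: T) -> TypedKey<T> {
        let index = match self.values.iter().position(|v| *v == value) {
            Some(i) => i,
            None => {
                self.values.push(value);
                self.values.len() - 1
            }
        };
        TypedKey { index, _stored: PhantomData }
    }

    /// Only keys typed with this interner's `T` are accepted.
    fn resolve(&self, key: &TypedKey<T>) -> &T {
        &self.values[key.index]
    }
}

fn main() {
    let mut strings: TinyInterner<String> = TinyInterner::new();
    let mut bytes: TinyInterner<Vec<u8>> = TinyInterner::new();

    let s = strings.get_or_intern("hello".to_string());
    let b = bytes.get_or_intern(vec![1, 2, 3]);

    assert_eq!(strings.resolve(&s), "hello");
    assert_eq!(bytes.resolve(&b), &vec![1u8, 2, 3]);

    // bytes.resolve(&s); // does not compile: expected `&TypedKey<Vec<u8>>`, found `&TypedKey<String>`
}
```

Note that the type parameter only separates interners of different stored types; in this sketch two interners of the same `T` would still accept each other's keys.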

Another difference is that the Internable trait is designed to be implemented for the owned version of the type, e.g. String or Vec<u8>. This lets the user take the owned type, Vec<u8>, and name its key type as Spur<Vec<u8>>, which is IMO more intuitive. A downside is that the type of the Rodeo cannot be inferred from the types being inserted into it.
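As a rough sketch of what "implemented for the owned type" can look like, the trait below is hypothetical and may not match the `Internable` trait in this PR. It assumes the owned type names a borrowed view (`str` for `String`, `[T]` for `Vec<T>`) that is used for lookups, and `position_of` is an illustrative helper, not a lasso function.

```rust
/// Hypothetical trait for illustration; the PR's actual `Internable` may differ.
/// Implemented on the owned type, which names its borrowed view.
trait Internable {
    /// Borrowed form used for lookups (e.g. `str` or `[T]`).
    type Borrowed: ?Sized + PartialEq;

    fn as_borrowed(&self) -> &Self::Borrowed;
}

impl Internable for String {
    type Borrowed = str;

    fn as_borrowed(&self) -> &str {
        self.as_str()
    }
}

// Restricted to `Copy` elements, matching the stated goal of interning
// Vecs of `Copy` types.
impl<T: Copy + PartialEq> Internable for Vec<T> {
    type Borrowed = [T];

    fn as_borrowed(&self) -> &[T] {
        self.as_slice()
    }
}

/// Hypothetical helper: find a stored owned value by its borrowed form.
fn position_of<T: Internable>(stored: &[T], needle: &T::Borrowed) -> Option<usize> {
    stored.iter().position(|v| v.as_borrowed() == needle)
}

fn main() {
    let words = vec![String::from("foo"), String::from("bar")];
    let seqs: Vec<Vec<u8>> = vec![vec![1, 2], vec![3]];

    // Lookups take the borrowed form, while the owned type is the parameter.
    assert_eq!(position_of(&words, "bar"), Some(1));
    assert_eq!(position_of(&seqs, [3u8].as_slice()), Some(1));
    assert_eq!(position_of(&seqs, [9u8].as_slice()), None);
}
```

One consequence of this shape is that a borrowed value such as `&[u8]` does not by itself identify the owned type, which is one way to read the inference downside mentioned above.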

The issue brought up by @Kixiron, "We can't assume that all internable types play nicely as byte slices", is not really addressed by this PR. My goal was not to intern OsStr, but to intern Vecs of types that implement Copy.

I put the type being stored as the first type argument to Rodeo. This is obviously a breaking change: anyone using a non-default Spur type will need to update their code. We could probably use diagnostic::on_unimplemented to make it clear to users what they need to do when upgrading.

I also added an example of the new functionality. IMO, it is very intuitive:

use lasso::Rodeo;

fn main() {
    // Create a rodeo that interns Vec<i32> instead of String
    let mut rodeo: Rodeo<Vec<i32>> = Rodeo::new();

    // Intern some integer sequences
    let a = rodeo.get_or_intern(vec![1, 2, 3]);
    let b = rodeo.get_or_intern(vec![4, 5, 6, 7, 8]);

    // Interning the same value returns the same key
    let a2 = rodeo.get_or_intern(vec![1, 2, 3]);
    assert_eq!(a, a2);

    // Resolve keys back to values
    assert_eq!(rodeo.resolve(&a), &[1, 2, 3]);
    assert_eq!(rodeo.resolve(&b), &[4, 5, 6, 7, 8]);

    // Lookup by value
    assert_eq!(rodeo.get([1, 2, 3].as_slice()), Some(a));
    assert_eq!(rodeo.get([7, 8, 9].as_slice()), None);

    println!("Interned {} sequences", rodeo.len());
}

Disclaimer: Some LLM assistance was used in the creation of this PR. I updated Rodeo and Arena, then started updating RodeoReader and RodeoResolver, then used LLMs to automate the mechanical work of updating all the types in RodeoReader and RodeoResolver. The design is mine, and I've of course reviewed all the code manually. I also left many references to strings in the code (e.g. variable names and comments), as IMO focusing on the concrete case makes it easier to understand.

anchpop force-pushed the generic-internable branch 3 times, most recently from 44a078a to eeff631, January 25, 2026 23:17
anchpop force-pushed the generic-internable branch from eeff631 to c001df0, January 26, 2026 00:27
anchpop marked this pull request as ready for review February 9, 2026 19:49