Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial support for value section parsing #1541

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

primoly
Copy link
Contributor

@primoly primoly commented May 9, 2024

This draft adds support for component value sections to wasmparser.
WebAssembly/component-model#336

Questions, issues and remarks:

The validator currently does not support validation of the actual values:

The parsing of values depends on the component type space, so a value section can’t be parsed in isolation WebAssembly/component-model#352 How should this be handled? Currently, ComponentValue only parses the type and keeps the bytes of the val. To read the actual value the val method takes a slice of ComponentTypes which must be in the order they were read from the current components ComponentTypeSectionReader.

wasmparser duplicates the types in validator and reader. Currently values.rs uses the ones from reader. Should it use types::Types from validator instead?

Validation only needs to call val, as the parser then checks if the types of values are correct.

Crates depending on wasmparser (such as wasmprinter) need to keep track of the component type space. wasmprinter already keeps some global module/component information in the state variable. wasmparser itself does that in its validator.

Parsing of flags that set labels past 64 to true currently fails. Needs support for arbitrarily large unsigned LEB128.

How should flags be stored in wasmparser? Explainer.md: list of labels of fields set to true. For wasmparser I think this will likely confuse people into thinking the u32s are just bools encoded as integers. So I instead chose Vec<bool>, which would treat it as a special record with only bool fields (it already is). More symmetrical and ergonomic.

Should Record, Flags, Variant and Enum store the label names? They are already provided in the ComponentType. Flags could then also be a Vec<&str>.

BinaryReaderErrors use the offset reader.original_position(). Might not always be correct (off by one or couple bytes).

@alexcrichton
Copy link
Member

Thanks for the PR and the work here!

Should it use types::Types from validator instead?

I posted a comment on the spec PR about this, but otherwise for now you'll probably want to take a TypesRef<'_> here to represent that a validator's context is required.

Needs support for arbitrarily large unsigned LEB128.

This might be good to change at the spec-level perhaps too and to spec that beyond 32-bits the value is represented as N 32-bit values so we would decode 32-bit chunks here.

Should Record, Flags, Variant and Enum store the label names?

I'm a bit wary of the current representation of Val for other reasons. It's the "easiest" representation in a sense but it's also one of the less efficient representations. It might make sense to perhaps instead explore a "visitor" like interface where there's a trait ValVisitor where methods are invoked for each parsed piece. For example it'd have something like record_start with record_field and record_end. That way storage of the names would be up to the caller and while Val could be created it wouldn't be the only representation.

Switch to `types::TypesRef` and using the visitor pattern for value parsing.
@primoly
Copy link
Contributor Author

primoly commented May 10, 2024

Thank you for taking a look so rapidly and the suggestions.

I’ve made the following changes based on your response:

I’m now using types::TypesRef for reading the value. I don’t yet know how that will work in validation tho, because the closure provided to process_component_section can not, I believe, access that value.

I agree that the initial representation for Val isn’t great. I don’t have much experience with the visitor pattern, but I have attempted to implement something along those lines.

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks to me like you got most of the bits and pieces all sorted out and this looks great to me, thanks!

@primoly
Copy link
Contributor Author

primoly commented May 18, 2024

Thank you very much for taking another look.
One thing I noticed is that the visitor would not work for anything other than validation, because it can’t share mutable state with the child visitor returned by the field/element methods.
For instance, I can’t do:

struct Printer {
    output: String,
}

/// ...

impl Record<Self> for &mut Printer {
    fn field(&mut self, name: &str) -> Self {
        self.output.push_str(name);
        self /// <- Does not work
    }
}

I tried to solve this in different attempts by adding lifetimes or changing from owned to borrowed values in both the impl and the trait itself, but every time I hit lifetime or ownership errors somewhere. Unless I miss something obvious, the whole visitor has to be redesigned from scratch, yet I have no idea how. Another solution would be to switch to have val return an iterator instead of taking a visitor but that involves some boxing and I’m not sure about the performance implications. All problems basically arise due to the recursive nature of fields/elements.

@alexcrichton
Copy link
Member

The serde crates/traits might be a good inspiration to draw from for this use case? They're solving a pretty similar problem and might be good to model after.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants