Skip to content

Conversation

@feedab1e
Copy link
Contributor

This PR proposes a vendor-neutral syntax for describing relocation information in WAT.

Unlike the syntax described in WebAssembly/wabt#2649, this proposal is intended to express everything that the binary format can, including whatever the current proposed implementation in WABT does not support. Of note here is the fact that @sym annotations can appear multiple times per declaration, the inclusion of COMDATs, segment infos, and the inclusion of labels and addends.

Copy link
Member

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a good idea to me. Good to see the annotation proposal being used in innovative ways like this!

Linking.md Outdated
| `weak` | sets `WASM_SYM_BINDING_WEAK` symbol flag |
| `static` | sets `WASM_SYM_BINDING_LOCAL` symbol flag |
| `hidden` | sets `WASM_SYM_VISIBILITY_HIDDEN` symbol flag |
| `retain` | sets `WASM_SYM_NO_STRIP` symbol flag |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we think retain is a better terminology then we should probably propose renaming WASM_SYM_NO_STRIP rather than diverging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's sound. I just thought that those names are from LLVM's source, so that rename should be done in sync with LLVM

@sbc100
Copy link
Member

sbc100 commented Oct 18, 2025

For comparison I wonder if you could post a simple hello_world.o object file in both .s assembly format and the .wat format. We should probably almost mention that LLVM uses its own .s format somewhere here too, but that can be separate PR if you like.

@feedab1e
Copy link
Contributor Author

For comparison I wonder if you could post a simple hello_world.o object file in both .s assembly format and the .wat format. We should probably almost mention that LLVM uses its own .s format somewhere here too, but that can be separate PR if you like.

I can do that for WAT, but since I don't really know the structure of .s, and there seems to be no documentation for it, I don't think that I can do it with .s.

@feedab1e
Copy link
Contributor Author

On second thought, I can just use clang -S for .s and clang -c wasm2wat for .wat, so I can do that, yeah

@feedab1e
Copy link
Contributor Author

@sbc100 My "hello world" examples turned out to be more than a full screen of code when using .s. I can send the files here, but I don't think it would be good to put that output into the document.

@sbc100
Copy link
Member

sbc100 commented Oct 19, 2025

@sbc100 My "hello world" examples turned out to be more than a full screen of code when using .s. I can send the files here, but I don't think it would be good to put that output into the document.

Can you just include it here as quoted text? Along with the equivalent wat for comparison?

@sbc100
Copy link
Member

sbc100 commented Oct 19, 2025

@sbc100 My "hello world" examples turned out to be more than a full screen of code when using .s. I can send the files here, but I don't think it would be good to put that output into the document.

Agreed, Im mostly curious to see the two side by side here in the comments for comparison/discussion.

@feedab1e
Copy link
Contributor Author

feedab1e commented Oct 19, 2025

Oh, sure, then, for the source file test.cpp, when compiled with clang -S -O3, yields test.s, and when compiled clang -c -O3, and than run through wasm2wat -r --enable-annotations (provided wabt from my fork), yields test.wat.

(Unfortunately, github forbids me from posting .s and .wat, so .txt it is)

@sbc100
Copy link
Member

sbc100 commented Oct 19, 2025

Oh, sure, then, for the source file test.cpp, when compiled with clang -S -O3, yields test.s, and when compiled clang -c -O3, and than run through wasm2wat -r --enable-annotations (provided wabt from my fork), yields test.wat.

(Unfortunately, github forbids me from posting .s and .wat, so .txt it is)

Oh nice, the wat format looks much more readable than I expected it to be.

@feedab1e
Copy link
Contributor Author

feedab1e commented Oct 19, 2025

One thing I wanted to mention here is that there is a limitation in the text format, that code section relocations that do not make sense for instruction operands may not be expressed in the text format. So for example i32.load instruction with a R_WASM_TYPE_INDEX_LEB could occur in a perfectly valid object file, from binary format's PoV, yet would be inexpressible in the text format. Do we want to disallow such relocations in the binary format too?
(AFAIK this would also be a problem for .s)

@feedab1e
Copy link
Contributor Author

Also the same issue exists for labels, which can only point between instructions in the code segment, or into the data area of a data segment in the data section

@sbc100
Copy link
Member

sbc100 commented Oct 19, 2025

One thing I wanted to mention here is that there is a limitation in the text format, that code section relocations that do not make sense for instruction operands may not be expressed in the text format. So for example i32.load instruction with a R_WASM_TYPE_INDEX_LEB could occur in a perfectly valid object file, from binary format's PoV, yet would be inexpressible in the text format. Do we want to disallow such relocations in the binary format too? (AFAIK this would also be a problem for .s)

Yes it seems perfectly reasonable for an object file validator to declare such relocations as invalid based on the instructions they are part of.

However, the linker itself (wasm-ld) will blindly accept such things, and likely produce invalid output as a result too. The linker explictly does not parse the code section but instead blindly applies relocation. The same goes for relocation that don't point to a correct spot in the instruction stream, e.g. a relocation could in theory point at the i32.load opcode itself, rather than its operand. This would technically be an invalid object files, but the linker is blind to that.

@feedab1e
Copy link
Contributor Author

Would you be fine with me adding into the doc that such relocations are invalid for the purposes of validation, then?
And same for R_WASM_SECTION_OFFSET_I32 addends.
Also, do we want to specify that section entries in relocation tables may only point to the data section, the code section, or custom sections?

@sbc100
Copy link
Member

sbc100 commented Oct 19, 2025

Would you be fine with me adding into the doc that such relocations are invalid for the purposes of validation, then?

Sure that makes sense to me. Relocation that don't point to a valid spot in the instruction stream are certainly invalid. It might be worth also noting that wasm-ld does not do validation of the code section though, so bad inputs can result in bad outputs.

And same for R_WASM_SECTION_OFFSET_I32 addends. Also, do we want to specify that section entries in relocation tables may only point to the data section, the code section, or custom sections?

Sounds reasonable yes, we could always expand the list, but for now those are the only sections that are copied by the linker from input files into the output file, so they are the only section for which relocations make sense. I think for GC types we maybe want to one day include the type section somehow, but we are long way from that.

@feedab1e
Copy link
Contributor Author

Added the validation rules, please take a look

Linking.md Outdated
| ------------ | -------------- | ------------------------------------------- |
| section | `varuint32` | the index of the target section |

Section symbols may only reference the CODE section, the DATA section, or custom sections.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm.. I'm not sure about this actually.

When you asked about documenting a limitation on sections I thought you were referring to the fact that relocations can only apply to certain section types.

"Which sections can have relocations within them" is a different concept to "which sections can be referred to by WASM_SYMBOL_TYPE_SECTION symbols".

I believe that WASM_SYMBOL_TYPE_SECTION symbols are only used by debug info, but my memory is a little foggy here.

Looking at the code it actually looks like this symbols might only be valid for custom sections: https://github.com/llvm/llvm-project/blob/38372df53fd7f6c8bd8c46bf720b676e12f481d9/lld/wasm/InputFiles.cpp#L697-L705.

Which would make sense if these only used in debug info since all debug info is stored in custom sections.

Copy link
Contributor Author

@feedab1e feedab1e Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe it can work like that, though, since R_WASM_SECTION_OFFSET_I32 relocations reference a section symbol, and for that to work as DWARF code addresses, the symbol that relocation references would have to reference the CODE section, while the relocation itself would have to target a place in the debug section.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, actually, that can absolutely work if WASM_*_OFFSET_* relocations actually resolve to offsets form the file start, like DWARF actually expects. I do think the current spec is not very clear on this and someone from LLVM should take a look at what actually happens there and adjust https://github.com/WebAssembly/tool-conventions/blob/main/Linking.md#processing-relocations accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the text format to reflect that section offset relocations may only reference custom sections for now.

@feedab1e
Copy link
Contributor Author

By "default" do you mean "if no linking annotations are present in the file" (i.e. not creating an object file) or do you mean just silently not generating a relocation there?

@feedab1e
Copy link
Contributor Author

I guess my actual question is how would you diagnose the following:

(module
  (func (param) (result))
  (func (@sym) (param) (result)))

It is either

  1. "missing (@sym)" on the first declaration (all WebAssembly entities are required to have at least one symbol)
  2. "(@sym) is not allowed" on the second declaration (upon seeing the first declaration, we decided that we aren't actually processing an object file)

Option 1 would require either restarting the parse, or remembering all of the declaration locations, and then issuing an error for every one that had no annotations.

@sbc100
Copy link
Member

sbc100 commented Oct 22, 2025

I guess my actual question is how would you diagnose the following:

(module
  (func (param) (result))
  (func (@sym) (param) (result)))

It is either

  1. "missing (@sym)" on the first declaration (all WebAssembly entities are required to have at least one symbol)
  2. "(@sym) is not allowed" on the second declaration (upon seeing the first declaration, we decided that we aren't actually processing an object file)

Option 1 would require either restarting the parse, or remembering all of the declaration locations, and then issuing an error for every one that had no annotations.

How about we require @relocatable on the module itself?

@alexcrichton
Copy link
Collaborator

Could handling func-with-@sym plus func-without-@sym be part of the validation of the linking section? Where validation requries that if any entity has @sym then they all have @sym? And then dealing with the text format of invalid modules is expected to return an error of some kind?

@feedab1e
Copy link
Contributor Author

Could handling func-with-@sym plus func-without-@sym be part of the validation of the linking section? Where validation requries that if any entity has @sym then they all have @sym? And then dealing with the text format of invalid modules is expected to return an error of some kind?

Sure, but how would you implement those diagnostics? The parser state is gone by the time you do object file validation, as well as source locations for the declarations.

@feedab1e
Copy link
Contributor Author

I guess my actual question is how would you diagnose the following:

(module
  (func (param) (result))
  (func (@sym) (param) (result)))

It is either

  1. "missing (@sym)" on the first declaration (all WebAssembly entities are required to have at least one symbol)
  2. "(@sym) is not allowed" on the second declaration (upon seeing the first declaration, we decided that we aren't actually processing an object file)

Option 1 would require either restarting the parse, or remembering all of the declaration locations, and then issuing an error for every one that had no annotations.

How about we require @relocatable on the module itself?

That would work, too. Actually, then, I think it would be better to have an annotation describing all WAT features that are enabled for the module.
Something like:

(module (@wat-features linking code-metadata dwarf))

For object file linking specifically, another option is to say something like

(module (@target-features +overlong-leb -multimemory))

to trigger object file generation, since those make sense only for linking, AFAIK

@feedab1e
Copy link
Contributor Author

@sbc100 @alexcrichton Hi, do you have any more comments? if not, I’d like to merge this soon, since my GCC codegen backend can already compile programs, but it needs spec to be reviewable

Copy link
Member

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we still need to resolve the issue of how wat reader knows if it should produce a relocatable output at all?

My current preference is for a global annotation on the top level module, but I don't see that reflected in the current PR.

I'm also still not sure about the implicit relocation for things like call $foo .. does that currently generate an implicit relocation? I still think it might be good to remove all implicit relocation for v1 of the spec, although the addition of the the level annotation on the module object makes me feel a little better about it.

I'm curious if other folks have opinions about implicit relocations. @dschuff maybe? @sunfishcode @tlively ? I guess it kind of matches assembly formats where call foo is an implicit relocation.

| Field | Type | Description |
| ------------ | -------------- | ------------------------------------------- |
| index | `varuint32` | the index of the Wasm object corresponding to the symbol, which references an import if and only if the `WASM_SYM_UNDEFINED` flag is set |
| index | `varuint32` | the index of the WebAssembly entity corresponding to the symbol, which references an import if and only if the `WASM_SYM_UNDEFINED` flag is set |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we Wasm consistently? So Wasm entity instead of `WebAssembly entity?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I can do that


## Relocations

Relocations are represented as WebAssembly annotations of the form
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here? Should we just use Wasm?

|--------------|---------------------------------------|-------------------|
| nothing | nothing | Normal relocation |
| `pic` | `R_WASM_*_LOCREL_*`, `R_WASM_*_REL_*` | Address relative to `env.__memory_base` or `env.__table_base`, used for dynamic linking |
| `tls` | `R_WASM_*_TLS*` | Address relative to `env.__tls_base`, used for thread-local storage |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason not to reflect the entire list of relocation types like they are listed in the binary format and/or in llvm: https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/BinaryFormat/WasmRelocs.def

i.e. why create this new concept of a base type + a modifier that doesn't exist elsewhere yet? Why not just use type=R_WASM_EVENT_INDEX_XX in the text format? This would also make the format redundant since its also part of the name of the relocation type.

Maybe this new method/format/modifier concept could be added more globally later once the initial version of the text format is added? But for v1 it seems like it would make sense to simply mirror the existing binary format enum.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was covered extensively in #258 (comment), and @alexcrichton expressed support for it here, but in short, that way there wouldn't be an option to elide parts of the relocation annotation (i.e. defaulting and predefinig wouldn't work), so all relocations would be incredibly verbose (for example, call $foo would become call $foo (@reloc type=R_WASM_FUNC_INDEX_LEB) for no reason).

Copy link
Member

@sbc100 sbc100 Oct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I don't see how specifying the full relocation type (e.g using R_WASM_FUNC_INDEX_LEB when a reloc is present) would prevent the whole relocation from being implicit / elided.

This seem like two orthogonal decisions, but I get that I must be missing something:

  1. Do we implicitly generate reloc entries for things like call instruction?
  2. When reloc is specified explicitly do we use the existing enum, of something new/different

I'm also not sure that reducing verbosity needs to be the highest priority since the plan is for this format to be mostly machine read and machine written, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I don't see how specifying the full relocation type (e.g using R_WASM_FUNC_INDEX_LEB when a reloc is present) would prevent the whole relocation from being implicit / elided.

This seem like two orthogonal decisions, but I get that I must be missing something:

  1. Do we implicitly generate reloc entries for things like call instruction?
  2. When reloc is specified explicitly do we use the existing enum, of something new/different

Apart form fully elidable relocations, other types of relocations exist, like in memory (i32.load (@reloc $mem_sym)) and in constants (i32.const (@reloc data $mem_sym)) where a relocation is not entirely elided but is greatly abbreviated form the complete relocation type. Apart form that, specifying the complete relocation type would expose relocation type names (which are for now an implementation detail of LLVM) to the wider text format.

I'm also not sure that reducing verbosity needs to be the highest priority since the plan is for this format to be mostly machine read and machine written, right?

Well, it needs to be human-readable, too, since it's a text format and humans are expected to read that too, like they usually read assembly, and likewise human-writable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart form that, specifying the complete relocation type would expose relocation type names (which are for now an implementation detail of LLVM) to the wider text format.

The relocation type names are not indented to be LLVM specific. The list of 20 relocation types, along with their ffull names, are listed above in this very document.

This is designed to mirror the ELF relocation types that are defined in the ELF header and not specific to either LLVM or GCC but are using in both place.

I think it might be a good idea to reflect this precisely in text for, so we can avoid having two different ways to specify things.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it's more of a preference of per-relocation/symbol annotations over a top-level annotation. I agree that either system would work alright, and the main concern that I can think of is printing an wasm object file (binary-to-text) where it feels more natural to print @reloc or @sym-per-entry as opposed to verifying that everything has a relocation or symbol and then not printing anything. For a pure text-to-binary use case I'd agree that the top-level annotation is nicer to have.

I'd naively expect that with @sym and @reloc would be frequent enough in a file that it wouldn't take much of a visual scan myself, but I'm not personally too concerned about that myself.

I should also be clear that I'm happy to be overruled here. IMO text-format design is something that's worth bikeshedding but not endlessly, so I wouldn't want to hold up anything on my own behalf too much

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m still not sure there’s much value in either making all symbols/relocs explicit or having a toplevel annotation, since neither actually guarantee that an object file is produced. So the only effect of such an annotation would be to artificially restrict files that can be relocatable. Currently, any valid WASM module as well as any valid WAT module can be transformed into an object file and back by merely attaching or stripping the linking information. It's a nice property and I would like to keep it, especially since there is nothing stopping us from it.

If we really really want something (advisory) that says which features are intended to be used with which WAT files, then perhaps we should have a general annotation that applies to all features, not just linking.

That annotation, if decided upon, could in principle augment assembler flags same way as -DMACRO/#define MACRO works in C and C++ compilers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also a separate concern of how explicit relocation annotations will fit in with other metadata that isn't aware of linking, like code metadata. Currently it's easy, code metadata info needs a funcidx, so it grabs the primary symbol for that funcidx, and places a relocation of that symbol, no annotations required.

If explicit relocations are mandated, then the standardized syntax for code metadata would simply no longer work, just like every annotation-based metadata feature that isn't explicitly aware of relocations.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might come down to how I'm approaching this from a different perspective (I think) than you might be. I'm thinking about this from a tooling perspective of a text-to-binary or binary-to-text transform. I suspect you're coming from the direction of producing-the-text and/or reading-the-text (please correct me though!). I'm concerned with "the binary should have one canonical text form and vice versa" and I'm not interested in auto-injecting linking/reloc.* sections myself (e.g. via some sort of CLI flag and/or configuration outside of the text file).

Not sure if that helps, but figured I'd write down how my perspective may be a bit different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might come down to how I'm approaching this from a different perspective (I think) than you might be. I'm thinking about this from a tooling perspective of a text-to-binary or binary-to-text transform. I suspect you're coming from the direction of producing-the-text and/or reading-the-text (please correct me though!). I'm concerned with "the binary should have one canonical text form and vice versa" and I'm not interested in auto-injecting linking/reloc.* sections myself (e.g. via some sort of CLI flag and/or configuration outside of the text file).

Not sure if that helps, but figured I'd write down how my perspective may be a bit different.

Well, from the transform PoV, mandating the annotations would actually break that property, since there would be binary object files that can no longer be converted to text (#258 (comment)), so our hands are tied, and we have to make this work with (mandatory) elision, or we have to redesign every other annotation to accommodate linking. Elision does not break the round-trip requirement either, since elided relocations have a unique and mandatory representation in a valid binary object file, and vice versa

@feedab1e
Copy link
Contributor Author

I'm also still not sure about the implicit relocation for things like call $foo .. does that currently generate an implicit relocation? I still think it might be good to remove all implicit relocation for v1 of the spec, although the addition of the the level annotation on the module object makes me feel a little better about it.

As I explained here, I do think that making all relocations explicit would be a very bad idea, since that would just make the text format excessively cluttered for no gain at all, since those relocations are mandatory and their properties can be trivially inferred.

@sunfishcode
Copy link
Member

As I explained here, I do think that making all relocations explicit would be a very bad idea, since that would just make the text format excessively cluttered for no gain at all, since those relocations are mandatory and their properties can be trivially inferred.

This sounds reasonable to me.

@feedab1e
Copy link
Contributor Author

Hi, @dschuff proposed to move this PR forward. As far as I'm aware, the only issue left now is the tension between being explicit about relocations and compatibility with other metadata annotations. The crux of the issue is that, while interest exists for being able to discern at a glance whether the text given is relocatable or not, it cannot be done by requiring a relocation annotation to be present for each relocation entry.
The following example demonstrates this impossibility:

(module
  (func (@name "foo") (@sym)))

The issue here is that name annotations (as well as many others) create implicit references to WebAssembly entities which are not apparent in the text, so there is simply no place to attach a relocation annotation to that annotation, and since we require a relocation entry to be present for each reference to a WebAssembly entity, every program that uses such extensions would always fail object file validation. There are two ways of dealing with this issue:

  1. tool-conventions has to keep track of and update every metadata annotation (even outside of tool-conventions) to be aware of linking metadata, and mandate explicit relocation annotations within those annotations, or
  2. the linking format has to take into account the existence of other annotations, and allow generation of implied relocation entries to preserve compatibility.

Given the intrusivity of option 1, I find it impractical, and strongly advise against it and in favor of option 2.
As for differentiation for option 2, I think it would be adequate to just have a toplevel annotation as described in #258 (comment).

@sbc100 @alexcrichton @dschuff, do you have any other comments?

@sbc100
Copy link
Member

sbc100 commented Nov 19, 2025

As for differentiation for option 2, I think it would be adequate to just have a toplevel annotation as described in #258 (comment).

I'm pretty strongly in favor of this top-level annotation.

@feedab1e
Copy link
Contributor Author

As for differentiation for option 2, I think it would be adequate to just have a toplevel annotation as described in #258 (comment).

I'm pretty strongly in favor of this top-level annotation.

That comment describes two potential options, one works for all features, and the other one works only for linking. If we opt for the first option, then I think creating a separate document describing it would be best. Would you like me to do that?

@sbc100
Copy link
Member

sbc100 commented Nov 19, 2025

Hi, @dschuff proposed to move this PR forward. As far as I'm aware, the only issue left now is the tension between being explicit about relocations and compatibility with other metadata annotations. The crux of the issue is that, while interest exists for being able to discern at a glance whether the text given is relocatable or not, it cannot be done by requiring a relocation annotation to be present for each relocation entry. The following example demonstrates this impossibility:

(module
  (func (@name "foo") (@sym)))

The issue here is that name annotations (as well as many others) create implicit references to WebAssembly entities which are not apparent in the text, so there is simply no place to attach a relocation annotation to that annotation, and since we require a relocation entry to be present for each reference to a WebAssembly entity, every program that uses such extensions would always fail object file validation. There are two ways of dealing with this issue:

  1. tool-conventions has to keep track of and update every metadata annotation (even outside of tool-conventions) to be aware of linking metadata, and mandate explicit relocation annotations within those annotations, or
  2. the linking format has to take into account the existence of other annotations, and allow generation of implied relocation entries to preserve compatibility.

Given the intrusivity of option 1, I find it impractical, and strongly advise against it and in favor of option 2. As for differentiation for option 2, I think it would be adequate to just have a toplevel annotation as described in #258 (comment).

@sbc100 @alexcrichton @dschuff, do you have any other comments?

Right now we only support relocations 3 section types code, data and custom (debug) sections. Specifically there is no support for relocation in a name section since the linker considers it synthetic (i.e. it creates it from scratch, not from concatenating inputs). I think I get what you are hinting at, but perhaps we could motivate with an example where relocations are actually needed/used within custom sections produced by other annotations?

In general, I don't think its going to be possible for all core WAT features, lets alone all possible annotations, to survive relocation (i.e be used in relocatable object files). For example, as of today, we don't have any support for GC types, or type section relocations, so any wat file that uses GC types cannot be relocatable. I think its reasonable to say that relocatable objects can only use a certain limited feature set, at least that is true today for the primary consumer of object files: wasm-ld.

@alexcrichton
Copy link
Collaborator

alexcrichton commented Nov 19, 2025

Personally I'm a bit lost insofar as I wouldn't know where to begin to implement this in wasm-tools. That being said I've offered my 2c here so I don't want to block this or anything. I'd probably wait to see what other toolchains do to figure out how to implement this in wasm-tools, and that's not the end of the world.

@feedab1e
Copy link
Contributor Author

Right now we only support relocations 3 section types code, data and custom (debug) sections. Specifically there is no support for relocation in a name section since the linker considers it synthetic (i.e. it creates it from scratch, not from concatenating inputs).

Isn't the name section also a custom section? The spec says it is. There's nothing to synthesize that data from, either, since e.g. names for locals aren't known to the linker, so cannot be synthesized. If you do actually drop some custom sections, that would be nice to specify in the spec.

I think I get what you are hinting at, but perhaps we could motivate with an example where relocations are actually needed/used within custom sections produced by other annotations?

Aside from this example, which I would assume is still valid, I would also point out code metadata (which is also a custom section), which does similar implied references, but only to function entities. If annotations for DWARF will be necessary, they will also follow the same idiom of implicitly referencing entities they are located in.

In general, I don't think its going to be possible for all core WAT features, lets alone all possible annotations, to survive relocation (i.e be used in relocatable object files). For example, as of today, we don't have any support for GC types, or type section relocations, so any wat file that uses GC types cannot be relocatable. I think its reasonable to say that relocatable objects can only use a certain limited feature set, at least that is true today for the primary consumer of object files: wasm-ld.

I see no reason stopping us from supporting as many features as possible, especially if there's no additional per-feature cost for doing so. I also wouldn't limit the specification to the limits of the current implementation, since one of the primary purposes of a specification is to allow multiple implementations to exist. I also don't think it's reasonable to limit the set of allowed features using this repository, and instead we should stride for seamless compatibility with as much of the ecosystem as possible.

In case of GC types, I don't think there is actually anything to be done on the format side, since the linker is required to grok the type section itself, and merge the type declarations, and it already does so for function types, but that's probably off-topic for this discussion.

@sbc100
Copy link
Member

sbc100 commented Nov 19, 2025

Right now we only support relocations 3 section types code, data and custom (debug) sections. Specifically there is no support for relocation in a name section since the linker considers it synthetic (i.e. it creates it from scratch, not from concatenating inputs).

Isn't the name section also a custom section? The spec says it is. There's nothing to synthesize that data from, either, since e.g. names for locals aren't known to the linker, so cannot be synthesized. If you do actually drop some custom sections, that would be nice to specify in the spec.

The name section is a custom section buts it has special case handling. There are basically two types of sections in the linker:

Type 1: Inputs get concatenated + relocations applied

Type 2: Outputs are calculated and produced directly by the linker (input sections not not included verbatim). These are called synthetic sections in the linker: https://github.com/llvm/llvm-project/blob/main/lld/wasm/SyntheticSections.h

The name section kind of has to be of Type 2 in that we do not copy it verbatim, but instead include only the names of the elements we include in the link.

As of today Type 1 sections are code and data and generic custom sections (other than names, target features, and other specially handled custom sections).

Also, note that not all custom sections can work as Type 1 sections since they cannot be concatenated. For example neither the name section nor the target-features section are can work like this.

@feedab1e
Copy link
Contributor Author

Right now we only support relocations 3 section types code, data and custom (debug) sections. Specifically there is no support for relocation in a name section since the linker considers it synthetic (i.e. it creates it from scratch, not from concatenating inputs).

Isn't the name section also a custom section? The spec says it is. There's nothing to synthesize that data from, either, since e.g. names for locals aren't known to the linker, so cannot be synthesized. If you do actually drop some custom sections, that would be nice to specify in the spec.

The name section is a custom section buts it has special case handling. There are basically two types of sections in the linker:

Type 1: Inputs get concatenated + relocations applied

Type 2: Outputs are calculated and produced directly by the linker (input sections not not included verbatim). These are called synthetic sections in the linker: https://github.com/llvm/llvm-project/blob/main/lld/wasm/SyntheticSections.h

The name section kind of has to be of Type 2 in that we do not copy it verbatim, but instead include only the names of the elements we include in the link.

What happens to relocations in Type 1 sections if they reference symbols that are not included in the link?

In general, I think it would be a good idea, then, to mark Type 1 sections in the binary format explicitly, so that the linker knows it's safe to just concatenate them, as opposed to silently merging unknown sections in a wrong way and then getting a messed up binary. This will probably be needed for LTO too, to tell an incapable linker that it's safe to just throw out sections containing LTO info.

@sbc100
Copy link
Member

sbc100 commented Nov 19, 2025

What happens to relocations in Type 1 sections if they reference symbols that are not included in the link?

They turn into tombstone values, the exact value of which depends on the type of section: https://github.com/llvm/llvm-project/blob/be955e5ac9a0b0c979221efb28b0f52aad7bd3d6/lld/wasm/InputChunks.cpp#L578-L595

@feedab1e
Copy link
Contributor Author

What happens to relocations in Type 1 sections if they reference symbols that are not included in the link?

They turn into tombstone values, the exact value of which depends on the type of section: https://github.com/llvm/llvm-project/blob/be955e5ac9a0b0c979221efb28b0f52aad7bd3d6/lld/wasm/InputChunks.cpp#L578-L595

Oh, wow! So, if in my custom section I have a relocation for a function entity that gets deleted, I just get the first function?

I myself would honestly expect relocations to mark the symbols live, so that never happens, and that's how I read the spec initially, but given this I'd like at least some documentation on that.

@dschuff
Copy link
Member

dschuff commented Nov 19, 2025

I also wouldn't limit the specification to the limits of the current implementation, since one of the primary purposes of a specification is to allow multiple implementations to exist. I also don't think it's reasonable to limit the set of allowed features using this repository, and instead we should stride for seamless compatibility with as much of the ecosystem as possible.

The behavior of implementations is just as important (if not more so) than what the spec says. If the behavior of one implementation is a superset of another than the spec should certainly include both, but I don't think the spec should get out ahead of implementations in terms of specifying behavior that there is no implementation for. (this is actually IMO best for "seamless compatibility with as much of the ecosystem"; compatibility only matters between implementations, and spec is a means to that end). Obviously, having syntax that is general and doesn't constrain future extensions as much as possible is good, but I think the right time to add a feature to the spec is when some implementation wants to implement it.


Relocations are represented as WebAssembly annotations of the form
```wat
(@reloc <format> <method> <modifier> <symbol-reference> <addend>)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be fine with changing the name of R_WASM_TABLE_INDEX_ to something else. The names don't currently show up for LLVM users anywhere other than error messages, and I'd be willing to change them if we come up with better names here.
TABLE_ELEM seems OK to me (or maybe ELEM_IDX / TABLE_ELEMIDX / TABLE_ELEM_IDX since the spec calls this index an elemidx?).
Currently we use OFFSET in reloc names for byte offsets in the binary itself, so I think something using ELEM rather than OFFSET would be better.

A usage of a more general non-function table element would probably need one relocation for the table index and another for the element. So it seems the processing of the second reloc might need to be context-dependent in the same way you are proposing for instructions.

@sbc100
Copy link
Member

sbc100 commented Nov 19, 2025

What happens to relocations in Type 1 sections if they reference symbols that are not included in the link?

They turn into tombstone values, the exact value of which depends on the type of section: https://github.com/llvm/llvm-project/blob/be955e5ac9a0b0c979221efb28b0f52aad7bd3d6/lld/wasm/InputChunks.cpp#L578-L595

Oh, wow! So, if in my custom section I have a relocation for a function entity that gets deleted, I just get the first function?

I myself would honestly expect relocations to mark the symbols live, so that never happens, and that's how I read the spec initially, but given this I'd like at least some documentation on that.

Relocation in the data and text sections do mark the target as live (exception for weak references). I.e. tombstone values are only written to code and data when weak references are missing.

Relocation in custom section are mostly used for .debug info... where you would clearly not want this behaviour.

@dschuff
Copy link
Member

dschuff commented Nov 19, 2025

Oh, wow! So, if in my custom section I have a relocation for a function entity that gets deleted, I just get the first function?

I myself would honestly expect relocations to mark the symbols live, so that never happens, and that's how I read the spec initially, but given this I'd like at least some documentation on that.

This is definitely not the behavior you want for e.g. debug info though. In general you want the property that you can strip it without changing program semantics.

@feedab1e
Copy link
Contributor Author

Oh, wow! So, if in my custom section I have a relocation for a function entity that gets deleted, I just get the first function?

I myself would honestly expect relocations to mark the symbols live, so that never happens, and that's how I read the spec initially, but given this I'd like at least some documentation on that.

This is definitely not the behavior you want for e.g. debug info though. In general you want the property that you can strip it without changing program semantics.

I suppose, then, that there needs to be something like SEGMENT_INFO subsection, but for custom sections, which would describe whether its references are strong or weak, what tombstone values should be used, and if any special actions have to be taken during linking. Does that sound good?

@sbc100
Copy link
Member

sbc100 commented Nov 19, 2025

I suppose, then, that there needs to be something like SEGMENT_INFO subsection, but for custom sections, which would describe whether its references are strong or weak, what tombstone values should be used, and if any special actions have to be taken during linking. Does that sound good?

That sounds like it could be useful yes, but I'd prefer to wait for a specific use cases to drive new features like this in the linker.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants