Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stringconsts #1

Closed
wants to merge 553 commits into from
Closed

stringconsts #1

wants to merge 553 commits into from

Conversation

gkdn
Copy link
Owner

@gkdn gkdn commented Aug 31, 2024

brendandahl and others added 30 commits April 29, 2024 11:19
…uzzing (WebAssembly#6549)

As suggested in WebAssembly#6434 (comment) , lower ref.cast of string views
to ref.as_non_null in binary writing. It is a simple hack that avoids the
problem of V8 not allowing them to be cast.

Add fuzzing support for the last three core string operations, after which
that problem becomes very frequent.

Also add yet another makeTrappingRefUse that was missing in that
fuzzer code.
The bug that had been preventing this fuzzing no longer reproduces.
…y#6552)

When the input has branches to block scope, IR builder generally has to add a
wrapper block with a label name for the branch to target. To reduce the parsed
IR size, add a special case for when the wrapped expression is already an
unnamed block. In that case we can simply add the label to the existing block
instead of creating a new wrapper block.
Some of the example and gtest tests parse wat. Update them to use the new wat
parser, fixing problems with their text input. In one case, comment out an
outlining test entirely for now because the new parser produces different IR
that changes the test output, but it is not obvious how to understand and fix
the test.
When we concat strings, check if their length exceeds a reasonable
limit. (We do not need to do this for string.new as that reads from an
array, which is already properly limited.)

This avoids very slow pauses in the fuzzer (that sometimes OOM).
This makes the cleanup of bodies of functions that have had constants hoisted from them more effective.
Helping WebAssembly#6509, this fixes debugging support for StackIR, which makes it more
possible to use StackIR in more places.

The fix is basically just to pass around some more state, and then to call the
parent with "please write debug info" at the correct times, mirroring the
similar calls in BinaryenIRWriter.

The relevant Emscripten tests pass, and the source map test modified
here produces identical output in StackIR and non-StackIR modes (the
test is also simplified to remove --new-wat-parser which is no longer
needed, after which the test can clearly show that StackIR has the same
output as BinaryenIR).
Without this the fuzzer can error on differences in behavior between V8 and us.

Also move the limitations constants to their own header.
* Keep debug locations at function start

The `fn_prolog_epilog.debugInfo` test is failing otherwise, since there
was debug information associated to the nop instruction at the beginning
of the function.

* Do not clear the debug information when reaching the end of the source map

The last segment should extend to the end of the function.

* Propagate debug location from the function prolog to its first instruction

* Fix printing of epilogue location

The text parser no longer propagates locations to the epilogue, so we
should always print the location if there is one.

* Fix debug location smearing

The debug location of the last instruction should not smear into the
function epilogue, and a debug location from a previous function should
not smear into the prologue of the current function.
This allows writing of binaries with DWARF info when multivalue is
enabled. Currently we just crash when both are enabled together. This
just assumes, unless we have run DWARF-invalidating passes, all locals
added for tuples or scratch locals would have been added at the end of
the local list, so just printing all locals in order would preserve the
DWARF info. Tuple locals are expanded in place and scratch locals are
added at the end.
When we have a ref.func that refers to the secondary module then make a trampoline
that calls it directly. The trampoline's call is then fixed up like all direct calls to the
secondary module.
As of

https://chromium-review.googlesource.com/c/v8/v8/+/5471674

V8 requires stringviews to be non-nullable. It might be possible to make that
change in our IR, or to remove views entirely, but for now this PR makes the
fuzzer stop emitting nullable stringviews as a workaround to allow us to fuzz
current V8.

There are still rare corner cases where this pattern is emitted, that we have
not tracked down, and so this also makes the fuzzer ignore the error for now.
In that mode a walk on an entire module will reuse the same CFGWalker
instance, so we must manually clear some fields, and we forgot some
before.
…intialization (WebAssembly#6571)

Constants that need to be hoisted sometimes are initialized by calling
getters of other constants that need to be hoisted. These getters are
non-trivial, e.g.

  (func $getConst1_<once>_@X (result (ref null $A))
    (block (result (ref null $A))
      (if (i32.eqz (ref.is_null (global.get $$const1@X)))
        (then
          (return (global.get $$const1@X))
        )
      )
      (global.set $$const1@X (struct.new $A (i32.const 2)))
      (global.get $$const1@X)
    )

  (func $getConst2_<once>_@X (result (ref null $A))
    (block (result (ref null $A))
      (if (i32.eqz (ref.is_null (global.get $$const2@X)))
        (then
          (return (global.get $$const2@X))
        )
      )
      (global.set $$const2@X  .... expression with (call $getConst1_<once>_@X)  ....))
      (global.get $$const2@X)
    )

and can only be simplified after the constants they initialize are hoisted. After
the constant is hoisted the getter can be inlined and constants that depend on
it for their initialization can now be hoisted.

Before this pass, inlining would happen after the pass was run by a subsequent
run of the inliner (likely as part of -O3), requiring as many runs of this pass,
interleaved with the inliner, as the depth in the call sequence.

By having a simpler inliner run as part of the loop in this pass, the pass becomes
more effective and more independent of the call depths.
It seems like that each of the callsites already has looked up the
`Memory` object so this helper is not doing anything useful.
…embly#6568)

Previously we had passes --generate-stack-ir, --optimize-stack-ir, --print-stack-ir
that could be run like any other passes. After generating StackIR it was stashed on
the function and invalidated if we modified BinaryenIR. If it wasn't invalidated then
it was used during binary writing. This PR switches things so that we optionally
generate, optimize, and print StackIR only during binary writing. It also removes
all traces of StackIR from wasm.h - after this, StackIR is a feature of binary writing
(and printing) logic only.

This is almost NFC, but there are some minor noticeable differences:

1.  We no longer print has StackIR in the text format when we see it is there. It
    will not be there during normal printing, as it is only present during binary writing.
    (but --print-stack-ir still works as before; as mentioned above it runs during writing).
2.  --generate/optimize/print-stack-ir change from being passes to being flags that
    control that behavior instead. As passes, their order on the commandline mattered,
    while now it does not, and they only "globally" affect things during writing.
3.  The C API changes slightly, as there is no need to pass it an option "optimize" to
    the StackIR APIs. Whether we optimize is handled by --optimize-stack-ir which is
    set like other optimization flags on the PassOptions object, so we don't need the
    old option to those C APIs.

The main benefit here is simplifying the code, so we don't need to think about
StackIR in more places than just binary writing. That may also allow future
improvements to our usage of StackIR.
Tests is still very limited.  Hopefully we can use the upstream spec
tests soon and avoid having to write our own tests for
`.set/.set/.fill/etc`.

See WebAssembly/memory64#51
The spec tests use an extension of the standard text format that includes
various commands and assertions used to test WebAssembly implementations. Add a
utility to parse this extended WebAssembly script format and use it in
wasm-shell to check that it parses our spec tests without error. Fix a few
errors the new parser found in our spec tests.

A future PR will rewrite wasm-shell to interpret the results of the new parser,
but for now to keep the diff smaller, do not do anything with the new parser
except check for errors.
Change `countScratchLocals` to return the count and type of necessary scratch
locals. It used to encode them as keys in the global map from scratch local
types to local indices, which could not handle having more than one scratch
local of a given type and was generally harder to reason about due to its use of
global state. Take the opportunity to avoid emitting unnecessary scratch locals
for `TupleExtract` expressions that will be optimized to not use them.

Also simplify and better document the calculation of the mapping from IR indices
to binary indices for all locals, scratch and non-scratch.
Previously we checked late, and as a result might end up failing to optimize when
a sub-pattern could have worked. E.g.

(call
  (A)
)
(call
  (A)
)

The call cannot be optimized, but the A pattern repeats. Before this PR we'd
greedily focus on the entire call and then fail. After this PR we skip the call
before we commit to which patterns to try to optimize, so we succeed.

Add a isShallowlyGenerative helper here as we compute this step by step as
we go. Also remove a parameter to the generativity code (it did not use the
features it was passed).
Given:

(ORIGINAL)
(in between)
(COPY)

We want to change that to

(local.tee $temp (ORIGINAL))
(in between)
(local.get $temp)

It is fine if "in between" traps: then we never reach the new local.get. This is a safer
situation than most optimizations because we are not reordering anything, only
replacing known-equivalent code.
With this you can do std::cout << effects and get something like

EffectAnalyzer {
  writesMemory
  hasSideEffects
}
… text (WebAssembly#6520)

;;@

with nothing else (no source:line) can be used to specify that the following
expression does not have any debug info associated to it. This can be used
to stop the automatic propagation of debug info in the text parsers.

The text printer has also been updated to output this comment when needed.
…ssembly#6590)

I recently add TableSize/Grow and noticed I didn't need these.  It seems
they are superfluous.
…tivity (WebAssembly#6591)

WebAssembly#6587 was incorrect: It checked generativity early in an incremental manner, but
it did not accumulate that information as we do with hashes. As a result we
could end up optimizing something with a generative child, and sadly we lacked
testing for that case.

This adds incremental generativity computation alongside hashes. It also splits
out this check from isRelevant.

Also add a test for nested effects (as opposed to generativity), but that already
worked before this PR (as we compute effects and invalidation as we go, already).
tlively and others added 28 commits August 17, 2024 02:14
Most of our type optimization passes emit all non-public types as a
single large rec group, which trivially ensures that different types
remain different, even if they are optimized to have the same structure.
Usually emitting a single large rec group is fine, but it also means
that if the module is split, all of the types will need to be repeated
in all of the split modules. To better support this use case, add a pass
that can split the large rec group back into minimal rec groups, taking
care to preserve separate type identities by emitting different
permutations of the same group where possible or by inserting unused
brand types to differentiate them.
Replace code that checked `isStruct()`, `isArray()`, etc. in sequence
with uses of `HeapType::getKind()` and switch statements. This will make
it easier to find the code that needs updating if/when we add new heap
type kinds in the future. It also makes it much easier to find code that
already needs updating to handle continuation types by grepping for
"TODO: cont".
We previously printed explicit typeuses (e.g. `(type $f)`) in function
signatures when GC was enabled. But even when GC is not enabled,
function types may use non-MVP features that require the explicit
typeuse to be printed. Fix the printer to always print the explicit type
use for such types.

Fixes WebAssembly#6850.
IRBuilder is responsible for validation involving type annotations on GC
instructions because those type annotations may not be preserved in the
built IR to be used by the main validator. For `array.init_elem`, we
were not using the type annotation to validate the element segment,
which allowed us to parse invalid modules when the reference operand was
a nullref. Add the missing validation in IRBuilder and fix a relevant
spec test.
…ebAssembly#6814)

* Add interpreter support for exnref values.
* Fix optimization passes to support try_table.
* Enable the interpreter (but not in V8, see code) on exceptions.
Run the upstream tests by default, except for a large list of them that
do not successfully run. Remove the local version of those that do
successfully run where the local version is entirely subsumed by the
upstream version.
The leading bytes that indicate what kind of heap type is being defined
are bytes, but we were previously treating them as SLEB128-encoded
values. Since we emit the smallest LEB encodings possible, we were
writing the correct bytes in output files, but we were also improperly
accepting binaries that used more than one byte to encode these values.
This was caught by an upstream spec test.
Add comments to the spec test skip list briefly explaining why each
skipped spec test must be skipped.
Spec tests pass the value `ref.extern n`, where `n` is some integer,
into exported functions that expect to receive externrefs and receive
such values back out as return values. The payload serves to distinguish
externrefs so the test can assert that the correct one was returned.

Parse these values in wast scripts and represent them as externalized
i31refs carrying the payload. We will need a different representation
eventually, since some tests explicitly expect these externrefs to not
be i31refs, but this suffices to get several new tests passing.

To get the memory64 version of table_grow.wast passing, additionally fix
the interpreter to handle growing 64-bit tables correctly.

Delete the local versions of the upstream tests that can now be run
successfully.
possible-contents.h hashes the location for caught exnrefs by hashing an
arbitrary string, "caught-exnref-location". It previously used
`std::hash<const char*>` for this, but some standard library
implementations report an error when this template instantiation is used
because hashing the location of a string is almost never correct. In
this case it is fine, so switch to using `std::hash<const void*>`.
…#6861)

The best way to lower strings is via the "magic imports" API that uses
the names of imported string globals as their values. This approach only
works for valid UTF-8 strings, though. The existing
string-lowering-magic-imports pass falls back to putting non-UTF-8
strings in a JSON custom section, but this requires the runtime to
support that custom section for correctness. To help catch errors early
when runtimes do not support the strings custom section, add a new pass
that uses magic imports and raises an error if there are any invalid
strings.
WebAssembly#6859)

This is in quite ancient code, so it's a long-standing issue, but it got worse
when we enabled StackIR in more situations (WebAssembly#6568), which made it more
noticeable, I think.

For example, testing on test_biggerswitch in Emscripten, the LLVM part
is pretty slow too so the Binaryen slowdown didn't stand out hugely, but
just doing

wasm-opt --optimize-level=2 input.wasm -o output.wasm

(that is, do no work, but set the optimize level to 2 so that StackIR opts
are run) used to take 28 seconds (!). With this PR that goes down to less
than 1.
When precomputing fails on a child block of a parent block, there is no point
to precompute the parent, as that will fail as well.

This makes --precompute on Emscripten's test_biggerswitch go from 1.44
seconds to 0.02 seconds (not a typo, that is 72x faster). The absolute number
is not that big, but we do run this pass more than once, so it saves a noticeable
chunk of time.
Ensure the "fp16" feature is enabled for FP16 instructions.
Spec tests use constants like `ref.array` and `ref.eq` to assert that
exported function return references of the correct types. Support more
such constants in the wast parser.

Also fix a bug where the interpretation of `array.new_data` for arrays
of packed fields was not properly truncating the packed data. Move the
function for reading fields from memory from literal.cpp to
wasm-interpreter.h, where the function for truncating packed data lives.

Other bugs prevent us from enabling any more spec tests as a result of
this change, but we can get farther through several of them before
failing. Update the comments about the failures accordingly.
visitBlock() and validateCallParamsAndResult() both assumed they were
running inside a function, but might be called on global code too. Calls
and blocks are invalid in global positions, so we should error there, but
must do so properly without a null deref.

Fixes WebAssembly#6847
Fixes WebAssembly#6848
This constructed a LocalGraph, which computes the sets that reach each get. But
all we need to know is which params are live, so instead we can do a liveness
computation (which is just a boolean, not the list of sets). Also, it is simple to get
the liveness computation to only work on the parameters and not all the locals,
as a further optimization.

Existing tests cover this, though I did find that the case of unreachability needed
a new test.

On a large testcase I am looking at, this makes --dae 17% faster.
The parser function for `action` returned a `MaybeResult`, but we were
treating it as returning a normal `Result` and not checking that it had
contents in several places. Replace the current `action()` with
`maybeAction()` and add a new `action()` that requires the action to be
present.

Fixes WebAssembly#6872.
Before we just had a map that people would access with localGraph.getSetses[get],
while now it is a call localGraph.getSets(get), which more nicely hides the internal
implementation details.

Also rename getSetses => getSetsMap.

This will allow a later PR to optimize the internals of this API.

This is performance-neutral as far as I can measure. (We do replace a direct read
from a data structure with a call, but the call is in a header and should always get
inlined.)
Add the feature flag in V8 invocations, but also disable the feature as it
isn't quite ready yet.
Previously for in-tree builds, they were put directly into test/, which
unnecessarily pollutes the tree.
Reuse the code implementing Kahn's topological sort algorithm with a new
configuration that uses a min-heap to always choose the best available
element.

Also add wrapper utilities that can find topological sorts of graphs
with arbitrary element types, not just indices.
…#6885)

Use the new TopologicalSort and MinTopologicalSortOf utilities instead
of the old CRTP topological sort utility and a bespoke heap-based
topological sort in ReorderGlobals. Since there is no longer a heap to
pop from, the direction of the custom comparator is now much more
intuitive.

Further simplify the code by switching from tracking the new order of
globals using a sequence of new indices to tracking the order using a
sequence of old indices.

This change also makes the pass about 20% faster on a large real-world
module.
@gkdn gkdn closed this Aug 31, 2024
@gkdn gkdn deleted the stringconsts branch August 31, 2024 03:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.