Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fallback to portable(-ish) SIMD reads against guard pages on not-yet-supported platforms #2545

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

SoniEx2
Copy link
Collaborator

@SoniEx2 SoniEx2 commented Feb 18, 2025

Closes #2240

@SoniEx2 SoniEx2 requested a review from shravanrn February 18, 2025 15:01
@SoniEx2
Copy link
Collaborator Author

SoniEx2 commented Feb 18, 2025

(wait, is this slower than the old way? hmm... think we might still be able to do something about it tho...)

@SoniEx2
Copy link
Collaborator Author

SoniEx2 commented Feb 18, 2025

(regardless of speed, this is certainly way cleaner now.)

Copy link
Collaborator

@shravanrn shravanrn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am worried about introducing volatile here. Volatile never quite does what anyone expects in most cases and results in quite inconsistent behavior across compilers. I think the current code although messier is much safer. My suggestion is to add this only in the cases/environments where we don't have existing options working

@SoniEx2
Copy link
Collaborator Author

SoniEx2 commented Mar 17, 2025

we can understand that.

in theory, guard pages are in-bounds of the memory object as per C rules, so we are arguably not triggering UB by accessing guard pages but merely side-effects. (this is unlike, say, trying to use NULL pointers to access the zero page.) but even then, volatile access is implementation-defined. we would personally consider any compilers that simultaneously:

  1. target platforms where we can use guard pages
  2. allow us to use guard pages
  3. ignore the volatile access

as buggy. but there's a good chance the compiler would emit an additional read (and possibly a register spill) for the volatile read (especially with optimizations disabled), which isn't ideal as it degrades performance.

(there's also the question of whether the memory object itself should be considered a volatile object, such that regular accesses to it are UB - at least when guard pages are in use. we don't have a good answer to that one.)

so, considering all of that... yeah, fair enough.

(a more proper implementation would use volatile access to access the memory itself, but we don't believe there's such thing as a "volatile memcpy" we could use. after all, the value read by the memcpy isn't actually used, and that's a regular access as far as the compiler is concerned...)

@SoniEx2 SoniEx2 force-pushed the portable-simd-reads branch from 59f0efd to 132d7a0 Compare March 17, 2025 19:17
@@ -2,6 +2,9 @@
#define SIMD_FORCE_READ(var) __asm__("" ::"x"(var));
#elif defined(__GNUC__) && defined(__aarch64__)
#define SIMD_FORCE_READ(var) __asm__("" ::"w"(var));
#elif WASM_RT_MEMCHECK_GUARD_PAGES
// best-effort using volatile
#define SIMD_FORCE_READ(var) (void)*(volatile v128*)&var;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only with WASM_RT_MEMCHECK_GUARD_PAGES, why not make this the default fallback?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use _asm__("" ::"x"(var)); on x86 too maybe?

Copy link
Collaborator

@shravanrn shravanrn Mar 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't want it as the default fallback always because it may have some performance implications (we have to pay this in the guard model for spec compliance, but no reason to pay it when using bounds checks)

We should probably put the whole thing under the guard_pages macro check tbh

#if WASM_RT_MEMCHECK_GUARD_PAGES
#  if defined(__GNUC__) && defined(__x86_64__)
#    define SIMD_FORCE_READ(var) __asm__("" ::"x"(var));
#  elif defined(__GNUC__) && defined(__aarch64__)
#    define SIMD_FORCE_READ(var) __asm__("" ::"w"(var));
#  else
    // best-effort using volatile
#    define SIMD_FORCE_READ(var) (void)*(volatile v128*)&var;
#  endif
#else
#  define SIMD_FORCE_READ(var)
#endif

but this change would be just for clarity and would not result in perf difference

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I would expect SIMD_FORCE_READ to not be used at all if its not needed.

Defining it to something that doesn't actually force anything seems wrong.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SIMD_FORCE_READ_IF_NECESSARY?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets just leave the name as is.

I do this think we should not define it to anything unless WASM_RT_MEMCHECK_GUARD_PAGES is set though.

Copy link
Collaborator Author

@SoniEx2 SoniEx2 Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that would make the code way messier tho... hmm...

SIMD_CHECK_PAGE_FAULT? (or maybe SIMD_CHECK_READ?) we already have bounds-check macros that do nothing when they're not needed...

@keithw
Copy link
Member

keithw commented Mar 18, 2025

My understanding is that #2240 (specifically the failure on aarch64) was already fixed by #2406.

It seems like a minimalist option could be to enforce that every platform either uses explicit bounds-checks, or (if using guard pages) that it has a working "force read" macro. We could just bomb out if the user tries to compile with guard pages but we don't have a working force-read on that platform.

@SoniEx2
Copy link
Collaborator Author

SoniEx2 commented Mar 18, 2025

hmm, we would still like to try the volatile-based best-effort support. the tests should catch platforms where it doesn't work, then we can revisit what to do about it (probably include the bomb out for the specific platform, with an error message that instructs how to make it work). how does that sound?

@SoniEx2 SoniEx2 changed the title Switch to portable SIMD reads Fallback to portable(-ish) SIMD reads against guard pages on not-yet-supported platforms Mar 18, 2025
@shravanrn
Copy link
Collaborator

shravanrn commented Mar 18, 2025

hmm, we would still like to try the volatile-based best-effort support. the tests should catch platforms where it doesn't work, then we can revisit what to do about it (probably include the bomb out for the specific platform, with an error message that instructs how to make it work). how does that sound?

I'm not opposed to either trying it out and seeing how it fairs on different platforms, or falling back on bounds checks. I'll let @keithw and @sbc100 weigh in on their preferences.

If we do end up relying on bounds checks due to lack of force_read on some platforms, I think one nice to have would be to use bounds checks for reads but rely on guard pages for writes. This is still a non trivial performance benefit. An easy way to do this would be to adjust the force read macro to do a bounds checks (you'll probably want to rename the macro and adjust its parameters if you go this route)

Copy link
Member

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, but defering to @shravanrn

// best-effort using volatile
#define SIMD_FORCE_READ(var) (void)*(volatile v128*)&var;
#endif
#else
#define SIMD_FORCE_READ(var)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's a good idea to have a macro called SIMD_FORCE_READ that doesn't force a read. We're already careful not to call this in the "checked" versions of the memory access functions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed but:

(a) this a pre-existing condition that doesn't necessarily need to be fixed as part of this PR
(b) can you think of a better name that isn't overly long like SIMD_MAYBE_FORCE_READ or SIMD_FORCE_READ_IF_NECESSARY? Maybe we leave the bikeshedding for a separate PR/issue?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's new in this PR (in the current version) is that if WASM_RT_MEMCHECK_GUARD_PAGES isn't set, now SIMD_FORCE_READ will become a nop -- that's the change that I didn't super-love.

I don't think we need a SIMD_MAYBE_FORCE_READ because we're already careful only to call this in the "unchecked" versions of all the memory ops.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so maybe WASM_RT_MEMCHECK_GUARD_PAGES isn't set we simply don't need to define SIMD_FORCE_READ at all?

Copy link
Collaborator Author

@SoniEx2 SoniEx2 Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep in mind the "unchecked" versions are the base versions. they're used regardless of page size, so even tho custom-page-sizes does manual bounds checking, it's still additionally triggering this force-read when guard pages are enabled. (even tho custom-page-sizes doesn't/can't use guard pages, yes.)

how about we rename it to SIMD_CHECK_READ? it'd be nice to only apply this to the default32 version, but that's... difficult, currently. (also at some point we are getting too deep into C preprocessor metaprogramming, would it be a good idea to expand the template system a little bit and generate slightly more pre-expanded versions of these macros?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

wasm2c/spec/simd_address.txt fails on non-x86
4 participants