-
Notifications
You must be signed in to change notification settings - Fork 734
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fallback to portable(-ish) SIMD reads against guard pages on not-yet-supported platforms #2545
base: main
Are you sure you want to change the base?
Conversation
(wait, is this slower than the old way? hmm... think we might still be able to do something about it tho...) |
(regardless of speed, this is certainly way cleaner now.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am worried about introducing volatile here. Volatile never quite does what anyone expects in most cases and results in quite inconsistent behavior across compilers. I think the current code although messier is much safer. My suggestion is to add this only in the cases/environments where we don't have existing options working
we can understand that. in theory, guard pages are in-bounds of the memory object as per C rules, so we are arguably not triggering UB by accessing guard pages but merely side-effects. (this is unlike, say, trying to use NULL pointers to access the zero page.) but even then, volatile access is implementation-defined. we would personally consider any compilers that simultaneously:
as buggy. but there's a good chance the compiler would emit an additional read (and possibly a register spill) for the volatile read (especially with optimizations disabled), which isn't ideal as it degrades performance. (there's also the question of whether the memory object itself should be considered a volatile object, such that regular accesses to it are UB - at least when guard pages are in use. we don't have a good answer to that one.) so, considering all of that... yeah, fair enough. (a more proper implementation would use volatile access to access the memory itself, but we don't believe there's such thing as a "volatile memcpy" we could use. after all, the value read by the memcpy isn't actually used, and that's a regular access as far as the compiler is concerned...) |
59f0efd
to
132d7a0
Compare
@@ -2,6 +2,9 @@ | |||
#define SIMD_FORCE_READ(var) __asm__("" ::"x"(var)); | |||
#elif defined(__GNUC__) && defined(__aarch64__) | |||
#define SIMD_FORCE_READ(var) __asm__("" ::"w"(var)); | |||
#elif WASM_RT_MEMCHECK_GUARD_PAGES | |||
// best-effort using volatile | |||
#define SIMD_FORCE_READ(var) (void)*(volatile v128*)&var; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why only with WASM_RT_MEMCHECK_GUARD_PAGES
, why not make this the default fallback?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use _asm__("" ::"x"(var));
on x86 too maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't want it as the default fallback always because it may have some performance implications (we have to pay this in the guard model for spec compliance, but no reason to pay it when using bounds checks)
We should probably put the whole thing under the guard_pages macro check tbh
#if WASM_RT_MEMCHECK_GUARD_PAGES
# if defined(__GNUC__) && defined(__x86_64__)
# define SIMD_FORCE_READ(var) __asm__("" ::"x"(var));
# elif defined(__GNUC__) && defined(__aarch64__)
# define SIMD_FORCE_READ(var) __asm__("" ::"w"(var));
# else
// best-effort using volatile
# define SIMD_FORCE_READ(var) (void)*(volatile v128*)&var;
# endif
#else
# define SIMD_FORCE_READ(var)
#endif
but this change would be just for clarity and would not result in perf difference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I would expect SIMD_FORCE_READ to not be used at all if its not needed.
Defining it to something that doesn't actually force anything seems wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SIMD_FORCE_READ_IF_NECESSARY?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets just leave the name as is.
I do this think we should not define it to anything unless WASM_RT_MEMCHECK_GUARD_PAGES
is set though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would make the code way messier tho... hmm...
SIMD_CHECK_PAGE_FAULT? (or maybe SIMD_CHECK_READ?) we already have bounds-check macros that do nothing when they're not needed...
My understanding is that #2240 (specifically the failure on aarch64) was already fixed by #2406. It seems like a minimalist option could be to enforce that every platform either uses explicit bounds-checks, or (if using guard pages) that it has a working "force read" macro. We could just bomb out if the user tries to compile with guard pages but we don't have a working force-read on that platform. |
hmm, we would still like to try the volatile-based best-effort support. the tests should catch platforms where it doesn't work, then we can revisit what to do about it (probably include the bomb out for the specific platform, with an error message that instructs how to make it work). how does that sound? |
I'm not opposed to either trying it out and seeing how it fairs on different platforms, or falling back on bounds checks. I'll let @keithw and @sbc100 weigh in on their preferences. If we do end up relying on bounds checks due to lack of force_read on some platforms, I think one nice to have would be to use bounds checks for reads but rely on guard pages for writes. This is still a non trivial performance benefit. An easy way to do this would be to adjust the force read macro to do a bounds checks (you'll probably want to rename the macro and adjust its parameters if you go this route) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, but defering to @shravanrn
// best-effort using volatile | ||
#define SIMD_FORCE_READ(var) (void)*(volatile v128*)&var; | ||
#endif | ||
#else | ||
#define SIMD_FORCE_READ(var) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's a good idea to have a macro called SIMD_FORCE_READ that doesn't force a read. We're already careful not to call this in the "checked" versions of the memory access functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed but:
(a) this a pre-existing condition that doesn't necessarily need to be fixed as part of this PR
(b) can you think of a better name that isn't overly long like SIMD_MAYBE_FORCE_READ
or SIMD_FORCE_READ_IF_NECESSARY
? Maybe we leave the bikeshedding for a separate PR/issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's new in this PR (in the current version) is that if WASM_RT_MEMCHECK_GUARD_PAGES isn't set, now SIMD_FORCE_READ will become a nop -- that's the change that I didn't super-love.
I don't think we need a SIMD_MAYBE_FORCE_READ
because we're already careful only to call this in the "unchecked" versions of all the memory ops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, so maybe WASM_RT_MEMCHECK_GUARD_PAGES isn't set we simply don't need to define SIMD_FORCE_READ
at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keep in mind the "unchecked" versions are the base versions. they're used regardless of page size, so even tho custom-page-sizes does manual bounds checking, it's still additionally triggering this force-read when guard pages are enabled. (even tho custom-page-sizes doesn't/can't use guard pages, yes.)
how about we rename it to SIMD_CHECK_READ
? it'd be nice to only apply this to the default32 version, but that's... difficult, currently. (also at some point we are getting too deep into C preprocessor metaprogramming, would it be a good idea to expand the template system a little bit and generate slightly more pre-expanded versions of these macros?)
Closes #2240