Add fiber safety to __crystal_once & class_[getter|property]?(&) macros #15340

Draft: wants to merge 16 commits into master from fix/add-fiber-safety-to-crystal-once

@ysbaddaden (Contributor) commented Jan 13, 2025

Fixes a couple of issues:

  1. __crystal_once isn't fiber safe (concurrency issues); the initializer can be invoked multiple times from multiple fibers (despite using Mutex).

    This issue is fixed by always using the Mutex, not only when MT is enabled.

  2. class_getter, class_getter?, class_property and class_property? are neither thread nor fiber safe (parallelism & concurrency issues).

    This issue is fixed by reusing Crystal.once.

NOTE: calls to the aforementioned macros had to be dropped in Fiber and Thread because Mutex depends on them, and we need the latter to implement said macros (chicken/egg => infinite recursion => stack overflows).

This was a breaking change because the block was captured, and we couldn't return from it anymore. This is outlined by the commit that fixes SocketSpecHelper.supports_ipv6?, which didn't work as expected anyway (the @@supports_ipv6 class var was never set to true). The block is no longer captured, so this is no longer a breaking change. We might want to introduce a compile-time flag to enable the captured-block behavior, as it could help with some LLVM inlining behavior (inline the check).

builds on top of #15333
closes #14905

@ysbaddaden (Contributor, Author) commented Jan 13, 2025

By duplicating the Crystal.once logic we might be able to avoid the breaking change (we let Crystal inline the blocks instead of passing a proc pointer, which avoids capturing the block). We won't be able to take advantage of the AlwaysInline + NoInline annotations optimization... but that may be acceptable.

@ysbaddaden (Contributor, Author) commented Jan 13, 2025

Wonderful. CI decided to blow up 😮‍💨

It was an issue with musl-libc (another thread class var). Windows and GNU seem fine, but MinGW and OpenBSD were broken (yet another thread class var). These class vars are now explicitly initialized by Thread.init before we need them.

macOS is broken for some other reason: CI still uses LLVM 11 😨 and there's apparently an issue with LLVM 14 and below (which may explain why it breaks on some older Crystal releases). I can't reproduce it when cross-compiling to x86_64-apple-darwin on a Linux host with LLVM 18.1.8; it correctly compiles a .o file. This should now be fixed: the compiler was missing a pointer cast.

The interpreter is broken because of `flag.value = :processing`. It's confused by the assignment through a pointer and can't translate the symbol to the enum value:

`Error: BUG: missing upcast_distinct from Symbol to Crystal::OnceState (Crystal::SymbolType to Crystal::EnumType)`

That was easy to fix, but now it segfaults on the `IO.pipe { }` interpreter spec for no obvious reason: I don't see where it would use a global, and debugging the interpreter is an impossible task. It might have something to do with constants or class vars used by the interpreter itself 🤔 The interpreter actually never calls `__crystal_once_init` and doesn't use the once mechanism at all.

Co-authored-by: David Keller <[email protected]>

Based on the PR by @BlobCodes:
crystal-lang#15216

The performance improvement is two-fold:

1. using an i8 instead of an i1 boolean to have 3 states instead
   of 2, which allows quickly detecting recursive calls without an
   array;
2. inline tricks to optimize the fast and slow paths.
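The three-state idea in point 1 can be modelled in a few lines. This is a minimal single-threaded Python sketch, not Crystal's code: `OnceState`, `once`, `RecursiveInitError` and the one-element list standing in for a pointer to the flag are all hypothetical. The point is that a third `PROCESSING` value is enough to notice that an initializer re-entered itself, without keeping an array of in-progress initializers.

```python
from enum import IntEnum


class OnceState(IntEnum):
    """Three states fit in one byte (an i8); an i1 only has two."""
    UNINITIALIZED = 0
    PROCESSING = 1
    INITIALIZED = 2


class RecursiveInitError(RuntimeError):
    pass


def once(flag, initializer):
    """Run `initializer` at most once; `flag` is a 1-element list used as a pointer."""
    if flag[0] == OnceState.INITIALIZED:
        return  # fast path: nothing left to do
    if flag[0] == OnceState.PROCESSING:
        # The initializer asked for its own value: the state alone reveals
        # the recursion, no array of in-progress initializers is needed.
        raise RecursiveInitError("recursion while initializing")
    flag[0] = OnceState.PROCESSING
    initializer()
    flag[0] = OnceState.INITIALIZED


# Usage: the second call takes the fast path.
calls = []
counter_flag = [OnceState.UNINITIALIZED]
once(counter_flag, lambda: calls.append("init"))
once(counter_flag, lambda: calls.append("init"))

# Usage: an initializer that needs its own value is detected.
loop_flag = [OnceState.UNINITIALIZED]


def self_referencing_init():
    once(loop_flag, self_referencing_init)


try:
    once(loop_flag, self_referencing_init)
    recursion_detected = False
except RecursiveInitError:
    recursion_detected = True
```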

Unlike the PR:

1. Doesn't use atomics: it already uses a mutex that guarantees
   acquire/release memory ordering semantics, and __crystal_once_init is only
   ever called in the main thread before any other thread is started.
2. Removes the need for a state maintained by the compiler, yet keeps
   forward and backward compatibility (both signatures are supported).
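A rough Python model of the locking scheme: a lock-free fast path that returns once the flag is initialized, and a slow path that always takes the mutex (as the PR now does regardless of MT) and re-checks the flag. All names are hypothetical. The lock is reentrant on the assumption that a recursive call from the thread already running the initializer should reach the `PROCESSING` check rather than deadlock. Python's GIL makes the unlocked fast-path read safe here, so the sketch does not demonstrate the acquire/release ordering the commit relies on; it only shows the control flow.

```python
import threading
import time
from enum import IntEnum


class OnceState(IntEnum):
    UNINITIALIZED = 0
    PROCESSING = 1
    INITIALIZED = 2


# Hypothetical global lock; reentrant so same-thread recursion is reported.
ONCE_LOCK = threading.RLock()


def once(flag, initializer):
    if flag[0] == OnceState.INITIALIZED:
        return  # fast path: no locking once the value exists
    with ONCE_LOCK:  # slow path: always locked, not only in multi-threaded builds
        if flag[0] == OnceState.INITIALIZED:
            return  # another thread finished while we waited for the lock
        if flag[0] == OnceState.PROCESSING:
            raise RuntimeError("recursion while initializing")
        flag[0] = OnceState.PROCESSING
        initializer()
        flag[0] = OnceState.INITIALIZED


# Usage: eight threads race on the same flag; the initializer runs once.
runs = []
flag = [OnceState.UNINITIALIZED]


def slow_init():
    time.sleep(0.01)  # widen the race window
    runs.append(threading.get_ident())


threads = [threading.Thread(target=once, args=(flag, slow_init)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```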

Co-authored-by: David Keller <[email protected]>

@BlobCodes: I noticed that adding this code prevents LLVM from
re-running the once mechanism multiple times for the same variable.

Modified to avoid undefined behavior when the assumption doesn't
hold; this doubles as a safety net (print error + exit).

Co-authored-by: David Keller <[email protected]>

@BlobCodes: I think it would be better to print the bug message in
`Crystal.once` instead of `__crystal_once` to reduce complexity at the
callsite. The previous unreachable method can then be used in the
inlined `__crystal_once` so LLVM also knows it doesn't have to re-run
the method.

It's now even safe because `Crystal.once` would panic if it failed; it
should already be impossible, but let's err on the safe side.
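The split discussed in these commits (a small inlined check at each call site, plus an out-of-line slow path that owns the error reporting) can be sketched in Python. This is a hypothetical illustration: `once` stands in for the inlined `__crystal_once`, `once_slow` for `Crystal.once`, and `sys.exit` for printing the bug message and exiting. Python has no equivalent of LLVM's unreachable hint, so that part is only described in a comment.

```python
import sys
import threading

UNINITIALIZED, PROCESSING, INITIALIZED = 0, 1, 2
_once_lock = threading.RLock()  # hypothetical global, reentrant mutex


def once_slow(flag, initializer):
    """Stands in for the out-of-line `Crystal.once`: the slow path."""
    with _once_lock:
        if flag[0] == PROCESSING:
            raise RuntimeError("recursion while initializing")
        if flag[0] == UNINITIALIZED:
            flag[0] = PROCESSING
            initializer()
            flag[0] = INITIALIZED
    # Safety net: leaving the slow path without an initialized flag should be
    # impossible, so treat it as a bug: print an error and exit.
    if flag[0] != INITIALIZED:
        sys.exit("BUG: failed to initialize")


def once(flag, initializer):
    """Stands in for the inlined `__crystal_once`: the check at each call site."""
    if flag[0] != INITIALIZED:
        once_slow(flag, initializer)
    # From here on the flag is INITIALIZED. The real code states this with an
    # "unreachable" hint so LLVM knows it never has to re-run the initializer;
    # this sketch only documents it.


runs = []
flag = [UNINITIALIZED]
for _ in range(3):
    once(flag, lambda: runs.append("init"))
```

Keeping the error path in the slow function keeps each inlined call site down to a single comparison and call, which is the complexity reduction the commit message argues for.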
@ysbaddaden force-pushed the fix/add-fiber-safety-to-crystal-once branch from d2a3376 to f4347db on January 14, 2025 11:58
We need a Mutex to protect against recursion and to make sure the lazy
initializers only run once, but Mutex depends on the current fiber, and
indirectly the current thread, which may not have been initialized yet
themselves; that would lead to an infinite recursion once we protect
the class getter and property helpers.
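The chicken-and-egg problem can be reproduced with a toy Python model. Everything here is hypothetical and is not Crystal's runtime: a lazily initialized "current fiber" is guarded by a mutex that itself asks for the current fiber, so the two recurse until the stack overflows. Initializing the value eagerly before any guarded getter runs breaks the cycle, similar in spirit to the explicit initialization in `Thread.init` mentioned in the conversation.

```python
class Runtime:
    """Hypothetical runtime whose mutex needs to know the current fiber."""

    def __init__(self):
        self._current_fiber = None

    def lock_owner(self):
        # A mutex records which fiber owns it...
        return self.current_fiber()

    def current_fiber(self):
        if self._current_fiber is None:
            # ...but guarding this lazy initializer with that mutex asks for
            # the current fiber again: infinite recursion.
            self.lock_owner()
            self._current_fiber = "main fiber"
        return self._current_fiber

    def init(self):
        # Fix: initialize eagerly, before any guarded lazy getter runs.
        self._current_fiber = "main fiber"


lazy = Runtime()
try:
    lazy.current_fiber()
    overflowed = False
except RecursionError:
    overflowed = True

eager = Runtime()
eager.init()
owner = eager.lock_owner()
```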
Reuses the logic for `__crystal_once`.
Protects against recursion and adds thread (parallelism) and fiber
(concurrency) safety to class var initialization.
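To illustrate what a fiber- and thread-safe lazy class getter has to guarantee, here is a Python descriptor modelled loosely on `class_getter` with a block. It is an analogy, not the macro's expansion: the name `class_getter`, the per-getter lock (Crystal reuses the shared `Crystal.once` logic instead) and the `Config` example are assumptions. The initializer runs exactly once even when several threads read the value at the same time, and re-entering the getter from its own initializer raises instead of recursing.

```python
import threading
import time


class class_getter:
    """Hypothetical Python analogue of a lazily initialized class_getter."""

    UNINITIALIZED, PROCESSING, INITIALIZED = 0, 1, 2

    def __init__(self, initializer):
        self.initializer = initializer
        self.state = self.UNINITIALIZED
        self.value = None
        self.lock = threading.RLock()  # reentrant so recursion is reported, not deadlocked

    def __get__(self, instance, owner):
        if self.state == self.INITIALIZED:
            return self.value  # fast path
        with self.lock:
            if self.state == self.PROCESSING:
                raise RuntimeError("recursion while initializing class var")
            if self.state == self.UNINITIALIZED:
                self.state = self.PROCESSING
                self.value = self.initializer(owner)
                self.state = self.INITIALIZED
        return self.value


class Config:
    init_count = 0

    @class_getter
    def settings(cls):
        cls.init_count += 1
        time.sleep(0.01)  # widen the race window
        return {"loaded": True}


results = []
threads = [threading.Thread(target=lambda: results.append(Config.settings)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```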
The initializer block is now captured, and we can't return from a
captured block. This outlines that the previous commit is a BREAKING
CHANGE!

The `class_getter?` version used to skip over the intent to cache the
result in `@@supports_ipv6` (it returned from the generated function,
not from the block), so whenever IPv6 was supported every test was
creating yet-another TCPServer (oops).
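The bug can be modelled in Python. This is a hypothetical simulation: `try_open_ipv6_server`, the `cache` dict and both functions are made up, and the "old" function only roughly follows the shape of the previous macro expansion. Because the lazy block's early `return true` left the generated method itself, the cache assignment was skipped whenever IPv6 worked, so every call opened a new server. Letting the block evaluate to a value means that value is always cached.

```python
# Hypothetical stand-ins: a cached class var and a counter of servers opened.
cache = {"supports_ipv6": None}
servers_opened = 0


def try_open_ipv6_server():
    """Pretend to open a TCPServer on ::1; IPv6 is assumed to work here."""
    global servers_opened
    servers_opened += 1
    return True


def supports_ipv6_old():
    # Rough shape of the old expansion: `return true` inside the block left
    # the generated method, skipping the cache assignment below.
    if cache["supports_ipv6"] is None:
        if try_open_ipv6_server():
            return True  # returns before caching
        cache["supports_ipv6"] = False
    return cache["supports_ipv6"]


def supports_ipv6_fixed():
    # The block's value is always cached, whether it is true or false.
    if cache["supports_ipv6"] is None:
        cache["supports_ipv6"] = try_open_ipv6_server()
    return cache["supports_ipv6"]


for _ in range(3):
    supports_ipv6_old()
opened_by_old = servers_opened  # 3: one server per call, nothing cached

servers_opened = 0
for _ in range(3):
    supports_ipv6_fixed()
opened_by_fixed = servers_opened  # 1: later calls hit the cache
```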

1. it fails to translate the symbol to the enum value;
2. it doesn't call `__crystal_once_init`.

Avoids a breaking change at the expense of some optimization.
@ysbaddaden force-pushed the fix/add-fiber-safety-to-crystal-once branch from f4347db to d4fb901 on January 14, 2025 18:29
Successfully merging this pull request may close these issues:

Ensuring class_getter runs exactly once under concurrent access