Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to linux_reload_workaround to allow calling is_hot_reload_enabled before enable_hot_reload #1040

Conversation

KevinThierauf
Copy link
Contributor

@KevinThierauf KevinThierauf commented Feb 9, 2025

Currently linux_reload_workaround.rs stores HOT_RELOADING_ENABLED as a OnceCell, which only supports being accessed once. This means that the current implementation of is_hot_reload_enabled:

fn is_hot_reload_enabled() -> bool {
    // Assume hot reloading is disabled unless something else has been specified already. This is the better default as thread local storage
    // destructors exist for good reasons.
    // This is needed for situations like unit-tests, where we may create TLS-destructors without explicitly calling any of the methods
    // that set hot reloading to be enabled or disabled.
    *HOT_RELOADING_ENABLED.get_or_init(|| false)
}

Sets HOT_RELOADING_ENABLED for the rest of the program, and any future calls enable_hot_reload fail. Since is_hot_reload_enabled is called by thread_atexit, thread_atexit invocations that occur before enable_hot_reload is called cause a panic.
(Unrelated but similar: disable_hot_reload notes // If hot reloading is disabled then we may call this method multiple times.; the current implementation will also cause a panic if called multiple times).

I'm specifically encountering this issue with #1037; the thread_local usage leads to is_hot_reload_enabled failing.
I used some print statements in is_hot_reload_enabled and enable_hot_reload to track down the itest failures in #1037:

> /home/user/Desktop/Godot_v4.3-stable_linux.x86_64 --path itest/godot --headless -- \[\]
is_hot_reload_enabled:
   0: godot_ffi::linux_reload_workaround::is_hot_reload_enabled
             at /home/user/Desktop/gdext-panic/godot-ffi/src/linux_reload_workaround.rs:54:44
   1: godot_ffi::linux_reload_workaround::thread_atexit
             at /home/user/Desktop/gdext-panic/godot-ffi/src/linux_reload_workaround.rs:69:8
   2: __cxa_thread_atexit_impl
             at /home/user/Desktop/gdext-panic/godot-ffi/src/linux_reload_workaround.rs:93:13
   3: std::sys::thread_local::native::eager::Storage<T>::initialize
             at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/thread_local/native/eager.rs:47:13
   4: std::sys::thread_local::native::eager::Storage<T>::get
             at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/thread_local/native/eager.rs:36:40
   5: godot_core::private::ERROR_CONTEXT_STACK::{{constant}}::{{closure}}
             at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/thread_local/native/mod.rs:67:25
   6: core::ops::function::FnOnce::call_once
             at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5
   7: std::thread::local::LocalKey<T>::try_with
             at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:282:37
   8: std::thread::local::LocalKey<T>::with
             at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:260:9
   9: godot_core::private::handle_panic
             at /home/user/Desktop/gdext-panic/godot-core/src/private.rs:349:5 <----- HERE
  10: godot_core::init::__gdext_load_library
             at /home/user/Desktop/gdext-panic/godot-core/src/init/mod.rs:80:22
  11: itest_init
             at /home/user/Desktop/gdext-panic/itest/rust/src/lib.rs:21:1
  12: <unknown>
  13: <unknown>
  14: <unknown>
  15: <unknown>
  16: <unknown>
  17: <unknown>
  18: <unknown>
  19: <unknown>
  20: <unknown>
  21: <unknown>
  22: __libc_start_main
  23: <unknown>

enable_hot_reload:
   0: godot_ffi::linux_reload_workaround::enable_hot_reload
             at /home/user/Desktop/gdext-panic/godot-ffi/src/linux_reload_workaround.rs:31:40
   1: godot_ffi::linux_reload_workaround::default_set_hot_reload
             at /home/user/Desktop/gdext-panic/godot-ffi/src/linux_reload_workaround.rs:47:9
   2: godot_core::init::__gdext_load_library::{{closure}}
             at /home/user/Desktop/gdext-panic/godot-core/src/init/mod.rs:32:21
   3: core::ops::function::FnOnce::call_once
             at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5
   4: std::panicking::try::do_call
             at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:557:40
   5: __rust_try
   6: std::panicking::try
             at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:520:19
   7: std::panic::catch_unwind
             at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:358:14
   8: godot_core::private::handle_panic
             at /home/user/Desktop/gdext-panic/godot-core/src/private.rs:355:9
   9: godot_core::init::__gdext_load_library
             at /home/user/Desktop/gdext-panic/godot-core/src/init/mod.rs:80:22
  10: itest_init
             at /home/user/Desktop/gdext-panic/itest/rust/src/lib.rs:21:1
  11: <unknown>
  12: <unknown>
  13: <unknown>
  14: <unknown>
  15: <unknown>
  16: <unknown>
  17: <unknown>
  18: <unknown>
  19: <unknown>
  20: <unknown>
  21: __libc_start_main
  22: <unknown>

thread '<unnamed>' panicked at godot-ffi/src/linux_reload_workaround.rs:35:10:
hot reloading should only be set once: true

For reference, private.rs:349:5 is:

ERROR_CONTEXT_STACK.with(|cell| unsafe {   <--- HERE
        // SAFETY:
        // &error_context is valid for lifetime of function, and is removed from LAST_ERROR_CONTEXT before end of function
        cell.borrow_mut().push_function(&error_context)
    });

I don't know why, but for whatever reason the initialization of ERROR_CONTEXT_STACK, which is a thread local value, leads to thread_atexit being called. I briefly looked through the rust stdlib source; on 1.84.1 (which I used to compile), it's called by eager.rs.

In summary, it seems like using a thread local value on linux that's called before enable_hot_reload is called by __gdext_load_library leads to calling is_hot_reload_enabled, which leads to enable_hot_reload failing.

@GodotRust
Copy link

API docs are being generated and will be shortly available at: https://godot-rust.github.io/docs/gdext/pr-1040

@KevinThierauf KevinThierauf marked this pull request as ready for review February 9, 2025 13:50
@lilizoey
Copy link
Member

lilizoey commented Feb 9, 2025

i dont like changing this to allow modifying this setting after it has been set. it seems much safer to me to treat all thread locals one way or another, and not potentially have some be treated one way and others another.

maybe the issue is here:

let is_success = crate::private::handle_panic(ctx, init_code);

it might be possible for us to change this a bit so that hot reloading is set earlier than this. maybe we can change ExtensionLibrary::override_hot_reload to a const, like

pub trait ExtensionLibrary {
    const OVERRIDE_HOT_RELOAD: Option<bool> = None;
}

and then match on that value? this would guarantee that we cant panic during the execution of setting the hot reloading value, and ensure it's set before we call any of the panic handling machinery. which is where the issue seemed to come from

@Bromeon
Copy link
Member

Bromeon commented Feb 9, 2025

I agree with lilizoey here, as mentioned in the earlier review already:

While AtomicBool may make the issue go away, I'd rather be sure that the hot-reload-enabled flag is initialized early enough so that it's not an issue. This should be done very early, on startup. Is the whole panic handling not only set up later, or why the interference here?


it might be possible for us to change this a bit so that hot reloading is set earlier than this. maybe we can change ExtensionLibrary::override_hot_reload to a const, like

I'm wondering if anyone is using this method at all -- we could just remove it from the public API in v0.3, and use our default implemenation instead. If people complain, we can think about something.

@KevinThierauf
Copy link
Contributor Author

Alternatively, could we change:

fn is_hot_reload_enabled() -> bool {
    *HOT_RELOADING_ENABLED.get_or_init(|| false)
}

into

fn is_hot_reload_enabled() -> Option<bool> {
    *HOT_RELOADING_ENABLED.get()
}

I don't think it makes sense to set HOT_RELOADING_ENABLED in a getter if the value isn't yet initialized. Option seems like a more appropriate return value.

@Bromeon
Copy link
Member

Bromeon commented Feb 10, 2025

I don't think it makes sense to set HOT_RELOADING_ENABLED in a getter if the value isn't yet initialized. Option seems like a more appropriate return value.

Agree with the first part, not with the second.

If anyhow possible, order of initializations should be well-defined, and cause panics whenever a violation occurs. Lazy initialization for bools should be a last resort, because it can quickly lead to non-deterministic or hard-to-debug data flows.

Concretely, is_hot_reload_enabled should probably return bool and not Option<bool>, with an expect() clause. That implies the TLS configurations are invoked after that flag has been set. But the alternative is worse: unrelated code (TLS) now has to make the call on how to interpret the tri-state true|false|None, and in the None case it might make the wrong decision because the actual information about hot-reloading comes in the future.

@KevinThierauf
Copy link
Contributor Author

Unfortunately I don't see a clear way of doing this in a way that doesn't fail cargo test. As far as I know, there's no way to detect whether tests are running at runtime, and #[cfg(test)] only applies to the crate being tested, not to the libraries.

@KevinThierauf KevinThierauf deleted the linux_reload_workaround-changes branch February 17, 2025 08:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants