Skip to content

Conversation

@MAG-mv
Copy link
Contributor

@MAG-mv MAG-mv commented Sep 26, 2025

CSP sometimes fires certain callbacks on a different thread the function was invoked. This causes crashes in WASM builds due to it not supporting this behaviour.

I decided to fix this at the binding level (wrapper generator), as we only need to fix this on wasm, and not anywhere else. Also, creating a callback queue that needs to be polled may change expected behavior, as clients would need to call CSPFoundation::Tick before logging in to get the callback.

I used the emscripten api to create a callback queue, which pushes callbacks onto the main thread if we are not on main. Doing it here also means we don't have to poll, as Emscripten resolved the inter-thread messaging for us using worker-thread messaging.

I also fixed a small logging issue in the NetworkInterruptionCallback, as we were always logging, even if it was a regular disconnect.

I was unfortunatly unable to reproduce the NetworkInterruptionCallback crash, both via testing and using a web build. Our testing framework crashes when trying to block to wait for the callback, so it isn't easily testable. Also, using a web build showed the callback logs, but no crash.

I also tested my new changes on a local web build to make sure there were no obvious issues/

This creates a callback queue using the emscripten api to be able to push callbacks to the main thread if we are not on that thread
Logic is now reversed now that we have fixed this test
We aere alwaysd logging that the connection was interrupted, even for a regular disconnect
@github-actions
Copy link

🚀 Pull Request Review Guidelines

Thank you for taking the time to review this Pull Request. The following is a summary of our Pull Request guidelines. The full guidelines can be found here.

💬 How to Provide Feedback

We use a comment ladder when leaving review comments to avoid any ambiguity.

Tag Is response required? Is a change required? May the PR author resolve?
[Fix] ✔️ ✔️
[Consider] ✔️ ✔️
[Question] ✔️
[Nit] ✔️
[Comment] ✔️

All comments should be prefixed with one of the above tags, for example:
[fix] Your editor is still set to use tabs instead of spaces, or this file is old and needs to be reformatted.

⚠️ Reviewers should be sure to resolve conversations if they feel their feedback has been addressed sufficiently.

🎯 PR Author Focus Areas

  • Link Commits to Conversations After making a change in response to a reviewer's comment, link the specific commit in the comment thread. This will allow the reviewer to view the change without having to hunt the codebase for it.
  • Breaking Changes Any breaking changes made to the public interface must be specifically called out in the PR. The preferred format is a bullet list enumerating the specific nature of the change/s, as well as the types and method signatures that are affected. If necessary, migration guidance should also be provided here.

Thanks again for taking the time to review this Pull Request.

catch (const std::exception& e)
{
INVOKE_IF_NOT_NULL(NetworkInterruptionCallback, e.what());
LogSystem.LogMsg(csp::common::LogLevel::Log, "Connection Interrupted.");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log should be in here, as this is when we call the NetworkInterruptionCallback

{{# parameters }}
{{# type.is_string }}
_{{ name }}
(const char*)_{{ name }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was getting compile errors for strings, as were were creating chars. but we are expected cont char

{{ name }}_StateObject,
{{# parameters }}
CallCallback({{ name }}, {{ name }}_StateObject,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now calls the wrapper function instead

errors.some(e => e.message.includes('table index is out of bounds')),
'Expected cross-thread `table index is out of bounds` error. Message not found, did you fix the bug (OB-3782)? Nice job! ',
);
!errors.some(e => e.message.includes('table index is out of bounds')));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can flip this test logic now its fixed!

@github-actions
Copy link

🚀 Pull Request Review Guidelines

Thank you for taking the time to review this Pull Request. The following is a summary of our Pull Request guidelines. The full guidelines can be found here.

💬 How to Provide Feedback

We use a comment ladder when leaving review comments to avoid any ambiguity.

Tag Is response required? Is a change required? May the PR author resolve?
[Fix] ✔️ ✔️
[Consider] ✔️ ✔️
[Question] ✔️
[Nit] ✔️
[Comment] ✔️

All comments should be prefixed with one of the above tags, for example:
[fix] Your editor is still set to use tabs instead of spaces, or this file is old and needs to be reformatted.

⚠️ Reviewers should be sure to resolve conversations if they feel their feedback has been addressed sufficiently.

🎯 PR Author Focus Areas

  • Link Commits to Conversations After making a change in response to a reviewer's comment, link the specific commit in the comment thread. This will allow the reviewer to view the change without having to hunt the codebase for it.
  • Breaking Changes Any breaking changes made to the public interface must be specifically called out in the PR. The preferred format is a bullet list enumerating the specific nature of the change/s, as well as the types and method signatures that are affected. If necessary, migration guidance should also be provided here.

Thanks again for taking the time to review this Pull Request.

@MAG-ElliotMorris
Copy link
Contributor

MAG-ElliotMorris commented Sep 26, 2025

I don't think this PR is complete, there is not enough testing. We need at least one test for each of the known cases. And given that this is an emscripten implementation, I was also expecting tests demonstrating what happens when functions are called from web workers, or at least some broadcasting of what the model is on this.

I was unfortunatly unable to reproduce the NetworkInterruptionCallback crash, both via testing and using a web build. Our testing framework crashes when trying to block to wait for the callback, so it isn't easily testable. Also, using a web build showed the callback logs, but no crash.

I'm don't think this is acceptable, the way the testing framework or CSP is setup isn't immutable, did you experiment with a proxying server & changing the interruption timeout time to be much lower/instant?

The below is ideological critique, and wouldn't neccesarily block a merge, that would be up to @MAG-AdamThorn as tech lead

Primarily though, looking at this implementation I struggle to agree with the tradeoffs of using the emscripten api for this, rather than a hand-rolled queue. It seems almost equally large in terms of code footprint, with the disadvantages of :

  • Using unfamiliar and arcane emscripten apis
  • Existing within mustache files which are very hard to read and write
  • Existing within the wrapper generator itself, which isn't an area we really have a handle on, it's going to make migrations and refactors harder as you've gotta keep this in mind.

Clients needing to call Tick to trigger certain events seems like a better tradeoff than the imposition of this sort of wrapper generator infrastructure.

@MAG-mv
Copy link
Contributor Author

MAG-mv commented Sep 26, 2025

@MAG-ElliotMorris

Using unfamiliar and arcane emscripten apis

It's pretty well documented, with 1 emscripten api call, and a few small functions

Existing within mustache files, which are very hard to read and write

Agreed. However, if we move to embind, it would be a better fit for this, and would be easy to move

Clients needing to call Tick to trigger certain events seems like a better tradeoff than the imposition of this sort of wrapper generator infrastructure.

The tick solution is limited, as it will push callbacks onto the main thread, ,so if clients want to call these functions on a worker in the future, it will still crash

With the emscripten solution, we can update this to call back to the thread the client originally called it on.

Also, I think the tick solution creates a confusing api, where certain functions will only callback if you are calling tick, especially since this affects functions during the csp initialization process (Login). This will break all of our tests and likely client implementations.

@MAG-ElliotMorris
Copy link
Contributor

MAG-ElliotMorris commented Sep 26, 2025

The tick solution is limited, as it will push callbacks onto the main thread, , even if clients call these functions on a worker, which will also cause the crash.

Why is that true? The example I showed you only pushed data back onto the main thread via a copy, callbacks never moved. It was a simpler model imo which completely sidestepped the problem of moving capture state and should have worked from any calling environment.

This will break all of our tests and likely client implementations.

I think this is a vast overstatement.

Nonetheless, as we disagree on this there's not much to be done. I'll reluctantly review the existing implementation presuming @MAG-AdamThorn wants to press ahead with this, although my critiques around the level of test coverage still stands.

I wish there was a way of moving this out of the mustache files still ... it's bound to just be forgotten about.

@MAG-mv
Copy link
Contributor Author

MAG-mv commented Sep 26, 2025

Why is that true? The example I showed you only pushed data back onto the main thread via a copy, callbacks never moved. It was a simpler model imo, which completely sidestepped the problem of moving capture state and should have worked from any calling environment.

The issue we are experiencing is that the callbacks are called on a different thread from which it was created, not that its called on a non-main thread. So, if clients call these functions on a worker thread, we will push to the main thread, which will crash again.

This current solution doesn't support this. However, it can be updated to support if needed.

@MAG-mv
Copy link
Contributor Author

MAG-mv commented Sep 26, 2025

I agree with the test coverage. As mentioned in the standup thread earlier, I will continue with this

@MAG-ElliotMorris
Copy link
Contributor

The issue we are experiencing is that the callbacks are called on a different thread from which it was created, not that its called on a non-main thread. So, if clients call these functions on a worker thread, we will push to the main thread, which will crash again.

Ah, I see. Yeah, the tick solution suffers from the same issue as this implementation then without additional work, I'll admit this weakens my argument somewhat.

@MAG-ElliotMorris
Copy link
Contributor

We shouldn't be having this discussion at PR time. Next time we realize we disagree on implementation, lets try to properly resolve it earlier.

Copy link
Contributor

@MAG-ElliotMorris MAG-ElliotMorris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite my hesitation around solving the problem in this area at the moment, given the roughness of the terrain, I think the implementation is rather good, albeit underdocumented for the level of complexity on display.

static pthread_t MainThread = 0;

// Creates an emscripten queue and sets the main thread if we haven't already initialized.
void Emscripten_InitializeProxyQueue()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Question] I don't understand why this logic is in a mustache file, and not in a proper .cpp with #EMSCRIPTEN guards like all of the rest of our emscripten specific code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. It would be a lot nicer in a cpp

Comment on lines 36 to 39
if (ProxyQueue != nullptr)
{
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Question] Does this imply this Initialize method is called lazily? Is this a necessity, [consider] if it could be called during the standard initialize flow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. I was worried about doing it when we first created the variable because its global. Maybe it's ok to do so.

Comment on lines 45 to 51
// Structure to hold the callback and arguments.
template <typename ...T>
struct CallbackData
{
void (*Callback)(T...);
std::tuple<T...> Args;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Question] How does captured state play into this, seeing that this is a function pointer? Normally function pointer based callback approaches have a *UserContext sort of thing to work as a shim for that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Consider] Ah I see, the StateObject is always the first arg of the pack. Perhaps make that explicit and having the parameter pack only apply to user arguments in order to reduce arcanity, it's not possible to know how this works currently without also exploring the C-wrapper.

Copy link
Contributor

@MAG-ElliotMorris MAG-ElliotMorris Sep 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Comment] This makes me worry about the "This will be trivial in embind" thing, we'll have to re-solve this. I suppose it's not the hugest thing.

static void Emscripten_CallbackWrapper(void* InData)
{
auto* Data = static_cast<CallbackData<T...>*>(InData);
std::apply(Data->Callback, Data->Args);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Comment] Pretty neat, I almost never see std::apply used. I think its only real purpose is to do this sort of parameter unpack.

Copy link
Contributor

@MAG-ElliotMorris MAG-ElliotMorris Sep 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Nit] You might want to consider moving/forwarding the args, just for a bit of optimality for callbacks with value signatures. I struggle to trace the InData ownership however, so this might not be the thing to do.

// Generic callback function which will either call the callback directly, or,
// if we're on wasm, push to the main thread if we're not on the main thread.
template <typename ...T>
static void CallCallback(void (*Callback)(T...), T... Args)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Consider] Naming this more specifically so in the C-api it's known why it exists. CallCallbackOnMainThread or something.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants