Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug(core): WASM tests hanging on ba-keyman-macos-2 #11794

Open
mcdurdin opened this issue Jun 16, 2024 · 3 comments
Open

bug(core): WASM tests hanging on ba-keyman-macos-2 #11794

mcdurdin opened this issue Jun 16, 2024 · 3 comments
Labels
bug core/ Keyman Core
Milestone

Comments

@mcdurdin
Copy link
Member

Many Core WASM tests hanging on ba-keyman-macos-2, but not always. Same tests always running to completion successfully on ba-keyman-macos-1. Looks like we have different versions of node/emscripten on the two mac agents:

ba-keyman-macos-1: node 21.7.3, emscripten 3.1.57-git
ba-keyman-macos-2: node 22.0.0, emscripten 3.1.59-git

Downloaded the built results for ba-keyman-macos-2 and tested locally, could not get it to fail.

Example build fails:

Running locally, meson test -j 1 seemed to always pass tests, every attempt I tried. meson test -j 2 or higher usually resulted in hanging tests.

Ran brew upgrade node (upgrades everything!) to 22.3.0, emscripten went to 3.1.61. This still sporadically fails in the same way.

@mcdurdin
Copy link
Member Author

nodejs/node#47748 seems related.

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x0000000198b419ec libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x0000000198b7f55c libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x0000000104f1e1d8 libuv.1.dylib`uv_cond_wait + 40
    frame #3: 0x0000000100cf914c node`node::TaskQueue<v8::Task>::BlockingDrain() + 48
    frame #4: 0x0000000100cf7d6c node`node::NodePlatform::DrainTasks(v8::Isolate*) + 44
    frame #5: 0x0000000100b999ac node`node::SpinEventLoopInternal(node::Environment*) + 284
    frame #6: 0x0000000100ccfdf0 node`node::NodeMainInstance::Run(node::ExitCode*, node::Environment*) + 308
    frame #7: 0x0000000100ccfad0 node`node::NodeMainInstance::Run() + 124
    frame #8: 0x0000000100c435c4 node`node::Start(int, char**) + 476
    frame #9: 0x00000001987f60e0 dyld`start + 2360

@mcdurdin
Copy link
Member Author

For now, -j 1 to serialize tests sidesteps the issue.

@mcdurdin
Copy link
Member Author

Remove -j 1 and run meson test -t 0 on ba-=keyman-macos-2 in order to retrace our steps and repro the hang (most runs). I attached lldb to the hanging node process to grab the trace as shown above. Could probably escalate this to nodejs, but we don't currently have a minimal repro; seems like a race when 2+ node processes run at once, but would potentially need to do significant work on macos to bisect and repro on other devices.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug core/ Keyman Core
Development

No branches or pull requests

1 participant