Issues using ops with Dart : pthread error: 22 (Invalid argument) #1629

shaselle · 2024-06-16T05:34:26Z

Hello!
I am currently trying to create a dart server image.
Both ops run and ops pkg are failing with a message that I haven't seen previously.

../../runtime/vm/os_thread_linux.cc: 203: error: pthread error: 22 (Invalid argument)

*** signal 6 received by tid 2, errno 0, code -6

*** Thread context:
lastvector: 00000000000000ea
     frame: ffffc00002a01000
      type: thread
active_cpu: 00000000ffffffff
 stack top: 0000000000000000
...

Context

I have run a standalone Dart server before as a proof of concept, but it's been a while. As seen in this now-resolved Dart-related issue, Dart has worked well with Nanos and ops before.

I seem to be running into the same error message on Ubuntu 24.04 and Debian 12.
It does not matter what my app is all about. I have trimmed it down to a helloworld with zero dependencies except the Dart language.

This happens on all current Dart channels (stable, beta, and dev) of Dart v3.4.4.

Step by step

I have tried this simple dart program: test.dart

void main(final List<String> args){
    print("Hello from NanoVMs.");
}

Then compile test.dart.

dart compile exe test.dart  -o testx

I tried running it with both of the following commands:

ops run testx

ops -fn run  testx

I get the following error

../../runtime/vm/os_thread_linux.cc: 203: error: pthread error: 22 (Invalid argument)

*** signal 6 received by tid 2, errno 0, code -6

*** Thread context:
lastvector: 00000000000000ea
     frame: ffffc00002a01000
      type: thread
active_cpu: 00000000ffffffff
 stack top: 0000000000000000
...

Am I missing something?

My system

ops profile

Ops version: 0.1.41
Nanos version: 0.1.50
Qemu version: 8.2.2
OS: linux
Arch: amd64
Virtualized: false

Not that it matters, but the same Dart program runs fine on Docker, yet I get the same error even with ops pkg from-docker.
The only Dart-related report of this error I can find is an old issue that has been fixed and closed, and it predates my last working test.
runtime/vm/os_thread_linux.cc:234: error: pthread error: 22 (Invalid argument) #24169


francescolavra · 2024-06-20T10:54:53Z

@shaselle thanks for reporting the issue. Apparently there is a problem with on-demand paging of the program file, and we are working on a fix. In the meantime, you can get going by disabling on-demand paging, i.e. using a configuration file (named e.g. config.json) with the following contents:

{
  "ManifestPassthrough": {
    "static_map_program": "t"
  }
}

and then passing that file to the ops run command by adding -c config.json to your command line.

shaselle · 2024-06-20T12:29:21Z

@francescolavra thanks for your helpful reply and for working on a fix. The workaround of disabling on-demand paging works as expected.
Let me know if you need me to do more testing on my end once the fix is available.

Much appreciated.

When on-demand paging of the program file is enabled, BSS areas in pages faulted-in on demand are zeroed in-place, i.e. a newly mapped page retrieved via the page cache is zeroed starting from the BSS offset set up when initializing the relevant vmap. This creates a problem if the page contains other data (e.g. from another loadable section of the program) at or after the BSS offset, in which case this data would be overwritten.

This change fixes the above issue by using a separate page (instead of the page from the page cache) where the initialized program data (located before the BSS offset) is copied from the page cache page, and the rest of the page (starting at the BSS offset) is zeroed out. Closes nanovms/ops#1629.

The on-demand paging implementation is being reworked to address the following shortcomings:

- A vmap struct cannot be referenced without holding the vmap lock, because it may be modified and/or deallocated at any time (e.g. if the access protection flags of contiguous memory areas are modified)
- Parallel handling of page faults from different CPUs for the same page cannot be safely handled via a process-global pending fault list, because it is possible for a faulting CPU to create a new pending fault for a given page and then complete it before another faulting CPU processes a fault for the same page: in this case, the second CPU would not find any pending fault for the page, even though the fault has been handled by the first CPU

The reworked implementation allows multiple faults to be pending simultaneously for the same page, and relies on the page table lock to prevent multiple mappings of the same page by different CPUs.
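As a rough illustration of the BSS fix described in the commit message above, the idea can be sketched in plain C. This is not the Nanos code: the function name `fill_page_with_bss`, the fixed `PAGE_SIZE` of 4096, and the use of ordinary byte buffers in place of real physical pages are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096 /* assumed page size for this sketch */

/*
 * Hypothetical helper (not a Nanos function): build the page that will be
 * mapped for the faulting address without touching the page cache page.
 *
 *   dst        - separate page that will be mapped into the process
 *   cache_page - page cache page holding the file contents (left unmodified)
 *   bss_offset - offset within the page where the BSS area starts
 */
static void fill_page_with_bss(uint8_t *dst, const uint8_t *cache_page,
                               size_t bss_offset)
{
    if (bss_offset > PAGE_SIZE)
        bss_offset = PAGE_SIZE;

    /* Initialized program data before the BSS offset comes from the file. */
    memcpy(dst, cache_page, bss_offset);

    /* Everything from the BSS offset to the end of the page is zeroed. */
    memset(dst + bss_offset, 0, PAGE_SIZE - bss_offset);
}
```

Because the zeroing happens in `dst` rather than in `cache_page`, any file data located at or after the BSS offset in the cached page stays intact for other mappings that read it, which is the overwrite that the in-place approach caused.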

francescolavra · 2024-06-23T19:52:38Z

The on-demand paging issue is solved in nanovms/nanos#2032.
If you want to try out the kernel with this fix, you can do it by adding --nanos-version 747cea9 to your ops run command line.

shaselle · 2024-06-23T23:04:38Z

I can confirm: running the Dart standalone executable with --nanos-version 747cea9 works as expected without disabling on-demand paging.

francescolavra self-assigned this Jun 20, 2024

eyberg mentioned this issue Jun 20, 2024

doc option to disable demand paging nanovms/ops-documentation#484

Open

francescolavra linked a pull request Jun 23, 2024 that will close this issue

On-demand program file paging: fix initialization of BSS areas nanovms/nanos#2032

Open
