Issues using ops with Dart : pthread error: 22 (Invalid argument) #1629

Open
shaselle opened this issue Jun 16, 2024 · 4 comments · May be fixed by nanovms/nanos#2032

@shaselle

Hello!
I am currently trying to create a Dart server image.
Both ops run and ops pkg are failing with a message that I haven't seen previously.

../../runtime/vm/os_thread_linux.cc: 203: error: pthread error: 22 (Invalid argument)

*** signal 6 received by tid 2, errno 0, code -6

*** Thread context:
lastvector: 00000000000000ea
     frame: ffffc00002a01000
      type: thread
active_cpu: 00000000ffffffff
 stack top: 0000000000000000
...

Context

I have run a standalone Dart server before as a proof of concept, but it's been a while. As seen in this now-resolved Dart-related issue, Dart has worked well with Nanos and ops before.

I get the same error message on both Ubuntu 24.04 and Debian 12.
The error does not depend on what my app does: I have trimmed it down to a hello world with zero dependencies beyond the Dart language itself.

This happens on all current Dart channels (stable, beta, and dev) as of Dart v3.4.4.

Step by step

I tried this simple Dart program, test.dart:

void main(final List<String> args){
    print("Hello from NanoVMs.");
}

Then I compiled test.dart:

dart compile exe test.dart -o testx

I tried running it both with a plain command and with -fn:

ops run testx
ops -fn run testx

and got the following error:

../../runtime/vm/os_thread_linux.cc: 203: error: pthread error: 22 (Invalid argument)

*** signal 6 received by tid 2, errno 0, code -6

*** Thread context:
lastvector: 00000000000000ea
     frame: ffffc00002a01000
      type: thread
active_cpu: 00000000ffffffff
 stack top: 0000000000000000
...

Am I missing something?

My system

ops profile

Ops version: 0.1.41
Nanos version: 0.1.50
Qemu version: 8.2.2
OS: linux
Arch: amd64
Virtualized: false

Not that it matters, but the same Dart program runs fine on Docker; still, I get the same error when packaging it from Docker with ops pkg.
The only related Dart issue I can find is an old one that has been fixed and closed, and it predates my last working test:
runtime/vm/os_thread_linux.cc:234: error: pthread error: 22 (Invalid argument) #24169

francescolavra self-assigned this Jun 20, 2024
@francescolavra
Member

@shaselle thanks for reporting the issue. Apparently there is a problem with on-demand paging of the program file, and we are working on a fix. In the meantime, you can get going by disabling on-demand paging, i.e. using a configuration file (named e.g. config.json) with the following contents:

{
  "ManifestPassthrough": {
    "static_map_program": "t"
  }
}

and then passing that file to the ops run command by adding -c config.json to your command line.

@shaselle
Author

shaselle commented Jun 20, 2024

@francescolavra thanks for your helpful reply and your work on fixing this issue. The workaround of disabling on-demand paging works as expected.
Let me know if you need me to do more testing on my end once the fix is available.

Much appreciated.

francescolavra added a commit to nanovms/nanos that referenced this issue Jun 23, 2024
When on-demand paging of the program file is enabled, BSS areas in
pages faulted-in on demand are zeroed in-place, i.e. a newly mapped
page retrieved via the page cache is zeroed starting from the BSS
offset set up when initializing the relevant vmap. This creates a
problem if the page contains other data (e.g. from another loadable
section of the program) at or after the BSS offset, in which case
this data would be overwritten.
This change fixes the above issue by using a separate page (instead
of the page from the page cache) where the initialized program data
(located before the BSS offset) is copied from the page cache page,
and the rest of the page (starting at the BSS offset) is zeroed
out. Closes nanovms/ops#1629.

The on-demand paging implementation is being reworked to address
the following shortcomings:
- A vmap struct cannot be referenced without holding the vmap lock,
because it may be modified and/or deallocated at any time (e.g.
if the access protection flags of contiguous memory areas are
modified)
- Parallel handling of page faults from different CPUs for the
same page cannot be safely handled via a process-global pending
fault list, because it is possible for a faulting CPU to create a
new pending fault for a given page and then complete it before
another faulting CPU processes a fault for the same page: in this
case, the second CPU would not find any pending fault for the page,
even though the fault has been handled by the first CPU
The reworked implementation allows multiple faults to be pending
simultaneously for the same page, and relies on the page table lock
to prevent multiple mappings of the same page by different CPUs.
@francescolavra
Member

The on-demand paging issue is solved in nanovms/nanos#2032.
If you want to try out the kernel with this fix, you can do it by adding --nanos-version 747cea9 to your ops run command line.

@shaselle
Author

I can confirm that running the Dart standalone executable with --nanos-version 747cea9 works as expected without disabling on-demand paging.

francescolavra added a commit to nanovms/nanos that referenced this issue Jun 26, 2024