Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Surface Configuration Failure is Fatal #3586

Open
cwfitzgerald opened this issue Mar 13, 2023 · 5 comments · May be fixed by #6253
Open

Surface Configuration Failure is Fatal #3586

cwfitzgerald opened this issue Mar 13, 2023 · 5 comments · May be fixed by #6253
Labels
area: validation Issues related to validation, diagnostics, and error handling area: wsi Issues with swapchain management or windowing help required We need community help to make this happen. type: enhancement New feature or request

Comments

@cwfitzgerald
Copy link
Member

Description

https://github.com//gfx-rs/wgpu/blob/13d2c3673c5e32c32baa4b610c01885afdb7601a/wgpu/src/backend/direct.rs#L734 means that all surface configure failures are fatal. We need to work out the steps to pass the error up through the error system.

There may be some more widespread consequences of having incomplete surfaces that need to be checked out.

@cwfitzgerald cwfitzgerald added type: enhancement New feature or request help required We need community help to make this happen. area: validation Issues related to validation, diagnostics, and error handling area: wsi Issues with swapchain management or windowing labels Mar 13, 2023
@Mindavi
Copy link

Mindavi commented Jul 11, 2023

I am running into this, and no clue what exactly is up. Maybe this is a fatal issue though.

Crash log ruffle
rick@nixos-asus:~/nixpkgs$ RUST_BACKTRACE=full ./result/bin/ruffle_desktop ~/Downloads/bloonstd4.swf 
warning: queue 0x56205cb019e0 destroyed while proxies still attached:
  wl_registry@34 still attached
thread 'main' panicked at 'Error in Surface::configure: Validation Error

Caused by:
    Parent device is lost
', /build/cargo-vendor-dir/wgpu-0.16.1/src/backend/direct.rs:734:18
stack backtrace:
   0:     0x56205a58f35a - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::had616a77c932ab1b
   1:     0x56205a5bf00f - core::fmt::write::h3a35c0754863b3d5
   2:     0x56205a59cce5 - std::io::Write::write_fmt::h502c0f36b9ea0da9
   3:     0x56205a58f115 - std::sys_common::backtrace::print::h90cd8adaf174c598
   4:     0x56205a5952fe - std::panicking::default_hook::{{closure}}::ha88f96a3c80996f9
   5:     0x56205a594fb9 - std::panicking::default_hook::haf5a79a7bf67f396
   6:     0x562059767c71 - ruffle_desktop::init::{{closure}}::h9ff18405c3d618d4
   7:     0x56205a5959d7 - std::panicking::rust_panic_with_hook::h69ad028564cd3ed4
   8:     0x56205a58f689 - std::panicking::begin_panic_handler::{{closure}}::h8ea911da5e199679
   9:     0x56205a58f466 - std::sys_common::backtrace::__rust_end_short_backtrace::h02258dadc83c0b77
  10:     0x56205a595552 - rust_begin_unwind
  11:     0x5620596a4043 - core::panicking::panic_fmt::h55a49ce1199dc412
  12:     0x56205a2732ba - wgpu::backend::direct::Context::handle_error_fatal::h0e7a425758e37e3b
Aborted (core dumped)

Edit: ah, this is related to #2519, #3747 and NVIDIA/egl-wayland#72

@Wumpf
Copy link
Member

Wumpf commented Apr 4, 2024

As mentioned here #5382 (comment) I believe we should just go with non-fatal errors and deal with the fallout as it happens.

#5382 describes a way to trigger such a failure, but since it requires certain hardware setup I'd suggest to anyone tackling this to do a throw-away test that artificially makes surface creation fail internally and then see try to handle it with a different surface, ensuring that this works. If we can prove this for any single case to work it's an improvement over status quo.

Also, as pointed out by this ticket this affects get_current_texture in a similar way

@alokedesai
Copy link

@Wumpf Curious what you think is the best next step here.

As I see it there are two changes we can make, both of which are breaking changes:

  1. Update get_current_texture to not use a fatal-error. This will require we update the function to take in a Device so we can call handle_error_nolabel (which requires an ErrorSink from a Device)
  2. Update this function (and other functions that shouldn't be fatal by default) to return a Result that the caller can directly handle.

Curious if you have any preferences / thoughts, I'm a happy to make a PR but I'd like consensus on what you all think is the right approach as people who know this code much better than I do :)

To a certain extent, I think Results are nice here because it's encoded into the API and it's the Rust standard way to represent errors. However, I can see the value of the former since that's the standard wgpu way of representing errors.

@Wumpf
Copy link
Member

Wumpf commented Jul 15, 2024

Result would be nice, but as most other things we have to stick with the asynchronous error handling model that the WebGPU spec enforces (that's this signature https://www.w3.org/TR/webgpu/#dom-gpucanvascontext-getcurrenttexture in this case). If we wouldn't do this, then we'd end up with two different kinds of error handling when targeting WebGPU and everything else, and bridging this over is one of the goals of the project.

Which then leaves us with handle_error (ideally the labeled version). The error sink for this can simply live on the Surface struct like it is already the case with most other structs in wgpu_core.rs.

@alokedesai
Copy link

alokedesai commented Sep 5, 2024

@Wumpf I've started poking at this a little more.

That's a fair point about ensuring we match the getcurrenttexture signature exposed by the wgpu spec.

Given that, do you have a recommendation on what we return in the case there was an error? The 4 things that are returned by surface_get_current_texture are

  1. Option<Self::TextureId>
  2. SurfaceStatus
  3. Self::SurfaceOutputDetail

The first has a trivial default value when there's an error.

Any suggestions on what to return for SurfaceStatus and SurfaceOutputDetail here?

For webgpu the definition of SurfaceOutputDetail is just the unit type, which makes it easy.

For other backends, the definition lives here:

surface_id: wgc::id::SurfaceId,

Thoughts on just returning the ID that was passed to the function in the case there was an error?

SurfaceOutputDetail {
surface_id: surface_data.id,

Assuming you're on board with that, I can send you a PR later today that makes this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: validation Issues related to validation, diagnostics, and error handling area: wsi Issues with swapchain management or windowing help required We need community help to make this happen. type: enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants