Make Surface#configure and Surface#get_current_texture non-fatal #6253
base: trunk
@@ -67,6 +67,8 @@ pub enum SurfaceError {
     Lost,
     /// There is no more memory left to allocate a new frame.
     OutOfMemory,
+    /// Acquiring a texture failed for an unknown reason
+    Other,
Comment on lines +70 to +71:
Similar'ish suggestion; the string below needs adjusting as well.
 }
 static_assertions::assert_impl_all!(SurfaceError: Send, Sync);

@@ -77,6 +79,7 @@ impl fmt::Display for SurfaceError {
             Self::Outdated => "The underlying surface has changed, and therefore the swap chain must be updated",
             Self::Lost => "The swap chain has been lost and needs to be recreated",
             Self::OutOfMemory => "There is no more memory left to allocate a new frame",
+            Self::Other => "Acquiring a texture failed for an unknown reason"
         })
     }
 }
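For readers wondering what the new `Other` variant means on the calling side, here is a minimal sketch. It assumes a local stand-in enum that copies only the variants and messages visible in this diff; it is not the real wgpu type. The `Recovery` enum and the `recovery_for` policy are hypothetical and only illustrate one way an application might react.

```rust
use std::fmt;

// Local stand-in mirroring the variants visible in the diff (assumption: not the real wgpu type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SurfaceError {
    Outdated,
    Lost,
    OutOfMemory,
    Other,
}

impl fmt::Display for SurfaceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Messages copied from the Display impl in the diff.
        f.write_str(match self {
            Self::Outdated => "The underlying surface has changed, and therefore the swap chain must be updated",
            Self::Lost => "The swap chain has been lost and needs to be recreated",
            Self::OutOfMemory => "There is no more memory left to allocate a new frame",
            Self::Other => "Acquiring a texture failed for an unknown reason",
        })
    }
}

// Hypothetical application-side policy; the names are illustrative only.
#[derive(Debug, PartialEq, Eq)]
enum Recovery {
    Reconfigure,
    SkipFrame,
    Abort,
}

fn recovery_for(err: SurfaceError) -> Recovery {
    match err {
        // The surface must be configured again before the next frame.
        SurfaceError::Outdated | SurfaceError::Lost => Recovery::Reconfigure,
        // Nothing sensible to retry without memory.
        SurfaceError::OutOfMemory => Recovery::Abort,
        // The cause is unknown: drop this frame and try again on the next one.
        SurfaceError::Other => Recovery::SkipFrame,
    }
}

fn main() {
    for err in [
        SurfaceError::Outdated,
        SurfaceError::Lost,
        SurfaceError::OutOfMemory,
        SurfaceError::Other,
    ] {
        println!("{} -> {:?}", err, recovery_for(err));
    }
}
```

Skipping the frame on `Other` is only one possible choice; an application could just as well reconfigure the surface at that point.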
@@ -441,6 +441,9 @@ pub struct Surface {
     /// Configured device is needed to know which backend
     /// code to execute when acquiring a new frame.
     configured_device: Mutex<Option<wgc::id::DeviceId>>,
+    /// The error sink with which to report errors.
+    /// `None` if the surface has not been configured.
+    error_sink: Mutex<Option<ErrorSink>>,
 }

 #[derive(Debug)]

@@ -572,6 +575,7 @@ impl crate::Context for ContextWgpuCore {
         Ok(Surface {
             id,
             configured_device: Mutex::default(),
+            error_sink: Mutex::default(),
         })
     }

@@ -707,9 +711,10 @@ impl crate::Context for ContextWgpuCore {
             .0
             .surface_configure(surface_data.id, device_data.id, config);
         if let Some(e) = error {
-            self.handle_error_fatal(e, "Surface::configure");
+            self.handle_error_nolabel(&device_data.error_sink, e, "Surface::configure");
         } else {
             *surface_data.configured_device.lock() = Some(device_data.id);
+            *surface_data.error_sink.lock() = Some(device_data.error_sink.clone());
         }
     }

@@ -736,7 +741,19 @@ impl crate::Context for ContextWgpuCore {
                     },
                 )
             }
-            Err(err) => self.handle_error_fatal(err, "Surface::get_current_texture_view"),
+            Err(err) => match surface_data.error_sink.lock().as_ref() {
+                Some(error_sink) => {
+                    self.handle_error_nolabel(error_sink, err, "Surface::get_current_texture_view");
I wish we could forward this error as-is. We might add more status codes for this in the future or enhance the state that is

What irks me in particular about this is that, according to your reports, you are seeing spurious errors here with many of your users, which implies that going to the error sink is the wrong thing to do: a reported error should imply application-implementor error rather than driver/end-user error. Meaning it's something that one would be expected to dynamically handle as an application-implementor.

Note also that ... or are the errors you're seeing with your users exclusively about

To clarify, we don't have to solve this here all the way. This is more of a sentiment / open question.

The crash we're seeing right now is that we occasionally seem to get "Parent Device is Lost" validation errors when calling

I've attached an example stacktrace of that below:

The most ergonomic solution to this (IMO, obviously you know this code better than I do) would be to forward the error and return a result in both functions so the caller can handle them. As you mentioned in the issue when we were discussing earlier, we need to conform with the WebGPU spec around handling errors asynchronously, so I'm not sure that's totally feasible.

What I find awkward about the current approach (and I think this is partly to your point) is that the

What do you think is the next step here? I think what is in this PR is ok (and acceptable) but would love to brainstorm how to make it more ergonomic. We can't return

Thanks for sharing those callstacks. This kind of tracking is incredibly valuable. We do some of that at Rerun as well, but our userbase is a lot smaller at this point and a lot of the traffic is blocked (and it's easy to opt out), so we don't have that good of an insight.

100% agree: the more we can forward directly and still be WebGPU compatible, the better!

get_current_texture in particular has practically no failure case other than some resources being null, which afaik isn't possible in our Rust interface.

Yes, agree again. With the above in mind about what failures are possible in wgpu-core vs WebGPU, this makes a bit more sense.

Yep, definitely an improvement. Let's not let perfect get in the way of good 😄

I think we should go over all surface-related results & error codes again across the crate and unify them as much as possible. I haven't thought it through yet, but I think we could have status codes only show up as part of usable textures and everything else be an error with no status code at all. Not a lot of prior art for this, unfortunately. There should probably be more places in wgpu where we give back some limited Result - grepping for
+                    (
+                        None,
+                        SurfaceStatus::Unknown,
+                        SurfaceOutputDetail {
+                            surface_id: surface_data.id,
+                        },
+                    )
+                }
+                None => self.handle_error_fatal(err, "Surface::get_current_texture_view"),
+            },
         }
     }

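The fallback in this hunk (report to the sink captured at configure time, otherwise stay fatal) can be sketched in isolation. This is a simplified model under stated assumptions, not the wgpu implementation: `ErrorSink` here is a hypothetical shared `Vec<String>`, `std::sync::Mutex` stands in for whatever mutex the crate uses, and `Outcome`, `configure`, and `report_acquire_error` are illustrative names.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for the device's error sink: it only collects messages.
type ErrorSink = Arc<Mutex<Vec<String>>>;

struct Surface {
    // `None` until the surface has been configured successfully.
    error_sink: Mutex<Option<ErrorSink>>,
}

#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    // The error went to the sink and the caller can keep running.
    Reported,
    // No sink is known yet, so the error has to remain fatal.
    Fatal,
}

impl Surface {
    fn new() -> Self {
        Surface {
            error_sink: Mutex::new(None),
        }
    }

    // On success, remember the device's sink; on failure, report to the device's sink.
    fn configure(&self, device_sink: &ErrorSink, result: Result<(), String>) {
        match result {
            Ok(()) => {
                *self.error_sink.lock().unwrap() = Some(Arc::clone(device_sink));
            }
            Err(e) => {
                device_sink
                    .lock()
                    .unwrap()
                    .push(format!("Surface::configure: {}", e));
            }
        }
    }

    // Mirrors the `Some(error_sink)` / `None` split in the hunk.
    fn report_acquire_error(&self, err: &str) -> Outcome {
        let guard = self.error_sink.lock().unwrap();
        if let Some(sink) = guard.as_ref() {
            sink.lock()
                .unwrap()
                .push(format!("Surface::get_current_texture_view: {}", err));
            Outcome::Reported
        } else {
            Outcome::Fatal
        }
    }
}

fn main() {
    let sink: ErrorSink = Arc::new(Mutex::new(Vec::new()));
    let surface = Surface::new();

    // Before configuration there is nowhere to report to.
    println!("{:?}", surface.report_acquire_error("Parent device is lost"));

    surface.configure(&sink, Ok(()));
    println!("{:?}", surface.report_acquire_error("Parent device is lost"));
    println!("{:?}", sink.lock().unwrap());
}
```

Storing a clone of the sink on the surface keeps the non-fatal path available even though the acquire call itself does not take the device as an argument.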
I'm wondering if `Unknown` is really the right term here. We know that `get_current_texture` failed for a known reason when we get this. But then again we don't know what state our surface is in at this point (the error may have caused mayhem such that reconfigure is in order or the next `get_current_texture` call might just succeed).

So maybe it's enough to just clarify this here a little bit:

I like the idea of clarifying via the comment. The reason I chose `Unknown` as the name here is exactly what you mentioned: we don't really know the status of the surface when we get this error.

For example, if we get a `DeviceLost` error we can't reliably report anything about the status of the Surface (since we got an error higher in the stack)
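
Since an `Unknown` status leaves open whether the next acquire succeeds or a reconfigure is needed, one caller-side option is a bounded retry followed by a reconfigure. The sketch below is only an illustration under assumptions: `SurfaceStatus` is a two-variant local stand-in, and `Decision` and `acquire_with_retry` are hypothetical names, not part of wgpu.

```rust
// Local stand-in with just the two states needed for the illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SurfaceStatus {
    Good,
    Unknown,
}

// Hypothetical caller-side decision.
#[derive(Debug, PartialEq, Eq)]
enum Decision {
    Render,
    Reconfigure,
}

// Try to acquire up to `max_attempts` times; if the status never becomes
// `Good`, fall back to reconfiguring the surface.
fn acquire_with_retry(
    mut acquire: impl FnMut() -> SurfaceStatus,
    max_attempts: u32,
) -> Decision {
    for _ in 0..max_attempts {
        if acquire() == SurfaceStatus::Good {
            return Decision::Render;
        }
    }
    Decision::Reconfigure
}

fn main() {
    // Simulated surface that fails once and then recovers.
    let mut calls = 0;
    let decision = acquire_with_retry(
        || {
            calls += 1;
            if calls < 2 {
                SurfaceStatus::Unknown
            } else {
                SurfaceStatus::Good
            }
        },
        3,
    );
    println!("{:?} after {} call(s)", decision, calls);
}
```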