
Ray Queries #6291

Open · wants to merge 350 commits into trunk
Conversation

@Vecvec (Contributor) commented Sep 18, 2024

Connections
Works towards #1040

Description
This PR provides BLASes (bottom-level acceleration structures), TLASes (top-level acceleration structures), TLAS instances (which contain BLASes plus per-instance data such as transforms), and TLAS packages (which contain vectors of TLAS instances).
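For orientation, here is a minimal sketch of how these pieces relate; the types and fields are illustrative stand-ins, not the PR's actual definitions:

    // Illustrative only: shows the ownership hierarchy, not the PR's real types.
    struct Blas; // owns the geometry (e.g. triangle vertex/index buffers)

    struct TlasInstance<'a> {
        blas: &'a Blas,       // which BLAS this instance refers to
        transform: [f32; 12], // 3x4 transform applied to the BLAS geometry
        custom_index: u32,    // user data surfaced to shaders on hit
        mask: u8,             // visibility mask tested against each ray's mask
    }

    struct Tlas; // the built top-level structure that ray queries traverse

    struct TlasPackage<'a> {
        tlas: Tlas,
        instances: Vec<Option<TlasInstance<'a>>>, // the TLAS is rebuilt from these
    }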

Testing
Running the tests & examples included in this PR.

This is an updated version of #3631 and is intended to replace it.

Checklist

  • Run cargo fmt.
  • Run cargo clippy. If applicable, add:
    • n/a --target wasm32-unknown-unknown
    • n/a --target wasm32-unknown-emscripten
  • Run cargo xtask test to run tests.
  • Add change to CHANGELOG.md. See simple instructions inside file.
  • More tests & examples.
  • More docs

Later (follow-up PR)

  • Acceleration structure compaction.
  • Procedural geometry.
  • as_hal methods

expenses and others added 30 commits July 24, 2023 13:30
@Vecvec (Contributor, Author) commented Sep 25, 2024

I think the second one is actually valid; the ordering:

  1. build BLAS: built with index 1
  2. build TLAS: checks the BLAS; it has been built with index 1
  3. build TLAS: checks the BLAS; it has been built with index 1
  4. build BLAS: this will invalidate the TLAS, so any new usage (ray query) will fail

Edit: if you create a new ASContext before step 3 this will work.
Edit 2: I'll fix this.
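(For readers following along, a minimal, self-contained sketch of the index-based tracking described above; the types are illustrative, not this PR's actual code.)

    use std::collections::HashMap;

    type BlasId = usize;

    // Each BLAS records the index of its most recent build.
    struct BlasState { last_build_index: u64 }

    // A TLAS records which build index each referenced BLAS had when the
    // TLAS itself was built.
    struct TlasState { built_with: Vec<(BlasId, u64)> }

    // The TLAS stays valid only while every referenced BLAS still has the
    // build index the TLAS saw; rebuilding a BLAS (step 4) invalidates it.
    fn tlas_still_valid(tlas: &TlasState, blases: &HashMap<BlasId, BlasState>) -> bool {
        tlas.built_with.iter().all(|(id, idx)| {
            blases.get(id).map_or(false, |b| b.last_build_index == *idx)
        })
    }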

@cwfitzgerald (Member) commented

mumble mumble mutable damn state

@cwfitzgerald (Member) commented

Okay nice, it does seem to work.

I am also getting some intermittent validation errors:

VALIDATION [VUID-vkDestroyAccelerationStructureKHR-accelerationStructure-02442 (0xaf03fd73)]
        Validation Error: [ VUID-vkDestroyAccelerationStructureKHR-accelerationStructure-02442 ] | MessageID = 0xaf03fd73 | vkDestroyAccelerationStructureKHR():  can't be called on VkAccelerationStructureKHR 0xcad092000000000d[TLAS] that is currently in use by VkCommandBuffer 0x1e1b8536c60[TLAS 1]. The Vulkan spec states: All submitted commands that refer to accelerationStructure must have completed execution (https://vulkan.lunarg.com/doc/view/1.3.290.0/windows/1.3-extensions/vkspec.html#VUID-vkDestroyAccelerationStructureKHR-accelerationStructure-02442)
[2024-09-25T04:21:55Z ERROR wgpu_hal::vulkan::instance] VALIDATION [VUID-vkDestroyBuffer-buffer-00922 (0xe4549c11)]

This means the TLAS drop isn't properly deferred until the command encoder is done with it.
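(For context, the usual fix is to defer the raw destroy until the submission using the resource has completed; a minimal sketch of that pattern, illustrative rather than wgpu's actual lifetime tracking:)

    use std::sync::Arc;

    struct RawTlas; // stands in for the raw VkAccelerationStructureKHR handle

    impl Drop for RawTlas {
        fn drop(&mut self) {
            // Safe to destroy here: no submission holds a reference anymore.
        }
    }

    // Each in-flight submission keeps the resources it uses alive.
    struct PendingSubmission {
        used: Vec<Arc<RawTlas>>,
    }

    impl PendingSubmission {
        // Called once the GPU signals this submission finished; dropping the
        // Arcs releases the last references, so the raw handle is destroyed
        // only after the command buffer is done with it.
        fn on_gpu_complete(self) {}
    }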

@Vecvec (Contributor, Author) commented Sep 25, 2024

I suspect the second error arises because, when we break out with an error, the actions have already been drained, so the cleanup fails.

@Vecvec (Contributor, Author) commented Sep 25, 2024

Looking into queue submit, it seems that earlier errors (

    let res = validate_command_buffer(
        &command_buffer,
        &queue,
        &cmd_buf_data,
        &snatch_guard,
        &mut submit_surface_textures_owned,
        &mut used_surface_textures,
    );
    if let Err(err) = res {
        first_error.get_or_insert(err);
        cmd_buf_data.destroy(&command_buffer.device);
        continue;
    }
    cmd_buf_data.into_baked_commands()

) destroy the command buffer and continue, while later ones (

    if let Err(e) = baked.initialize_buffer_memory(&mut trackers, &snatch_guard) {
        break 'error Err(e.into());
    }
    if let Err(e) = baked.initialize_texture_memory(&mut trackers, device, &snatch_guard) {
        break 'error Err(e.into());
    }

) break out of the error context instead (this is because they should only return HAL errors). I think the validate functions are now out of place and should be moved into validate_command_buffer.

Edit: never mind, this didn't fix the validation errors.
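(A self-contained illustration of the two error paths described above; the types and helpers are stand-ins, not wgpu-core's real ones.)

    #[derive(Debug)]
    struct Error;
    struct CmdBuf;

    fn validate(_b: &CmdBuf) -> Result<(), Error> { Ok(()) }
    fn bake(_b: &CmdBuf) -> Result<(), Error> { Ok(()) }

    fn submit(buffers: &[CmdBuf]) -> Result<(), Error> {
        let mut first_error: Option<Error> = None;
        for buf in buffers {
            // Path 1: validation errors are recorded, the buffer is cleaned
            // up, and submission continues with the remaining buffers.
            if let Err(e) = validate(buf) {
                first_error.get_or_insert(e);
                continue;
            }
            // Path 2: HAL-level failures abort the whole submission
            // (the `break 'error Err(e.into())` pattern above).
            bake(buf)?;
        }
        first_error.map_or(Ok(()), Err)
    }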

@Vecvec (Contributor, Author) commented Sep 25, 2024

Also, for the first validation error: we place some acceleration structure barriers that transition between build and shader input (lines 336, 719, 1238). These should only be inserted if ray_query (or, later, ray_tracing_pipeline) is also present.

I think this needs something in hal that conditionally adds the barrier based on feature support (or we expand shader input into vertex, fragment, compute, and later ray-tracing-pipeline inputs).
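A rough sketch of that conditional, using hypothetical feature flags rather than wgpu-hal's actual API:

    struct EnabledFeatures {
        ray_query: bool,
        ray_tracing_pipeline: bool, // future
    }

    enum BarrierDst {
        BuildOnly,           // no shader can consume the structure yet
        BuildAndShaderInput, // structure may be read by ray queries
    }

    // Only transition to shader input when a feature that can actually
    // consume acceleration structures is enabled.
    fn post_build_barrier_dst(f: &EnabledFeatures) -> BarrierDst {
        if f.ray_query || f.ray_tracing_pipeline {
            BarrierDst::BuildAndShaderInput
        } else {
            BarrierDst::BuildOnly
        }
    }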

@cwfitzgerald (Member) commented

Chatted with the other maintainers; there are two main things to do before we can land, and then we're good to land with follow-up testing for any naga stuff.

Both of these things I can take on:

  • More testing; some validation testing would be nice to make sure common error patterns are easily caught.
  • The validation should be refactored away from storing mutable state on the TLAS/BLAS itself and toward a tracker-like way of validating state; see the sketch after this list. We've learned from experience with the texture initialization code that keeping state inside the resources themselves is ripe for bugs, and we want to move towards validation where all information "flows" from the command encoder into the queue. That is a lot easier to get right and to keep right. Even if the validation as it stands today is correct, we're liable to break it by accident in the future. I'm not sure if you wrote the validation for this or someone else did, but I can do this conversion; I just wanted you to know what my plan was.
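(A minimal sketch of the tracker-like shape being described, where encoder-local state merges into the queue at submit; illustrative, not wgpu's actual tracker types.)

    use std::collections::HashMap;

    type BlasId = usize;

    // Recorded while encoding commands; owns no resource state itself.
    #[derive(Default)]
    struct EncoderAccelTracker {
        // The build index each BLAS will have once this encoder's work runs.
        built: HashMap<BlasId, u64>,
    }

    impl EncoderAccelTracker {
        fn record_build(&mut self, id: BlasId, index: u64) {
            self.built.insert(id, index);
        }

        // At queue submit, encoder-local state flows into the queue's view;
        // nothing is ever mutated on the BLAS/TLAS resources themselves.
        fn merge_into(self, queue_state: &mut HashMap<BlasId, u64>) {
            queue_state.extend(self.built);
        }
    }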

@Vecvec (Contributor, Author) commented Sep 25, 2024

I'm fine to work on some of the testing. I assume common errors are things like:

  • Invalid vertex format
  • Mismatched index buffers
  • etc.

@Vecvec (Contributor, Author) commented Sep 26, 2024

After using advanced debugging (putting an eprintln! at the end of the test) :) I've discovered that the second set of Vulkan validation errors happens at the end of the test. Shouldn't all encoders have been reset by the wgpu validation errors generated in queue submit?

@Vecvec (Contributor, Author) commented Sep 26, 2024

It seems like command buffers might be leaking if a validation error is generated:

[2024-09-26T08:58:50Z INFO  wgpu_core::device::global] CommandEncoder::drop Id(1,1)
[2024-09-26T08:58:50Z INFO  wgpu_core::command] Drop CommandBuffer with 'TLAS 1' label <--- command buffer freed here
[2024-09-26T08:58:50Z INFO  wgpu_core::device::global] Device::create_shader_module -> Id(0,1)
[2024-09-26T08:58:50Z INFO  wgpu_core::device::global] Device::create_compute_pipeline -> Id(0,1)
[2024-09-26T08:58:50Z INFO  wgpu_core::device::global] Device::create_bind_group -> Id(0,1)
[2024-09-26T08:58:50Z INFO  wgpu_core::device::global] BindGroupLayout::drop Id(0,1)
[2024-09-26T08:58:50Z INFO  wgpu_core::device::global] Device::create_command_encoder -> Id(1,2)
[2024-09-26T08:58:50Z INFO  wgpu_core::device::queue] Queue::submit Id(0,1)
[2024-09-26T08:58:50Z INFO  wgpu_core::device::global] CommandBuffer::drop Id(1,2)
[2024-09-26T08:58:50Z INFO  wgpu_core::device::global] CommandEncoder::drop Id(1,2)
[2024-09-26T08:58:50Z INFO  wgpu_core::command] Drop CommandBuffer with '' label
[2024-09-26T08:58:50Z INFO  wgpu_core::device::global] BindGroup::drop Id(0,1)
[2024-09-26T08:58:50Z INFO  wgpu_core::binding_model] Destroy raw BindGroup with '' label
[2024-09-26T08:58:50Z INFO  wgpu_core::device::global] ComputePipeline::drop Id(0,1)
[2024-09-26T08:58:50Z INFO  wgpu_core::pipeline] Destroy raw ComputePipeline with '' label
[2024-09-26T08:58:50Z INFO  wgpu_core::binding_model] Destroy raw PipelineLayout with '' label
[2024-09-26T08:58:50Z INFO  wgpu_core::binding_model] Destroy raw BindGroupLayout with '' label
[2024-09-26T08:58:50Z INFO  wgpu_core::device::global] ShaderModule::drop Id(0,1)
[2024-09-26T08:58:50Z INFO  wgpu_core::pipeline] Destroy raw ShaderModule with 'shader.wgsl' label
[2024-09-26T08:58:50Z INFO  wgpu_core::device::global] CommandEncoder::drop Id(0,1)
[2024-09-26T08:58:50Z INFO  wgpu_core::command] Drop CommandBuffer with 'BLAS 1' label
[2024-09-26T08:58:50Z INFO  wgpu_core::device::global] Buffer::drop Id(0,1)
[2024-09-26T08:58:50Z INFO  wgpu_core::resource] Destroy raw Tlas with 'TLAS' label
[2024-09-26T08:58:50Z ERROR wgpu_hal::vulkan::instance] VALIDATION [VUID-vkDestroyAccelerationStructureKHR-accelerationStructure-02442 (0xaf03fd73)]
    	Validation Error: [ VUID-vkDestroyAccelerationStructureKHR-accelerationStructure-02442 ] | MessageID = 0xaf03fd73 | vkDestroyAccelerationStructureKHR():  can't be called on VkAccelerationStructureKHR 0xcad092000000000d[TLAS] that is currently in use by VkCommandBuffer 0x7f8b0de60af0[TLAS 1]. The Vulkan spec states: All submitted commands that refer to accelerationStructure must have completed execution (https://vulkan.lunarg.com/doc/view/1.3.290.0/linux/1.3-extensions/vkspec.html#VUID-vkDestroyAccelerationStructureKHR-accelerationStructure-02442) 
               ^--- Vulkan validation error about the command buffer still being active

I'll see if I can create a memory-leak repro (without acceleration structures). Never mind, I remembered that command encoders get reused.

Edit: wondering if #6323 might be related

@Vecvec (Contributor, Author) commented Sep 28, 2024

I've put a workaround in the tests that stops the Vulkan validation errors, as I believe the errors from the Vulkan validation layers are the same as those in #6323.
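(An assumed shape for such a workaround; the PR's actual change may differ.)

    // Block until all submitted GPU work completes before the test tears
    // down, so no acceleration structure is destroyed while still in use.
    fn drain_gpu(device: &wgpu::Device) {
        device.poll(wgpu::Maintain::Wait);
    }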
