Ray Tracing Support #1040

Open
5nefarious opened this issue Nov 23, 2020 · 53 comments
Labels
area: api Issues related to API surface help required We need community help to make this happen. type: enhancement New feature or request

Comments

@5nefarious

5nefarious commented Nov 23, 2020

Khronos has recently released the final specification for ray tracing on Vulkan. At this point, DX12, Vulkan, and Metal all seem to have some form of acceleration for ray tracing. Are there any plans to eventually consolidate and expose these APIs in wgpu and wgpu-rs?

I foresee real-time rendering making increasing use of ray tracing in the future, so this may be an essential feature to have. However, I imagine it may be very difficult to support this on the DX11 and OpenGL backends through software emulation.

@kvark kvark added area: api Issues related to API surface help required We need community help to make this happen. type: enhancement New feature or request labels Nov 23, 2020
@kvark
Member

kvark commented Nov 23, 2020

See also:

It's too early to consider this an essential feature on all backends, but rolling it out at least where the backends do have support for it seems very useful today.

@Sebbl0508

Are there any updates on this?

@cwfitzgerald
Member

cwfitzgerald commented Jan 21, 2022

Not currently. There's interest from upstream WebGPU, but that would be well after v1.

We could make it a native extension and I think it would be great, but the apis are quite large (including shader side transformations), and given very few of us actually have hardware to support this, the chance of it happening any time soon without a champion is low.

@OriginLive

Hi, when is this coming out?

@cwfitzgerald
Member

There's no one actively working on it to my knowledge.

@OriginLive

It would be nice to at least get a native extension or some bindings. That would be a nice start.

@expenses
Contributor

expenses commented Aug 29, 2022

I'm curious about getting basic ray-tracing support working. I think that a first draft wouldn't be too hard, with some limitations. I think we'd initially want to:

The remaining part is then creating an abstraction for acceleration structures (https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_acceleration_structure.html).

You probably want to focus on getting a simple acceleration structure abstraction for triangle mesh and instance ASes that are built on the GPU. Anything else can come later.

I can't promise anything, but I'll try and look into this when I'm back with my home setup.

@expenses
Contributor

My progress so far is at master...expenses:wgpu:hal-acceleration-structures. I plan on merging in wgpu-hal support first, probably with a 'hello world' ray-traced triangle example like https://github.com/SaschaWillems/Vulkan#basic-ray-tracing.

@kvark
Member

kvark commented Sep 12, 2022

@expenses thank you for championing Ray Tracing! It's very exciting for the community to get access to it 🚀 .

Our main concerns are the maintenance costs of a large API surface, plus the fact that it will need to change in order to abstract over DXR efficiently. Ideally, there would be a proper investigation of the API differences between VkRT and DXR before the wgpu-hal API is prototyped. However, we are somewhat confident that the amount of changes needed for DXR will be limited, and it would be fine to do that as the next step.

My advice, if I may, would be to not try to copy Vulkan into wgpu-hal. Our HAL is low level and zero/low overhead, but it doesn't have to be extremely low level. For example, it doesn't have the API for allocating memory and binding objects to it, like Vulkan. So if you have a choice of 1) put complexity in the Vulkan backend, or 2) expose it in wgpu-hal, please put a bigger weight on 1). We can make it more complex as a follow-up for DXR if needed.

Again, thank you for all the amazing contributions. Looking forward to seeing the opportunities that your work opens to all of us!

@expenses
Contributor

My impression is that as Vulkan raytracing was based on DXR, the APIs should be fairly similar. Beyond acceleration structures, the main thing I've focused on is the ray query extension: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_ray_query.html. This allows ray tracing in the normal vertex/fragment/compute shader stages as opposed to requiring a new ray tracing shader stage and shader binding tables and all that stuff.

This is equivalent to inline ray tracing in DXR 1.1: https://devblogs.microsoft.com/directx/dxr-1-1/#inline-raytracing

@trsh

trsh commented Sep 28, 2022

My impression is that as Vulkan raytracing was based on DXR, the APIs should be fairly similar. Beyond acceleration structures, the main thing I've focused on is the ray query extension: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_ray_query.html. This allows ray tracing in the normal vertex/fragment/compute shader stages as opposed to requiring a new ray tracing shader stage and shader binding tables and all that stuff.

This is equivalent to inline ray tracing in DXR 1.1: https://devblogs.microsoft.com/directx/dxr-1-1/#inline-raytracing

Looking at the example, the setup on the Rust side is insane: 1k lines for ray tracing a few triangles. And unsafe blocks?

@expenses
Contributor

Looking at the example, the setup on the Rust side is insane: 1k lines for ray tracing a few triangles. And unsafe blocks?

@trsh it's a wgpu-hal example, not a main wgpu one :) The halmark example is just as long.

@coder3112

Hello. I am a bit confused o.o.
Is it in the roadmap to support hardware-accelerated raytracing (that nvidia RTX cards or AMD's 6000-radeon series support)? Is it still a consideration? How far away is stable raytracing support? (I am guessing a year at least based on the thread, but I might be completely off.)

@cwfitzgerald
Member

Hello!

Is it in the roadmap to support hardware-accelerated raytracing (that nvidia RTX cards or AMD's 6000-radeon series support)?

Yes. This issue is for DXR, VK_KHR_ray_tracing support.

How far away is stable raytracing support?

There's no set timeframe, just people working on it when they are able. There are a lot of components that go into raytracing, so full wgpu support will take a while. Some small first steps have been made (like implementing RT in wgpu-hal with no shader support).

@coder3112

Thank you for the response!

@gents83
Contributor

gents83 commented Jan 17, 2023

If you want to play a bit with compute shaders and raytracing to create a visibility buffer and then apply your shading, I've played a bit with the idea in my raytracing_visibility shader 😄

https://github.com/gents83/INOX/blob/master/data_raw/shaders/wgsl/raytracing_visibility.wgsl

@kvark
Member

kvark commented Feb 22, 2023

@expenses I'm adding ray query support to WGSL in gfx-rs/naga#2256
You could try this instead of forcing raw SPIR-V usage.

@daniel-keitel
Contributor

I have continued work on @expenses' first implementation (now PR #3507).
Currently I am looking into the DirectX 12 specification, so the hal API doesn't need to change when implementing DX12 at a later point.

The Specifications (Dx,Vk) are fairly compatible in regards to acceleration structures,
but I see no way to build acceleration structures indirectly with DX12.

Can anyone confirm if DX12 is unable to build acceleration structures indirectly?
(What I found: ExecuteIndirect)

@cwfitzgerald
Member

The reason pGeometryDescs is a CPU based parameter as opposed to InstanceDescs which live on the GPU is, at least for initial implementations, the CPU needs to look at some of the information such as triangle counts in pGeometryDescs in order to schedule acceleration structure builds. Perhaps in the future more of the data can live on the GPU.

Definitely sounds like there is no indirect.

@daniel-keitel
Contributor

@cwfitzgerald thanks for the confirmation (quite odd that it is impossible).
I will ignore vkCmdBuildAccelerationStructuresIndirectKHR for now.

@daniel-keitel
Contributor

The initial implementation #3507 (Vulkan, Ray Query) should be finished now, and is awaiting review.

I started to implement ray-tracing pipelines in wgpu-hal for Vulkan in #3607 in the meantime.
A simple example works already.

@daniel-keitel
Contributor

My Proposal for the api in wgpu:

// Ray tracing API proposal for wgpu (underlying Vulkan, Metal and DX12 implementations)

// The general design goal is to come up with a simpler API that allows for validation.
// Since this validation would have high overhead in some cases,
// I decided to provide both more limited (validated) functions and unsafe functions for the same action, to avoid this tradeoff.

// Error handling and traits like Debug are omitted.

// Core structures with no public members
pub struct Blas {}
pub struct Tlas {}
pub struct BlasRequirements {}
pub struct TlasInstances{}

// Size descriptors used to describe the size requirements of blas geometries.
// Also used internally for strict validation
pub struct BlasTriangleGeometrySizeDescriptor{
    pub vertex_format: wgt::VertexFormat,
    pub vertex_count: u32,
    pub index_format: Option<wgt::IndexFormat>,
    pub index_count: Option<u32>,
    pub flags: AccelerationStructureGeometryFlags,
}

pub struct BlasProceduralGeometrySizeDescriptor{
    pub count: u32,
    pub flags: AccelerationStructureGeometryFlags,
} 

// Procedural geometry contains AABBs
pub struct BlasProceduralGeometry{
    pub size: BlasProceduralGeometrySizeDescriptor,
    pub bounding_box_buffer: Buffer,
    pub bounding_box_buffer_offset: wgt::BufferAddress,
    pub bounding_box_stride: wgt::BufferAddress,
}

// Triangle geometry contains vertices, optionally indices and transforms
pub struct BlasTriangleGeometry{
    pub size: BlasTriangleGeometrySizeDescriptor,
    pub vertex_buffer: Buffer,
    pub first_vertex: u32,
    pub vertex_stride: wgt::BufferAddress,
    pub index_buffer: Option<Buffer>,
    pub index_buffer_offset: Option<wgt::BufferAddress>,
    pub transform_buffer: Option<Buffer>,
    pub transform_buffer_offset: Option<wgt::BufferAddress>,
}

// Build flags 
pub struct AccelerationStructureFlags{
    // build_speed, small_size, ...
}

// Geometry flags
pub struct AccelerationStructureGeometryFlags{
    // opaque, no_duplicate_any_hit, ...
}

// Descriptors used to determine the memory requirements and validation of an acceleration structure
pub enum BlasGeometrySizeDescriptors{
    Triangles{desc: Vec<BlasTriangleGeometrySizeDescriptor>},
    Procedural{desc: Vec<BlasProceduralGeometrySizeDescriptor>},
}

// With prefer update, we decide if an update is possible, else we rebuild.
// Maybe a force update option could be useful
pub enum UpdateMode{
    Build,
    // Update,
    PreferUpdate,
}

// General descriptor for the size requirements, 
// since the required size depends on the contents and build flags 
pub struct GetBlasRequirementsDescriptor{
    pub flags: AccelerationStructureFlags,
}

// Creation descriptors, we provide flags, and update_mode.
// We store it in the structure, so we don't need to pass it every build.
pub struct CreateBlasDescriptor<'a>{
    pub requirements: &'a BlasRequirements,
    pub flags: AccelerationStructureFlags,
    pub update_mode: UpdateMode,
}

pub struct CreateTlasDescriptor{
    pub max_instances: u32,
    pub flags: AccelerationStructureFlags,
    pub update_mode: UpdateMode,
}

// Safe instance entry for tlas
struct TlasInstance{
    transform: [f32; 12],
    custom_index: u32,
    mask: u8,
    shader_binding_table_record_offset: u32,
    flags: u8, // bitmap
    blas: Blas,
}

impl Device {
    // Retrieves the size requirements for an acceleration structure.
    // BlasRequirements stores the BlasGeometrySizeDescriptors for validation (that's why we take ownership).
    // These descriptors are required for strict validation, because the underlying specifications (e.g. Vulkan) don't
    // make many guarantees about the ordering of size requirements between different geometries.
    // By storing the sizes of all geometries we can validate that a different list of geometries is guaranteed to fit.
    // If we just queried whether the requirements are satisfied for the new geometries, it might fit on some systems and not on others.
    pub fn get_blas_size_requirements(&self, desc: &GetBlasRequirementsDescriptor, entries: BlasGeometrySizeDescriptors) -> BlasRequirements;

    // Creates a new bottom level acceleration structure and sets internal state for validation (reference to BlasGeometrySizeDescriptors)
    // and builds (e.g. update mode)
    pub fn create_blas(&self, desc: &CreateBlasDescriptor) -> Blas;

    // Creates a new top level acceleration structure and sets internal state for builds (e.g. update mode)
    pub fn create_tlas(&self, desc: &CreateTlasDescriptor) -> Tlas;
}

// Enum for the different types of geometries inside a single blas build
enum BlasGeometries<'a>{
    TriangleGeometries(&'a [BlasTriangleGeometry]),
    ProceduralGeometries(&'a [BlasProceduralGeometry]),
}

impl CommandEncoder {
    // Build multiple bottom level acceleration structures.
    // Validates that the geometries are guaranteed to produce an acceleration structure that fits inside the allocated buffers (with strict validation).
    // Ensures that all used buffers are valid and synchronized.
    pub fn build_blas<'a>(&self, blas: impl IntoIterator<Item=&'a Blas>,
        triangle_geometries: impl IntoIterator<Item=&'a [BlasGeometries]>,
        scratch_buffers: impl IntoIterator<Item=&'a Buffer>);

    // Build multiple top level acceleration structures.
    // Validates the instances (e.g. ensures that blas entries are valid and synchronized).
    // Uploads the part of the instances that changed into a staging buffer and
    // enqueues a command to copy from that staging buffer into the internal index buffer.
    // (Splitting building of bottom level and top level makes validation easier.)
    pub fn build_tlas<'a>(&self, tlas: impl IntoIterator<Item=&'a Tlas>,
        instances: impl IntoIterator<Item=&'a TlasInstances>,
        scratch_buffers: impl IntoIterator<Item=&'a Buffer>);

    // Build multiple top level acceleration structures.
    // Uses the provided instance buffer directly (minimal validation).
    pub unsafe fn build_tlas_unsafe<'a>(&self, tlas: impl IntoIterator<Item=&'a Tlas>,
        raw_instances: impl IntoIterator<Item=&'a Buffer>,
        scratch_buffers: impl IntoIterator<Item=&'a Buffer>);

    // Creates a new blas and copies (in a compacting way) the contents of the provided blas
    // into the new one (compaction flag must be set).
    pub fn compact_blas(&self, blas: &Blas) -> Blas;
}

impl BlasRequirements {
    // To use the same acceleration structure for multiple blas builds (after each build we compact into a new one)
    // we need to allocate buffers big enough for all of them.
    // This function allows this in a safe way (with strict validation enabled).
    pub fn find_smallest_shared_requirements(requirements: &[BlasRequirements]) -> BlasRequirements;

    // getter for the required scratch_buffer_size
    pub fn required_scratch_buffer_size(&self) -> BufferAddress;
}

// trait on blas/tlas
trait AccelerationStructure {
    // modify flags before a build
    fn set_flags_mode(&mut self, mode: AccelerationStructureFlags);
    // modify the update mode before a build
    fn set_update_mode(&mut self, mode: UpdateMode);
    // getter for the required scratch_buffer_size
    fn required_scratch_buffer_size(&self) -> BufferAddress;
}

// Safe Tlas Instance
impl TlasInstances{
    pub fn new(max_instances: u32) -> Self;

    // gets instances to read from
    pub fn get(&self) -> &[TlasInstance];
    // gets instances to modify, we keep track of the range to determine what we need to validate and copy
    pub fn get_mut_range(&mut self, range: Range<u32>) -> &mut [TlasInstance];
    // get the number of instances which will be built
    pub fn active(&self) -> u32;
    // set the number of instances which will be built
    pub fn set_active(&mut self, active: u32);
}

Previous discussion in a wgpu matrix room thread
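
The strict validation described in the proposal (store the per-geometry size descriptors when the BLAS is sized, then check at build time that the geometries passed in cannot exceed them) could look roughly like the sketch below. This is not wgpu code: `TriangleSizeDesc` and `fits_within` are hypothetical stand-ins for `BlasTriangleGeometrySizeDescriptor` and whatever check an implementation ends up using, and the rule applied here (no more geometries than declared, same indexed-ness, no count larger than declared) is an assumption of the sketch.

```rust
/// Hypothetical, simplified stand-in for `BlasTriangleGeometrySizeDescriptor`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct TriangleSizeDesc {
    pub vertex_count: u32,
    /// `None` means the geometry is not indexed.
    pub index_count: Option<u32>,
}

/// Returns true if `build` is accepted for a BLAS that was sized for `declared`.
///
/// Assumed rule: `build` has at most as many geometries as `declared`, and each
/// geometry matches the declared one in indexed-ness and does not exceed its
/// counts. The answer depends only on the stored descriptors, so it is the
/// same on every system.
pub fn fits_within(declared: &[TriangleSizeDesc], build: &[TriangleSizeDesc]) -> bool {
    build.len() <= declared.len()
        && declared.iter().zip(build).all(|(d, b)| {
            b.vertex_count <= d.vertex_count
                && match (d.index_count, b.index_count) {
                    (Some(dc), Some(bc)) => bc <= dc,
                    (None, None) => true,
                    // Indexed vs. non-indexed mismatch is rejected outright.
                    _ => false,
                }
        })
}

fn main() {
    let declared = [TriangleSizeDesc { vertex_count: 8, index_count: Some(36) }];
    let smaller = [TriangleSizeDesc { vertex_count: 4, index_count: Some(6) }];
    let larger = [TriangleSizeDesc { vertex_count: 9, index_count: Some(36) }];
    println!("smaller fits: {}", fits_within(&declared, &smaller));
    println!("larger fits: {}", fits_within(&declared, &larger));
}
```

A check like this is deliberately conservative: it may reject a build that a particular driver would accept, in exchange for behaving identically everywhere.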

@kvark
Member

kvark commented Mar 26, 2023

Thank you for coming up with a concrete proposal! Is there any way we can reduce the API surface here? It would increase the chances of it making it into a WebGPU extension. For example, would it be reasonable to have the first version of this API not support the "update" operation for acceleration structures at all? Without the update, managing the scratch buffers may be simplified: just allocate the scratch buffer when creating an acceleration structure, and then free it when the structure is built, all internally.
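
To make the suggestion concrete: if updates are dropped, the scratch space only has to live from creation until the single build, so the acceleration structure can own it and release it internally. A rough sketch, assuming hypothetical types (this `Blas` and its `build` are stand-ins backed by a `Vec<u8>`, not wgpu or wgpu-hal API):

```rust
/// Hypothetical build-only acceleration structure: no update mode and no
/// user-visible scratch buffer.
pub struct Blas {
    /// Scratch space, present from creation until the one and only build.
    scratch: Option<Vec<u8>>,
    built: bool,
}

impl Blas {
    /// Allocates the scratch space up front.
    pub fn new(scratch_size: usize) -> Self {
        Blas { scratch: Some(vec![0; scratch_size]), built: false }
    }

    /// Consumes the scratch space. A second call fails because a build-only
    /// structure has nothing left to build with.
    pub fn build(&mut self) -> Result<(), &'static str> {
        let scratch = self.scratch.take().ok_or("already built")?;
        // A real backend would record the build command here and free the
        // scratch allocation once the GPU has finished with it.
        drop(scratch);
        self.built = true;
        Ok(())
    }

    pub fn is_built(&self) -> bool {
        self.built
    }

    pub fn holds_scratch(&self) -> bool {
        self.scratch.is_some()
    }
}

fn main() {
    let mut blas = Blas::new(1024);
    println!("holds scratch before build: {}", blas.holds_scratch());
    blas.build().expect("first build should succeed");
    println!("built: {}, holds scratch: {}", blas.is_built(), blas.holds_scratch());
    println!("second build is rejected: {}", blas.build().is_err());
}
```

The user never sees a scratch buffer in this shape of API, which is where the reduction in surface comes from.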

@JMS55
Contributor

JMS55 commented Feb 25, 2024

  • It has to be a separate feature (sadly), as I don't think DX12/Metal support it either
  • TLAS is required to be built with the VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_DATA_ACCESS_KHR flag. wgpu/naga is going to have to be able to validate that when you get the vertex positions, the TLAS you traced against had that flag enabled. Maybe a new acceleration_structure_extended binding type, where naga only allows position fetches on that specific kind of TLAS, and then wgpu validates at bind time that you enabled the flag on the bound TLAS?
  • Rather than building it into the RayIntersection struct, it might make more sense to keep it as a separate function that maps directly to OpRayQueryGetIntersectionTriangleVertexPositionsKHR. Idk.
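
The bind-time check suggested in the second point could be sketched as follows. All names are hypothetical (`TlasFlags`, `AccelBindingType`, `validate_bind` are not naga or wgpu API); the sketch assumes the shader compiler reports which binding type a slot was declared with, and that the flags used at build time are stored alongside the TLAS.

```rust
/// Hypothetical build flags stored with a TLAS; only the relevant bit is modelled.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TlasFlags(u8);

impl TlasFlags {
    pub const NONE: TlasFlags = TlasFlags(0);
    /// Stand-in for VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_DATA_ACCESS_KHR.
    pub const ALLOW_DATA_ACCESS: TlasFlags = TlasFlags(1);

    pub fn contains(self, other: TlasFlags) -> bool {
        self.0 & other.0 == other.0
    }
}

/// What the shader declared for a binding slot.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AccelBindingType {
    /// Plain `acceleration_structure`: no position fetch allowed in the shader.
    Plain,
    /// Hypothetical `acceleration_structure_extended`: position fetch allowed.
    Extended,
}

/// Bind-time validation: an extended slot only accepts a TLAS built with the flag.
pub fn validate_bind(binding: AccelBindingType, tlas_flags: TlasFlags) -> Result<(), String> {
    match binding {
        AccelBindingType::Plain => Ok(()),
        AccelBindingType::Extended if tlas_flags.contains(TlasFlags::ALLOW_DATA_ACCESS) => Ok(()),
        AccelBindingType::Extended => {
            Err("TLAS bound to an extended slot was built without the data-access flag".to_string())
        }
    }
}

fn main() {
    println!("{:?}", validate_bind(AccelBindingType::Extended, TlasFlags::ALLOW_DATA_ACCESS));
    println!("{:?}", validate_bind(AccelBindingType::Extended, TlasFlags::NONE));
    println!("{:?}", validate_bind(AccelBindingType::Plain, TlasFlags::NONE));
}
```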

@Vecvec
Contributor

Vecvec commented Feb 26, 2024

I'm surprised that it is only on Vulkan; since they are already loading the positions, it feels like it should be easy. However, if it only works on Vulkan then I probably will not implement it, as WebGPU says:

A proposal for new functionality must be implementable on at least 2 different native APIs

@cwfitzgerald
Member

@Vecvec I would note, that wgpu is willing to accept features that are only supported on a single api - the main thing is that if multiple apis support it, the proposal should allow implementation on all.

@Vecvec
Contributor

Vecvec commented Feb 27, 2024

Thanks! I had assumed wgpu had similar proposal policies as webgpu.
In that case in response to @JMS55, your first idea sounds good but,

var acc_struct: acceleration_structure_extended;

may confuse people as to what feature / flag to enable for it though, instead how about

var<get_position> acc_struct: acceleration_structure;

(similar to uniform / storage buffers) or

var acc_struct: position_getting_acceleration_structure; 

similar to storage textures (a flag for any texture) instead.

In response to your second idea, I think that it is better than mine, because my idea would require all inputted acceleration structures to have the get_position flag and so would be too restrictive, how about

HitVertexPositions(rq: &ray_query) -> array<vec3f, 3>

for the function.

@JMS55
Contributor

JMS55 commented Feb 27, 2024

var acc_struct: acceleration_structure_with_hit_position; or something similar is probably best. var<T> has a specific meaning (uniform, storage, workgroup) related to memory that doesn't make sense for acceleration structure capabilities.

@cwfitzgerald
Member

Would the type need to be different at all? Couldn't it be that if you access the hit position member, you need the capability?

@Vecvec
Contributor

Vecvec commented Feb 27, 2024

The points in here are slightly disjointed but I didn't want to spam notifications

In response to @cwfitzgerald:
The type would not need to be different, however I think there will need to be some under the surface marker (for validating the required flag and feature are activated) so exposing it on the surface may make the difference (and feature required) more obvious to the user, while possibly making things easier to implement.

In response to @JMS55
I think that name is probably too annoying to type, which could discourage users from using it. What about acceleration_structure<T>? Storage textures use something like that for both the format and whether it is read only or not. It would suggest to users that it can be used in the same way as acceleration_structure, and T could be something like vertex_return.

Also, I had a look, and in dx12 I can't find any mention of a maximum array length, nor max bindings (both problems for just passing an array beside an acceleration structure in wgpu instead of this feature) so could this be polyfilled on dx12?

@Vecvec
Contributor

Vecvec commented Mar 20, 2024

I've gotten an implementation working, with two exceptions. First, there is no enable in the shader, because the ray-tracing feature does not have an enable either. Second, I realized the function call needs to mention whether the hit is committed, so I've changed the function name to getCommittedHitVertexPositions.

@Vecvec
Contributor

Vecvec commented Sep 14, 2024

I have been looking at the formats acceleration structure building can support, and for the base acceleration structure feature it appears that it should only support f32x3, because this is all Metal supports.
From Metal:

Each vertex must have at least 12 bytes of position data, stored as a MTLPackedFloat3 containing the X, Y, and Z position.

There could be another feature that additionally provides f32x2, f16x2, f16x4, snorm16x2, snorm16x4 support which vulkan and dx12 support (and maybe unorm16x2, unorm16x4, rgb10a2, unorm8x2 unorm8x4, snorm8x2, snorm8x4 which dxr 1.1 - needed for ray query - supports and buffer features can be queried on vulkan), however additional format features can probably be post MVP if they are required.

From dx12:

Format of the vertices in VertexBuffer. Must be one of the following:
DXGI_FORMAT_R32G32_FLOAT - third component is assumed 0
DXGI_FORMAT_R32G32B32_FLOAT
DXGI_FORMAT_R16G16_FLOAT - third component is assumed 0
DXGI_FORMAT_R16G16B16A16_FLOAT - A16 component is ignored, other data can be packed there, such as setting vertex stride to 6 bytes.
DXGI_FORMAT_R16G16_SNORM - third component is assumed 0
DXGI_FORMAT_R16G16B16A16_SNORM - A16 component is ignored, other data can be packed there, such as setting vertex stride to 6 bytes.

Tier 1.1 devices support the following additional formats:
DXGI_FORMAT_R16G16B16A16_UNORM - A16 component is ignored, other data can be packed there, such as setting vertex stride to 6 bytes
DXGI_FORMAT_R16G16_UNORM - third component assumed 0
DXGI_FORMAT_R10G10B10A2_UNORM - A2 component is ignored, stride must be 4 bytes
DXGI_FORMAT_R8G8B8A8_UNORM - A8 component is ignored, other data can be packed there, such as setting vertex stride to 3 bytes
DXGI_FORMAT_R8G8_UNORM - third component assumed 0
DXGI_FORMAT_R8G8B8A8_SNORM - A8 component is ignored, other data can be packed there, such as setting vertex stride to 3 bytes
DXGI_FORMAT_R8G8_SNORM - third component assumed 0

From vulkan:

VK_FORMAT_FEATURE_ACCELERATION_STRUCTURE_VERTEX_BUFFER_BIT_KHR must be supported in bufferFeatures for the following formats if the accelerationStructure feature is supported:
VK_FORMAT_R32G32_SFLOAT
VK_FORMAT_R32G32B32_SFLOAT
VK_FORMAT_R16G16_SFLOAT
VK_FORMAT_R16G16B16A16_SFLOAT
VK_FORMAT_R16G16_SNORM
VK_FORMAT_R16G16B16A16_SNORM
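
The split proposed above (f32x3 in the base feature, the formats shared by Vulkan and DX12 behind an additional feature) could be expressed as a small check like the one below. This is a sketch only: `BlasVertexFormat`, `is_supported`, and the `extended_formats` switch are hypothetical names, not existing wgpu items, and only the formats listed for both Vulkan and DX12 are modelled.

```rust
/// Hypothetical vertex formats for BLAS builds, limited to those named above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BlasVertexFormat {
    Float32x3,
    Float32x2,
    Float16x2,
    Float16x4,
    Snorm16x2,
    Snorm16x4,
}

impl BlasVertexFormat {
    /// Size of one vertex position in bytes.
    pub fn size_in_bytes(self) -> u64 {
        match self {
            BlasVertexFormat::Float32x3 => 12,
            BlasVertexFormat::Float32x2 | BlasVertexFormat::Float16x4 | BlasVertexFormat::Snorm16x4 => 8,
            BlasVertexFormat::Float16x2 | BlasVertexFormat::Snorm16x2 => 4,
        }
    }
}

/// Float32x3 is always accepted (the only format Metal takes, per the quote above);
/// everything else requires the hypothetical extended-formats feature.
pub fn is_supported(format: BlasVertexFormat, extended_formats: bool) -> bool {
    match format {
        BlasVertexFormat::Float32x3 => true,
        _ => extended_formats,
    }
}

fn main() {
    for format in [BlasVertexFormat::Float32x3, BlasVertexFormat::Snorm16x4] {
        println!(
            "{:?}: {} bytes, base: {}, extended: {}",
            format,
            format.size_in_bytes(),
            is_supported(format, false),
            is_supported(format, true)
        );
    }
}
```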

@JMS55
Contributor

JMS55 commented Sep 14, 2024

@Vecvec just f32x3 is good enough. I encourage you to start opening small PRs and getting an MVP merged - ray tracing is something I'm super excited about (for Bevy), and I'm happy that you're working on it, but it doesn't do much good sitting in a fork unfortunately :(. It's too difficult to maintain a forked version of Bevy.

@Vecvec Vecvec mentioned this issue Sep 18, 2024
6 tasks
@Jaisiero

Jaisiero commented Nov 3, 2024

Hello guys. I saw that you support almost all of the inline ray query features. Have you made any progress on the ray tracing pipeline and shader binding table (SBT)?

Keep it up!

@Vecvec
Contributor

Vecvec commented Nov 3, 2024

I've previously looked into adding naga support for a basic version of the ray-tracing pipeline (no callable shaders). I have a branch for it that works but I think it is lacking validation in certain parts (at least I haven't checked if it is lacking validation). It's probably quite out of date now, but some other pieces of work are taking priority right now, so in effect it's waiting on #6291.

@Jaisiero

Jaisiero commented Nov 3, 2024

I've previously looked into adding naga support for a basic version of the ray-tracing pipeline (no callable shaders). I have a branch for it that works but I think it is lacking validation in certain parts (at least I haven't checked if it is lacking validation). It's probably quite out of date now, but some other pieces of work are taking priority right now, so in effect it's waiting on #6291.

Are GLSL shaders still supported in WGPU? I mean, is it possible to extend those RT features without naga supporting them? I ask these questions because I don't know what Naga's tasks are.

Cheers.

@Vecvec
Contributor

Vecvec commented Nov 4, 2024

I don't think it would work because naga would have to convert the shader to the correct backend because no backend that supports GLSL also supports ray-tracing pipelines. WGPU also doesn't support the ray-tracing pipeline which makes the problem even harder. In "theory" you could convert the GLSL shader to spirv using spirv-cross (or something similar) and then code your own raytracing pipeline code using vulkan + wgpu-hal, but at that point it would probably be better to emulate it using software rt (still on the gpu).
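
For reference, the core of a software ray tracer of the kind mentioned above is a ray/triangle test plus a closest-hit loop. The sketch below uses the well-known Möller-Trumbore intersection test and is written as CPU-side Rust purely for illustration; on the GPU the same logic would live in a compute shader, and a real implementation would traverse a BVH rather than loop over every triangle. The function names are made up for this example.

```rust
type Vec3 = [f32; 3];

fn sub(a: Vec3, b: Vec3) -> Vec3 {
    [a[0] - b[0], a[1] - b[1], a[2] - b[2]]
}

fn dot(a: Vec3, b: Vec3) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

fn cross(a: Vec3, b: Vec3) -> Vec3 {
    [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]
}

/// Möller-Trumbore test: returns the distance `t` along the ray to the hit, if any.
pub fn intersect_triangle(origin: Vec3, dir: Vec3, tri: [Vec3; 3]) -> Option<f32> {
    const EPS: f32 = 1e-7;
    let e1 = sub(tri[1], tri[0]);
    let e2 = sub(tri[2], tri[0]);
    let p = cross(dir, e2);
    let det = dot(e1, p);
    if det.abs() < EPS {
        return None; // Ray is parallel to the triangle plane.
    }
    let inv_det = 1.0 / det;
    let s = sub(origin, tri[0]);
    let u = dot(s, p) * inv_det; // First barycentric coordinate.
    if !(0.0..=1.0).contains(&u) {
        return None;
    }
    let q = cross(s, e1);
    let v = dot(dir, q) * inv_det; // Second barycentric coordinate.
    if v < 0.0 || u + v > 1.0 {
        return None;
    }
    let t = dot(e2, q) * inv_det;
    if t > EPS { Some(t) } else { None } // Ignore hits behind the origin.
}

/// Brute-force closest hit: index of the nearest triangle and its distance.
pub fn closest_hit(origin: Vec3, dir: Vec3, tris: &[[Vec3; 3]]) -> Option<(usize, f32)> {
    let mut best: Option<(usize, f32)> = None;
    for (i, tri) in tris.iter().enumerate() {
        if let Some(t) = intersect_triangle(origin, dir, *tri) {
            if best.map_or(true, |(_, best_t)| t < best_t) {
                best = Some((i, t));
            }
        }
    }
    best
}

fn main() {
    let far = [[0.0, 0.0, 3.0], [1.0, 0.0, 3.0], [0.0, 1.0, 3.0]];
    let near = [[0.0, 0.0, 1.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]];
    let hit = closest_hit([0.25, 0.25, 0.0], [0.0, 0.0, 1.0], &[far, near]);
    println!("closest hit: {:?}", hit);
}
```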

@Jaisiero

Jaisiero commented Nov 4, 2024

Thank you for your response, Vecvec. I think I see the whole picture now. RT pipeline + shader stages + shading language functions, inline variables, ... sounds like a LOT.

Nevertheless, I am going to drop a link here about how important hardware-accelerated ray tracing support is becoming:
Steam 55% user GPUs support HW RT
It's a massive feature and I agree it needs to go step by step with good planning.

I am trying to use the compute pipeline + inline ray tracing and directly write onto the swapchain texture and present it. Is it possible to do this currently in WGPU? I haven't seen an example showing this.

Best regards.

@JMS55
Contributor

JMS55 commented Nov 4, 2024

I am trying to use the compute pipeline + inline ray tracing and directly write onto the swapchain texture and present it. Is it possible to do this currently in WGPU? I haven't seen an example showing this.

Not in current wgpu. There's an open PR (#6291) you can try. I'm not sure what the status of it is, but it's presumably blocked on either code changes or waiting for reviews atm.

@Jaisiero

Jaisiero commented Nov 5, 2024

Hi @Vecvec,
I warn you, this is going to be a lot of text.

I forked your repo and have been working for a couple of days on getting procedural geometries to work by copying a previous example (ray_cube_compute), but I get many validation errors (although those occur in every example I've tried so far, for instance hello triangle). They prevent me from debugging with RenderDoc or Nvidia Nsight Graphics (both crash), so I am not sure whether I am building the acceleration structures properly. No validation errors about the acceleration structure build commands are thrown, though, and the Vulkan API dump looks about right to me:

Thread 0, Frame 0:
vkCmdBuildAccelerationStructuresKHR(commandBuffer, infoCount, pInfos, ppBuildRangeInfos) returns void:
commandBuffer: VkCommandBuffer = 0000016F43316D10 []
infoCount: uint32_t = 1
pInfos: const VkAccelerationStructureBuildGeometryInfoKHR* = 00000029EA3D6918
pInfos[0]: const VkAccelerationStructureBuildGeometryInfoKHR = 00000029EA3D6918:
sType: VkStructureType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_BUILD_GEOMETRY_INFO_KHR (1000150000)
pNext: const void* = NULL
type: VkAccelerationStructureTypeKHR = VK_ACCELERATION_STRUCTURE_TYPE_BOTTOM_LEVEL_KHR (1)
flags: VkBuildAccelerationStructureFlagsKHR = 4 (VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_TRACE_BIT_KHR)
mode: VkBuildAccelerationStructureModeKHR = VK_BUILD_ACCELERATION_STRUCTURE_MODE_BUILD_KHR (0)
srcAccelerationStructure: VkAccelerationStructureKHR = 0000000000000000
dstAccelerationStructure: VkAccelerationStructureKHR = 95A125000000001A
geometryCount: uint32_t = 1
pGeometries: const VkAccelerationStructureGeometryKHR* = 00000029EA3D65D0
pGeometries[0]: const VkAccelerationStructureGeometryKHR = 00000029EA3D65D0:
sType: VkStructureType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_GEOMETRY_KHR (1000150006)
pNext: const void* = NULL
geometryType: VkGeometryTypeKHR = VK_GEOMETRY_TYPE_AABBS_KHR (1)
geometry: VkAccelerationStructureGeometryDataKHR = 00000029EA3D65E8 (Union):
triangles: VkAccelerationStructureGeometryTrianglesDataKHR = 00000029EA3D65E8:
sType: VkStructureType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_GEOMETRY_AABBS_DATA_KHR (1000150003)
pNext: const void* = NULL
vertexFormat: VkFormat = UNKNOWN (266010880)
vertexData: VkDeviceOrHostAddressConstKHR = 00000029EA3D6600 (Union):
deviceAddress: VkDeviceAddress = 24
hostAddress: const void* = 0000000000000018
vertexStride: VkDeviceSize = 1
maxVertex: uint32_t = 1126709856
indexType: VkIndexType = UNKNOWN (367)
indexData: VkDeviceOrHostAddressConstKHR = 00000029EA3D6618 (Union):
deviceAddress: VkDeviceAddress = 180023557344
hostAddress: const void* = 00000029EA3D7CE0
transformData: VkDeviceOrHostAddressConstKHR = 00000029EA3D6620 (Union):
deviceAddress: VkDeviceAddress = 140701057833550
hostAddress: const void* = 00007FF784925A4E
aabbs: VkAccelerationStructureGeometryAabbsDataKHR = 00000029EA3D65E8:
sType: VkStructureType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_GEOMETRY_AABBS_DATA_KHR (1000150003)
pNext: const void* = NULL
data: VkDeviceOrHostAddressConstKHR = 00000029EA3D65F8 (Union):
deviceAddress: VkDeviceAddress = 266010880
hostAddress: const void* = 000000000FDB0100
stride: VkDeviceSize = 24
instances: VkAccelerationStructureGeometryInstancesDataKHR = 00000029EA3D65E8:
sType: VkStructureType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_GEOMETRY_AABBS_DATA_KHR (1000150003)
pNext: const void* = NULL
arrayOfPointers: VkBool32 = 266010880
data: VkDeviceOrHostAddressConstKHR = 00000029EA3D6600 (Union):
deviceAddress: VkDeviceAddress = 24
hostAddress: const void* = 0000000000000018
flags: VkGeometryFlagsKHR = 1 (VK_GEOMETRY_OPAQUE_BIT_KHR)
scratchData: VkDeviceOrHostAddressKHR = 00000029EA3D6960 (Union):
deviceAddress: VkDeviceAddress = 385875968
hostAddress: void* = 0000000017000000

Thread 0, Frame 0:
vkCmdBuildAccelerationStructuresKHR(commandBuffer, infoCount, pInfos, ppBuildRangeInfos) returns void:
commandBuffer: VkCommandBuffer = 0000016F43316D10 []
infoCount: uint32_t = 1
pInfos: const VkAccelerationStructureBuildGeometryInfoKHR* = 00000029EA3D69F8
pInfos[0]: const VkAccelerationStructureBuildGeometryInfoKHR = 00000029EA3D69F8:
sType: VkStructureType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_BUILD_GEOMETRY_INFO_KHR (1000150000)
pNext: const void* = NULL
type: VkAccelerationStructureTypeKHR = VK_ACCELERATION_STRUCTURE_TYPE_TOP_LEVEL_KHR (0)
flags: VkBuildAccelerationStructureFlagsKHR = 4 (VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_TRACE_BIT_KHR)
mode: VkBuildAccelerationStructureModeKHR = VK_BUILD_ACCELERATION_STRUCTURE_MODE_BUILD_KHR (0)
srcAccelerationStructure: VkAccelerationStructureKHR = 0000000000000000
dstAccelerationStructure: VkAccelerationStructureKHR = 2CFBA2000000001C
geometryCount: uint32_t = 1
pGeometries: const VkAccelerationStructureGeometryKHR* = 00000029EA3D66B0
pGeometries[0]: const VkAccelerationStructureGeometryKHR = 00000029EA3D66B0:
sType: VkStructureType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_GEOMETRY_KHR (1000150006)
pNext: const void* = NULL
geometryType: VkGeometryTypeKHR = VK_GEOMETRY_TYPE_INSTANCES_KHR (2)
geometry: VkAccelerationStructureGeometryDataKHR = 00000029EA3D66C8 (Union):
triangles: VkAccelerationStructureGeometryTrianglesDataKHR = 00000029EA3D66C8:
sType: VkStructureType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_GEOMETRY_INSTANCES_DATA_KHR (1000150004)
pNext: const void* = NULL
vertexFormat: VkFormat = VK_FORMAT_UNDEFINED (0)
vertexData: VkDeviceOrHostAddressConstKHR = 00000029EA3D66E0 (Union):
deviceAddress: VkDeviceAddress = 266014720
hostAddress: const void* = 000000000FDB1000
vertexStride: VkDeviceSize = 1577420693352
maxVertex: uint32_t = 1
indexType: VkIndexType = VK_INDEX_TYPE_UINT16 (0)
indexData: VkDeviceOrHostAddressConstKHR = 00000029EA3D66F8 (Union):
deviceAddress: VkDeviceAddress = 1577412308992
hostAddress: const void* = 0000016F4519B000
transformData: VkDeviceOrHostAddressConstKHR = 00000029EA3D6700 (Union):
deviceAddress: VkDeviceAddress = 8384360
hostAddress: const void* = 00000000007FEF68
aabbs: VkAccelerationStructureGeometryAabbsDataKHR = 00000029EA3D66C8:
sType: VkStructureType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_GEOMETRY_INSTANCES_DATA_KHR (1000150004)
pNext: const void* = NULL
data: VkDeviceOrHostAddressConstKHR = 00000029EA3D66D8 (Union):
deviceAddress: VkDeviceAddress = 18446744069414584320
hostAddress: const void* = FFFFFFFF00000000
stride: VkDeviceSize = 266014720
instances: VkAccelerationStructureGeometryInstancesDataKHR = 00000029EA3D66C8:
sType: VkStructureType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_GEOMETRY_INSTANCES_DATA_KHR (1000150004)
pNext: const void* = NULL
arrayOfPointers: VkBool32 = 0
data: VkDeviceOrHostAddressConstKHR = 00000029EA3D66E0 (Union):
deviceAddress: VkDeviceAddress = 266014720
hostAddress: const void* = 000000000FDB1000
flags: VkGeometryFlagsKHR = 0
scratchData: VkDeviceOrHostAddressKHR = 00000029EA3D6A40 (Union):
deviceAddress: VkDeviceAddress = 385875968
hostAddress: void* = 0000000017000000
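
A note for reading the dumps above: geometry and the address fields are C unions, and the API dump prints every member of a union over the same bytes. That would explain why the triangles and aabbs views of an instances geometry show implausible values such as the huge vertexStride. Below is a minimal sketch in plain Rust, assuming a 64-bit target; the type DeviceOrHostAddressConst and the helper host_view are hypothetical stand-ins for VkDeviceOrHostAddressConstKHR, not part of any real binding.

```rust
use std::ffi::c_void;

/// Hypothetical mirror of VkDeviceOrHostAddressConstKHR:
/// both members occupy the same 8 bytes.
#[repr(C)]
#[derive(Clone, Copy)]
union DeviceOrHostAddressConst {
    device_address: u64,
    host_address: *const c_void,
}

/// Writes the device-address member, then reads the same bytes back through
/// the host-pointer member, formatted the way the dump prints hostAddress.
fn host_view(device_address: u64) -> String {
    let addr = DeviceOrHostAddressConst { device_address };
    // SAFETY: assumes a 64-bit target, so all 8 bytes of the pointer member
    // were initialized by the u64 write. The pointer is only inspected,
    // never dereferenced.
    let ptr = unsafe { addr.host_address };
    format!("{:016X}", ptr as usize)
}
```

Calling host_view(266_014_720) gives the same hex string the dump prints next to deviceAddress = 266014720, so only the member selected by geometryType carries meaningful data.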

I copied some code from another example, so I have something like this:

fn query_loop(pos: vec3<f32>, dir: vec3<f32>, acs: acceleration_structure) -> RayIntersection {
    var rq: ray_query;
    rayQueryInitialize(&rq, acs, RayDesc(RAY_FLAG_TERMINATE_ON_FIRST_HIT, 0xFFu, 0.1, 100.0, pos, dir));

    while (rayQueryProceed(&rq)) {}

    return rayQueryGetCommittedIntersection(&rq);
}

//main func here
{
    // [...]
    if (intersection.kind != RAY_QUERY_INTERSECTION_NONE) {
        color = vec4<f32>(1.0, 1.0, 1.0, 1.0);
    }

    textureStore(output, global_id.xy, color);
}

but nothing was intersected.

Nonetheless, I think inline ray queries in WGSL are not ready for this yet. This is the GLSL code that I'd like to mimic in a WGSL compute shader:

rayQueryEXT rayQuery;
rayQueryInitializeEXT(
    rayQuery, tlas,
    rayFlags,
    cullMask, // cullMask
    origin.xyz, tMin, direction.xyz, tMax);

while (rayQueryProceedEXT(rayQuery))
{
    uint type = rayQueryGetIntersectionTypeEXT(rayQuery, false);
    if (type == gl_RayQueryCandidateIntersectionAABBEXT)
    {
        rayQueryGenerateIntersectionEXT(rayQuery, tMax);
    }
}

uint type = rayQueryGetIntersectionTypeEXT(rayQuery, true);

if (type == gl_RayQueryCommittedIntersectionGeneratedEXT)
{
    color = vec3(1.0, 1.0, 1.0);
}

I hope I didn't overwhelm you too much and that you can point out how to move forward.

@Vecvec
Contributor

Vecvec commented Nov 5, 2024

First of all, procedural geometry is not intended for the basic version of the acceleration structures. While it is very important, there are three reasons for it not being in the first version:

  • Triangles are what rasterizers support, so most applications should be able to just use triangles.
  • The PR is over 10,000 lines of code with just triangles.
  • Naga does not support procedural geometry, so it would be hard to test and would require SPIR-V passthrough.

I'm not sure why Nsight doesn't work; I've used it a lot to debug ray tracing. RenderDoc (as far as I know) does not yet support debugging ray tracing, so anything run under it will cause an error because it does not expose the ray-tracing feature.

Your Vulkan trace shows VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_GEOMETRY_AABBS_DATA_KHR. How did you manage to do that? I can't find anything in the currently implemented code that would lead to that.

@Jaisiero

Jaisiero commented Nov 5, 2024

Thank you for answering! I was worried I would hear that.
Yeah, you won't find that code because, as I said, I forked your current work and added it. Many of us are voxel enthusiasts, and even though you can do voxels with triangles, I feel procedural geometry fits ray tracing more naturally.

I could open a PR, but I thought it would be overwhelming for a first attempt, as you point out. The branch is on my GitHub.

Here is the error that I got at the beginning of every example; I think it is related to this thread:
#5379
Weirdly, it doesn't crash when I run the examples using cargo, but it does when using Nsight.

Anyway, I am not sure I would be able to extend Naga to support procedural geometry; I have no experience with such a task.
Do you think I could bypass Naga and compile GLSL shaders for my example case? If so, I am not sure how to do that.

@Vecvec
Contributor

Vecvec commented Nov 5, 2024

You would likely want spirv passthrough feature + a compiler to spirv or this.

also you seem to have used my get-vertex branch, the branch that is up to date with trunk is ray-tracing-new, which may have fixed some bugs. The get vertex branch is for additional triangle features which I suspect you do not need

@Jaisiero

Jaisiero commented Nov 5, 2024

You would likely want spirv passthrough feature + a compiler to spirv or this.

also you seem to have used my get-vertex branch, the branch that is up to date with trunk is ray-tracing-new, which may have fixed some bugs. The get vertex branch is for additional triangle features which I suspect you do not need

Alright. I did the following: I compiled my shader using glslangvalidator and then I loaded it like this:

let shader = device.create_shader_module(wgpu::include_spirv!("shader.comp.spv"));

As a result, I got these errors:

[2024-11-05T07:59:23Z ERROR wgpu_core::device::global] Device::create_shader_module error:
Shader 'shader.comp.spv' parsing error: UnsupportedCapability(RayQueryKHR)

[2024-11-05T07:59:23Z ERROR wgpu::backend::wgpu_core] Handling wgpu errors as fatal by default
thread 'main' panicked at wgpu\src\backend\wgpu_core.rs:3652:5:
wgpu error: Validation Error

Caused by:
In Device::create_shader_module, label = 'shader.comp.spv'

Shader 'shader.comp.spv' parsing error: UnsupportedCapability(RayQueryKHR)

  unsupported capability RayQueryKHR

I am afraid I didn't understand what I am supposed to do in order to pass through unsupported features.

@Vecvec
Contributor

Vecvec commented Nov 5, 2024

No, that is an error from naga saying it doesn't support reading that feature in your shader. All normal shaders go through naga, which converts them to IR (intermediate representation). Shader passthrough is separate from create_shader_module: it uses wgpu::Features::SPIRV_SHADER_PASSTHROUGH and a call such as device.create_shader_module_spirv(&wgpu::include_spirv_raw!("shader.comp.spv")). This is unsafe because there is no way to validate the shader without naga, but any SPIR-V generated by glslangValidator is valid.

It's worth noting that this limits you to Vulkan, but ray tracing is currently only supported there anyway.
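
For context, the UnsupportedCapability(RayQueryKHR) error refers to an OpCapability instruction at the start of the SPIR-V module. The following is a minimal sketch in plain Rust (not naga's actual code) that lists the capabilities a module declares, which can help decide whether passthrough is needed. The opcode and capability numbers come from the SPIR-V specification; the function names declared_capabilities and uses_ray_query are hypothetical.

```rust
/// First word of every SPIR-V module.
const SPIRV_MAGIC: u32 = 0x0723_0203;
/// Opcode of the OpCapability instruction.
const OP_CAPABILITY: u32 = 17;
/// Capability id of RayQueryKHR in the SPIR-V registry.
const CAP_RAY_QUERY_KHR: u32 = 4472;

/// Returns the capability ids declared by a SPIR-V module,
/// or None if the header or an instruction is malformed.
fn declared_capabilities(words: &[u32]) -> Option<Vec<u32>> {
    if words.len() < 5 || words[0] != SPIRV_MAGIC {
        return None;
    }
    let mut caps = Vec::new();
    let mut i = 5; // skip the 5-word header
    while i < words.len() {
        // Each instruction starts with (word count << 16) | opcode.
        let word_count = (words[i] >> 16) as usize;
        let opcode = words[i] & 0xFFFF;
        if word_count == 0 || i + word_count > words.len() {
            return None;
        }
        // In a valid module all OpCapability instructions come first,
        // so the scan can stop at the first other opcode.
        if opcode != OP_CAPABILITY {
            break;
        }
        if word_count != 2 {
            return None;
        }
        caps.push(words[i + 1]);
        i += word_count;
    }
    Some(caps)
}

/// True if the module declares the RayQueryKHR capability.
fn uses_ray_query(words: &[u32]) -> bool {
    declared_capabilities(words).map_or(false, |caps| caps.contains(&CAP_RAY_QUERY_KHR))
}
```

If uses_ray_query returns true for the words of shader.comp.spv, loading the module through create_shader_module would be expected to hit the naga error shown above, while the passthrough path skips that parsing step.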

@Jaisiero

Jaisiero commented Nov 5, 2024

I could move forward, but now I am stuck matching resources between the descriptor layout and the shader bindings. I got some errors:

[2024-11-05T19:50:33Z ERROR wgpu_hal::vulkan::instance] VALIDATION [VUID-VkComputePipelineCreateInfo-layout-07988 (0xd7bf5790)]
Validation Error: [ VUID-VkComputePipelineCreateInfo-layout-07988 ] Object 0: handle = 0xcfcda0000000001e, name = shader.comp.spv, type = VK_OBJECT_TYPE_SHADER_MODULE; Object 1: handle = 0x2e2941000000001f, type = VK_OBJECT_TYPE_PIPELINE_LAYOUT; | MessageID = 0xd7bf5790 | vkCreateComputePipelines(): pCreateInfos[0].stage SPIR-V (VK_SHADER_STAGE_COMPUTE_BIT) uses descriptor [Set 0, Binding 2, variable "acc_struct"] (type VK_DESCRIPTOR_TYPE_ACCELERATION_STRUCTURE_KHR) but was not declared in the pipeline layout.
The Vulkan spec states: If a resource variables is declared in a shader, a descriptor slot in layout must match the shader stage (https://vulkan.lunarg.com/doc/view/1.3.296.0/windows/1.3-extensions/vkspec.html#VUID-VkComputePipelineCreateInfo-layout-07988)
[2024-11-05T19:50:33Z ERROR wgpu_hal::vulkan::instance] objects: (type: SHADER_MODULE, hndl: 0xcfcda0000000001e, name: shader.comp.spv), (type: PIPELINE_LAYOUT, hndl: 0x2e2941000000001f, name: ?)
[2024-11-05T19:50:33Z ERROR wgpu_hal::vulkan::instance] VALIDATION [VUID-VkComputePipelineCreateInfo-layout-07988 (0xd7bf5790)]
Validation Error: [ VUID-VkComputePipelineCreateInfo-layout-07988 ] Object 0: handle = 0xcfcda0000000001e, name = shader.comp.spv, type = VK_OBJECT_TYPE_SHADER_MODULE; Object 1: handle = 0x2e2941000000001f, type = VK_OBJECT_TYPE_PIPELINE_LAYOUT; | MessageID = 0xd7bf5790 | vkCreateComputePipelines(): pCreateInfos[0].stage SPIR-V (VK_SHADER_STAGE_COMPUTE_BIT) uses descriptor [Set 0, Binding 0, variable "output_image"] (type VK_DESCRIPTOR_TYPE_STORAGE_IMAGE) but was not declared in the pipeline layout.
The Vulkan spec states: If a resource variables is declared in a shader, a descriptor slot in layout must match the shader stage (https://vulkan.lunarg.com/doc/view/1.3.296.0/windows/1.3-extensions/vkspec.html#VUID-VkComputePipelineCreateInfo-layout-07988)
[2024-11-05T19:50:33Z ERROR wgpu_hal::vulkan::instance] objects: (type: SHADER_MODULE, hndl: 0xcfcda0000000001e, name: shader.comp.spv), (type: PIPELINE_LAYOUT, hndl: 0x2e2941000000001f, name: ?)
[2024-11-05T19:50:33Z ERROR wgpu_hal::vulkan::instance] VALIDATION [VUID-VkComputePipelineCreateInfo-layout-07988 (0xd7bf5790)]
Validation Error: [ VUID-VkComputePipelineCreateInfo-layout-07988 ] Object 0: handle = 0xcfcda0000000001e, name = shader.comp.spv, type = VK_OBJECT_TYPE_SHADER_MODULE; Object 1: handle = 0x2e2941000000001f, type = VK_OBJECT_TYPE_PIPELINE_LAYOUT; | MessageID = 0xd7bf5790 | vkCreateComputePipelines(): pCreateInfos[0].stage SPIR-V (VK_SHADER_STAGE_COMPUTE_BIT) uses descriptor [Set 0, Binding 1, variable "uniforms"] (type VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER or VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC or VK_DESCRIPTOR_TYPE_INLINE_UNIFORM_BLOCK) but was not declared in the pipeline layout.
The Vulkan spec states: If a resource variables is declared in a shader, a descriptor slot in layout must match the shader stage (https://vulkan.lunarg.com/doc/view/1.3.296.0/windows/1.3-extensions/vkspec.html#VUID-VkComputePipelineCreateInfo-layout-07988)
[2024-11-05T19:50:33Z ERROR wgpu_hal::vulkan::instance] objects: (type: SHADER_MODULE, hndl: 0xcfcda0000000001e, name: shader.comp.spv), (type: PIPELINE_LAYOUT, hndl: 0x2e2941000000001f, name: ?)

This is the Rust side (Vulkan descriptors):

        // SAFETY: SPIR-V passthrough skips naga validation, so the module is trusted as-is.
        let shader = unsafe {
            device.create_shader_module_spirv(&wgpu::include_spirv_raw!("shader.comp.spv"))
        };
        
        let compute_pipeline = device.create_compute_pipeline(&wgpu::ComputePipelineDescriptor {
            label: Some("rt"),
            layout: None,     // <--- should I create a bind group layout previously and pass it as a parameter here?
            module: &shader,
            entry_point: Some("main"),
            compilation_options: Default::default(),
            cache: None,
        });

        let compute_bind_group_layout = compute_pipeline.get_bind_group_layout(0); // <-- it throws an error here

        let compute_bind_group = device.create_bind_group(&wgpu::BindGroupDescriptor {
            label: None,
            layout: &compute_bind_group_layout,
            entries: &[
                wgpu::BindGroupEntry {
                    binding: 0,
                    resource: wgpu::BindingResource::TextureView(&rt_view),
                },
                wgpu::BindGroupEntry {
                    binding: 1,
                    resource: uniform_buf.as_entire_binding(),
                },
                wgpu::BindGroupEntry {
                    binding: 2,
                    resource: wgpu::BindingResource::AccelerationStructure(&tlas),
                },
            ],
        });

This is the WGSL code, which works (for the triangle version):

@group(0) @binding(0)
var output: texture_storage_2d<rgba8unorm, write>;

@group(0) @binding(1)
var<uniform> uniforms: Uniforms;

@group(0) @binding(2)
var acc_struct: acceleration_structure;

This is my GLSL attempt, which should work:

// Output image at binding 0
layout(set = 0, binding = 0, rgba8) uniform image2D output_image;

// Uniform buffer for inverse view/projection matrices at binding 1
layout(set = 0, binding = 1) uniform Uniforms {
    mat4 view_inv;
    mat4 proj_inv;
} uniforms;

// Acceleration structure at binding 2
layout(set = 0, binding = 2) uniform accelerationStructureEXT acc_struct;

I think wgpu does many things for me when using WGSL, like creating and managing descriptors. I am sorry for asking so many questions here, but here goes: do I need to create those descriptor structs by hand?

Thank you for helping.

@Vecvec
Contributor

Vecvec commented Nov 5, 2024

I think naga auto-generates reflection info, allowing you to have an implicit pipeline layout. Since this bypasses naga, there may be no reflection information. This means you should provide an explicit pipeline layout (the layout field of ComputePipelineDescriptor).
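
To make the mismatch concrete, here is a toy model in plain Rust (no wgpu involved) of the rule behind VUID-VkComputePipelineCreateInfo-layout-07988: every slot the shader uses must be declared in the pipeline layout. The types Kind and Slot and the functions shader_slots and undeclared are hypothetical and deliberately simplified; the real check also considers shader stages and descriptor counts.

```rust
use std::collections::BTreeSet;

/// Descriptor kinds that appear in the validation errors above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Kind {
    StorageImage,
    UniformBuffer,
    AccelerationStructure,
}

/// One resource slot as (set, binding, kind). Hypothetical type for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Slot {
    set: u32,
    binding: u32,
    kind: Kind,
}

/// The three slots declared by the GLSL shader in this thread.
fn shader_slots() -> Vec<Slot> {
    vec![
        Slot { set: 0, binding: 0, kind: Kind::StorageImage },
        Slot { set: 0, binding: 1, kind: Kind::UniformBuffer },
        Slot { set: 0, binding: 2, kind: Kind::AccelerationStructure },
    ]
}

/// Slots used by the shader that the pipeline layout does not declare,
/// in the order the shader lists them.
fn undeclared(shader: &[Slot], layout: &[Slot]) -> Vec<Slot> {
    let declared: BTreeSet<Slot> = layout.iter().copied().collect();
    shader
        .iter()
        .copied()
        .filter(|slot| !declared.contains(slot))
        .collect()
}
```

With an empty layout, undeclared returns all three slots, which lines up with the three validation errors above if the implicit layout ended up empty. In wgpu terms, the fix suggested here would presumably be to build the layout with device.create_bind_group_layout and device.create_pipeline_layout and pass it as layout: Some(&pipeline_layout).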

@Jaisiero

Jaisiero commented Nov 6, 2024

image
Got it!

@Jaisiero

Jaisiero commented Nov 8, 2024

Hi @Vecvec. I'd like to open a PR with my little contribution. Should I target the RT branch or yours?

@JMS55
Contributor

JMS55 commented Nov 8, 2024

I can't speak for @Vecvec, but imo we shouldn't add more stuff to the existing PR. It's already a large PR.
