
Browser WebGPU? #46

Open
mighdoll opened this issue Jan 18, 2024 · 33 comments
Labels
bug (Something isn't working), feature (New feature or request)

Comments

@mighdoll

Is support for WebGPU development on your radar?

@miguel-petersen
Collaborator

Hi Mighdoll.

At this moment WebGPU is not being considered for support.

miguel-petersen added the question label (Further information is requested) on Jan 19, 2024
@miguel-petersen
Collaborator

To clarify, we are currently not seeking official support.

However, if WebGPU translates down to either DX12 or Vulkan (apologies, I am not too familiar), then GPU Reshape is able to hook into it.

@mighdoll
Author

Yep, I think there'd be hope that GPU-Reshape could hook in when running a browser on the right platform. In WebGPU land, we're eager for GPU tool support!

Here's a bit about Chrome: https://chromium.googlesource.com/chromium/src/+/main/docs/security/research/graphics/webgpu_technical_report.md
and Firefox: https://github.com/gfx-rs/wgpu

@miguel-petersen
Collaborator

I took a quick look, and I am able to "attach" if I launch from Reshape, though only if no other Chrome instances are already running.


Specifically, I launch from Reshape with these arguments (provided by an AMD fellow!):
--disable-gpu-sandbox --disable-direct-composition -enable-dawn-features=emit_hlsl_debug_symbols,disable_symbol_renaming

However, something is broken in the rendering. I don't think it's actually presenting, especially since some data is sent back to the app during presents, and I receive nothing.


@miguel-petersen
Collaborator

miguel-petersen commented Jan 21, 2024

One thing about Chrome is that sub-processes are spawned with process mitigation policies.
https://github.com/chromium/chromium/blob/b119cd4f3bf59a6b58553420741713a88b5325eb/sandbox/win/src/process_mitigations.cc#L460

I check against them here:
https://github.com/GPUOpen-Tools/GPU-Reshape/blob/main/Source/Backends/DX12/Bootstrapper/Source/DLL.cpp#L795

Chrome's sandboxing enables these policies, and if a process enables either of them, I cannot inject my bootstrapper. I wonder how PIX (which I've heard can do it?) handles this. I could tamper with the process creation parameters, but I'm not sure that's the right way forward, and it could perhaps be seen as malicious.

Chrome does have a --no-sandbox parameter which avoids it, but then Reshape fails to discover any device on newly spawned tabs (new processes). Strange lands.

@miguel-petersen
Collaborator

As a repro case, this is what I'm doing.

Reshape launch parameters:
App: C:\Program Files (x86)\Google\Chrome\Application\chrome.exe
Cwd: C:\Program Files (x86)\Google\Chrome\Application
Args: --disable-gpu-sandbox --disable-direct-composition -enable-dawn-features=emit_hlsl_debug_symbols,disable_symbol_renaming --no-sandbox https://webgpu.github.io/webgpu-samples/samples/helloTriangle

It does launch, and connect to a device, but something breaks somewhere. This'll be an interesting one to debug!


@miguel-petersen
Collaborator

From a brief investigation it appears that a command list is failing to close, likely indicating a validation error somewhere.

It'd be nice if it's a quick fix.

@mighdoll
Author

I posted a ref to this bug over on: https://matrix.to/#/#webgpu-dawn:matrix.org. I recommend dropping in over there if you have questions about Chrome/Dawn!

@miguel-petersen
Collaborator

I joined the room 🙂

I managed to get this validation error out of Dawn with Reshape; it's probably what's causing the command list to fail to close:
D3D12 ERROR: ID3D12GraphicsCommandList::CopyBufferRegion: Invalid Command List method (CopyBufferRegion) called within a Render Pass. [ EXECUTION ERROR #1203: RENDER_PASS_DISALLOWED_API_CALLED]

@kainino0x

kainino0x commented Jan 22, 2024

GPU work in Chrome is all done in a single GPU subprocess. That process definitely needs sandboxing disabled to debug it.
Here are instructions on how to get it working with PIX:
https://gist.github.com/Popov72/41f71cbf8d55f2cb8cae93f439eee347
(The flags are the same as the ones you mentioned.)

That's likely the best option but if any problems are caused by launching multiple processes, there are other options:
https://chromium.googlesource.com/chromium/src/+/main/docs/gpu/debugging_gpu_related_code.md#debugging-in-the-gpu-process

Another option for investigating this incrementally would be to try debugging the Dawn samples, though unfortunately you would have to build them from Dawn. If that error is coming from the D3D12 debug layer: both Chrome and Dawn should run cleanly against it, but Chrome has a lot more going on, so running just Dawn would narrow down where it's coming from. There are also ways to enable the D3D12 debug layer for Dawn. I think the flags are --enable-dawn-backend-validation when launching Chrome, and --enable-backend-validation when launching Dawn samples/tests, but tell me if those don't work.

miguel-petersen added a commit that referenced this issue Jan 22, 2024
…ct it

- Does not account for maintaining behaviour of before / after access states yet
- Added render pass stream states
- Added BeginRenderPass / EndRenderPass hooks
- Added additional logging to bootstrapper

#46
@miguel-petersen
Collaborator

miguel-petersen commented Jan 22, 2024

As of d46430d it seems to render properly; however, it does not send any data back yet.

Please note that the change is incomplete as I need to carefully manage the before / after access states of each render target and depth stencil, something I need a little more time to think about. Currently it just blindly reconstructs the render pass, which is incorrect.
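To illustrate why the before / after access states matter, here is a minimal sketch, assuming a toy command list that only records call names. `HookedCommandList`, `InjectCopy`, and the simplified access enums are hypothetical stand-ins, not GPU Reshape's code or the real D3D12 structs. When instrumentation needs a copy while a render pass is open, the pass is ended with a preserve ending access and resumed with a preserve beginning access, so the app's original Clear is not replayed and its requested ending access is applied only at the real EndRenderPass.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-ins for D3D12_RENDER_PASS_BEGINNING_ACCESS_TYPE_* and
// D3D12_RENDER_PASS_ENDING_ACCESS_TYPE_* (hypothetical, for illustration only).
enum class BeginAccess { Clear, Preserve, Discard };
enum class EndAccess { Preserve, Discard, Resolve };

struct PassDesc {
    BeginAccess begin;
    EndAccess end;
};

inline const char* Name(BeginAccess a) {
    switch (a) {
        case BeginAccess::Clear: return "Clear";
        case BeginAccess::Preserve: return "Preserve";
        case BeginAccess::Discard: return "Discard";
    }
    return "?";
}

inline const char* Name(EndAccess a) {
    switch (a) {
        case EndAccess::Preserve: return "Preserve";
        case EndAccess::Discard: return "Discard";
        case EndAccess::Resolve: return "Resolve";
    }
    return "?";
}

// Hypothetical hooked command list: records calls as strings instead of
// forwarding them to a real ID3D12GraphicsCommandList4.
class HookedCommandList {
public:
    std::vector<std::string> log;

    void BeginRenderPass(const PassDesc& desc) {
        pass_ = desc;
        inPass_ = true;
        log.push_back(std::string("Begin:") + Name(desc.begin));
    }

    void EndRenderPass() {
        assert(inPass_ && "EndRenderPass without BeginRenderPass");
        inPass_ = false;
        // Only the app's real EndRenderPass uses its requested ending access.
        log.push_back(std::string("End:") + Name(pass_.end));
    }

    // Copies are not allowed inside a render pass, so an open pass is
    // suspended and resumed around the injected copy.
    void InjectCopy() {
        const bool split = inPass_;
        if (split) {
            // Keep attachment contents rather than discarding/resolving early.
            log.push_back(std::string("End:") + Name(EndAccess::Preserve));
        }
        log.push_back("CopyBufferRegion");
        if (split) {
            // Resume without replaying the app's Clear, which would wipe prior draws.
            log.push_back(std::string("Begin:") + Name(BeginAccess::Preserve));
        }
    }

private:
    PassDesc pass_{BeginAccess::Discard, EndAccess::Discard};
    bool inPass_ = false;
};
```

Blindly reconstructing the pass with the app's original descriptor would instead record a second `Begin:Clear` after the copy, which is the incorrect behaviour described above.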


@miguel-petersen
Collaborator

@kainino0x Absolute pleasure to have a direct contributor here, thanks for the wealth of information!

Regarding D3D11On12, would you know how (Chrome) WebGPU utilizes it? Reshape does support hooking D3D11On12, however, it currently doesn't do much with it. Is it somehow involved with presentation?

Particularly on presentation, Reshape doesn't seem to hit any hooks, so I'm very much curious how that happens.

@miguel-petersen
Collaborator

The issue is definitely related to presentation. Currently Reshape sends data back during presentation, and since I'm not hitting that hook, nothing ever gets sent.

If I add a dummy thread to pump out data manually, I can get Reshape communicating. This is a change I've been meaning to do anyway, so I'll track it here.


Instrumentation seems to do its job as well, though I am having trouble getting debug sources working. 🤔


miguel-petersen self-assigned this on Jan 22, 2024
miguel-petersen added the bug (Something isn't working) and feature (New feature or request) labels and removed the question (Further information is requested) label on Jan 22, 2024
@kainino0x

I am pretty sure we are not using 11on12, but instead doing interop between native D3D11 (Chrome) and D3D12 (Dawn) but I don't know how that interop works.

If that's a problem, then I can check if the Dawn-D3D12 backend for Chrome compositing is working and how to switch it on. Then I think everything is supposed to go through D3D12.

@kainino0x

Detecting frame boundaries has historically always been a problem with using Chrome with graphics debuggers, because Chrome's presentation is so complex. There might also be some option that injects a fake "swap" to tell debuggers where the frame boundaries are.

@miguel-petersen
Collaborator

I see. While supporting D3D11 is not on the roadmap, I would consider hooking "just enough" to be able to detect frame boundaries. That would be useful for many reasons.

If there's a switch to turn on native compositing, that would be a great way forward for the short term. 🙂

@kainino0x

Chrome has a flag --use-angle=d3d11on12 which will use 11on12 for ANGLE. By default everything in Chrome should be going through either ANGLE or Dawn so theoretically that should make it use 12 exclusively.

If you have a chance let me know if that works, or maybe I can find a chance to try it myself and play with the flags.
(Note: please use Chrome Canary, as I don't know the state of things in the Chrome release branches)

> If that's a problem, then I can check if the Dawn-D3D12 backend for Chrome compositing is working and how to switch it on. Then I think everything is supposed to go through D3D12.

Turns out this is "very experimental" right now. I tried it and WebGL and WebGPU content didn't work at all. So not yet.

@miguel-petersen
Collaborator

I've been testing on the Chrome release build; I'll see if I can switch to Canary instead.

> Turns out this is "very experimental" right now. I tried it and WebGL and WebGPU content didn't work at all. So not yet.

Gotcha. I'll see if I can't hook the presentation method somehow.

@miguel-petersen
Collaborator

The canary branch seems to solve the symbol issue, which is great news.


@miguel-petersen
Collaborator

I'm pretty happy with local performance. I've got one local change I need to think about regarding how data is sent back to the app; I just need to make sure I'm not introducing problems later on.

WGPU.mp4

@miguel-petersen
Collaborator

Hi, quite a few changes have landed in the development branch, and a couple more after GDC. Things should work much more smoothly now.

This includes lots of crash fixes and fixes for cases where Reshape was not preserving the original behaviour. Most importantly, the pooling mechanism now runs on a controlled interval instead of during presentation. There are a number of benefits to this, but for Chrome usage it removes the need to track presentation at all.
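A rough sketch of interval-driven flushing, assuming a simple queue of string messages. `IntervalFlusher`, `Push`, and the sink callback are hypothetical names for illustration, not GPU Reshape's actual types. It shows why flushing no longer depends on hooking presentation: a background thread wakes on a fixed interval (or early at shutdown) and hands off whatever has accumulated.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch: instrumentation messages are queued, and a background
// thread flushes them every `interval`, independent of Present() calls.
class IntervalFlusher {
public:
    using Sink = std::function<void(std::vector<std::string>)>;

    IntervalFlusher(std::chrono::milliseconds interval, Sink sink)
        : interval_(interval), sink_(std::move(sink)) {
        assert(interval_.count() > 0);
        worker_ = std::thread([this] { Run(); });
    }

    ~IntervalFlusher() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();  // the worker performs one final flush before exiting
    }

    void Push(std::string message) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(message));
    }

private:
    void Run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            // Sleep until the interval elapses, or wake early on shutdown.
            cv_.wait_for(lock, interval_, [this] { return stop_; });
            std::vector<std::string> batch;
            batch.swap(pending_);
            const bool stopping = stop_;
            lock.unlock();
            if (!batch.empty()) {
                sink_(std::move(batch));  // deliver outside the lock
            }
            if (stopping) {
                return;
            }
            lock.lock();
        }
    }

    std::chrono::milliseconds interval_;
    Sink sink_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::vector<std::string> pending_;
    bool stop_ = false;
    std::thread worker_;
};
```

Messages arrive in order because a single worker drains the queue, and anything pushed just before shutdown is still delivered by the final flush in the destructor.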

Another nice thing is that Reshape now supports hooking sub-processes and multiple devices, which I find super useful for Chrome development. You can launch new tabs, reload examples, etc. and it "should just work". Just check the two checkboxes below.

[screenshot]

Whenever the app / Chrome creates a device, it'll appear in the list. To open its associated workspace, double-click any of them.


Currently they aren't deleted automatically, so the list can grow considerably as Chrome creates devices. Something to think about.

It'd be great if someone has a second to try it out, and see if they spot any issues with the current setup.

@kainino0x

Very nice! I will ask the WebGPU matrix chat room if anyone wants to try this out.

@kainino0x

kainino0x commented Mar 25, 2024

Here are the Chromium flags again, for reference:
--disable-gpu-sandbox --disable-direct-composition -enable-dawn-features=emit_hlsl_debug_symbols,disable_symbol_renaming

@mighdoll
Author

Awesome! I'll rebuild my Windows machine to try GPU-Reshape!

Hmm... do I need to buy an AMD card right away, or is my old NVIDIA 1080 OK for now with GPU-Reshape?

@miguel-petersen
Collaborator

miguel-petersen commented Mar 26, 2024

Hey @mighdoll! Reshape supports NVIDIA just fine, earliest model I tested was a GTX 970. That said, if you find anything let's fix it. 🙂

Also, just to reiterate, the relevant branch is now https://github.com/GPUOpen-Tools/GPU-Reshape/tree/development

@mighdoll
Author

Okay! I built the development branch and ran Reshape successfully on Chrome Canary with an NVIDIA 1080 card.

It's neato to see the generated HLSL and DXIL for the shaders, and one of my old WGSL experiments generated three warnings (uninitialized resource read - twice, and texture read out of bounds) so I can see Reshape will quickly be useful.

A few things I noticed:

  • I can launch the browser with the suggested flags from within Reshape, then navigate to a page with a demo and it finds the shaders as it loads. Very nice!
  • Switching pages or reloading after Reshape has found some shaders doesn't seem to work, and usually reports lost connection. I thought it worked sometimes, but , so maybe it sometimes works? For example I tried switching between the examples on the webgpu-samples page.

Awesome work @miguel-petersen!

Let me know if you want me to collect any logging as I experiment.

@miguel-petersen
Collaborator

Hey! Glad to hear things ran on your end. 🙂

On switching pages / reloading, chances are that the underlying device is destroyed at that point. It's up to Chrome when devices are recreated; I guess it sometimes shares a device and sometimes doesn't? Currently the lifetime of all the internal data, and of an internal server, is tied to the underlying device. There's an interesting question here of whether it should persist beyond that, perhaps while Reshape (app side) is connected.

With "Attach All Devices" the new device should appear automatically in the workspace tree, already hooked.

If you come across any false positives, or general issues, feel free to drop them here! Happy to fix them.

@kainino0x

> uninitialized resource read - twice

I don't think this should happen - Dawn is supposed to make sure all resources are initialized before they can be read, for security reasons. If those look like true-positives could you please file a Dawn bug (https://crbug.com/dawn) about them?

@miguel-petersen
Collaborator

Could you share how Dawn initializes resources, @kainino0x? It's likely I'm just missing a path to hook.

@kainino0x

@austinEng would know better

@austinEng

austinEng commented Apr 3, 2024

There are a few paths:

  1. render pass with LoadOp::Clear and StoreOp::Store
  2. Using the builtin "clear" command
    e.g. ID3D12GraphicsCommandList::ClearRenderTargetView / ID3D12GraphicsCommandList::ClearDepthStencilView / vkCmdClearDepthStencilImage / vkCmdFillBuffer / MTLBlitCommandEncoder::fillBuffer
  3. buffer-to-buffer / buffer-to-texture copy from a buffer filled with 0s
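A minimal sketch of how a validation layer could account for those three paths, assuming whole-resource granularity and integer resource IDs. `InitTracker` and its hook names are hypothetical and are not Dawn or GPU Reshape APIs. Copies are modeled as whole-resource copies, so the destination inherits the source's state, which is how a copy from an already-zeroed buffer counts as initialization.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical whole-resource initialization tracker; each hook stands in
// for one of the initialization paths listed above.
class InitTracker {
public:
    void OnCreate(uint64_t resource) { initialized_[resource] = false; }

    // Path 1: a render pass that clears on load and stores the result.
    void OnRenderPassClearStore(uint64_t target) { initialized_[target] = true; }

    // Path 2: an explicit clear/fill command (ClearRenderTargetView, vkCmdFillBuffer, ...).
    void OnClearCommand(uint64_t resource) { initialized_[resource] = true; }

    // Path 3: a whole-resource copy; the destination inherits the source's
    // state, e.g. a copy from a zero-filled buffer initializes the destination.
    void OnCopy(uint64_t src, uint64_t dst) {
        assert(src != dst && "copy source and destination must differ");
        initialized_[dst] = IsInitialized(src);
    }

    bool IsInitialized(uint64_t resource) const {
        auto it = initialized_.find(resource);
        return it != initialized_.end() && it->second;
    }

    // True when a shader read of this resource should be reported.
    bool IsUninitializedRead(uint64_t resource) const { return !IsInitialized(resource); }

private:
    std::unordered_map<uint64_t, bool> initialized_;
};
```

If any one of these paths is not hooked (or the initialization happens somewhere the layer cannot see), the tracker reports a false "uninitialized resource read", which matches the symptom discussed in this thread.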

@miguel-petersen
Collaborator

Thanks Austin.

All of those paths should be hooked, so I wonder what's happening. Given that Dawn initializes all resources by default, I'll see if I can reproduce it in a public sample.

@miguel-petersen
Collaborator

Got some time to see what was happening.

After a little investigation it seems that most samples go through a custom render pass which manually copies the texels, optionally with some color transformation.

What took me some time to understand is why Reshape didn't catch the initialization event of the source resources, until I saw that you are using OpenSharedHandle for some objects, @austinEng. Tracking initialization events across (potentially) process boundaries is beyond the scope of Reshape, so I opted to mark all resources created from external handles as initialized at creation.

It'll be submitted to a branch that isn't quite ready yet, likely later in May. The initialization feature is being reworked to track initialization state on a per-texel basis instead of for the whole resource. The same goes for concurrency.
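A rough sketch of per-texel tracking combined with the external-handle rule above, assuming 2D textures with one flag per texel stored in a `std::vector<bool>`. `TexelInitTracker` and its methods are hypothetical names, not the actual Reshape implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical per-texel initialization tracker for 2D textures. Resources
// opened from an external/shared handle start fully initialized, since their
// writes may have happened in another device or process.
class TexelInitTracker {
public:
    void OnCreate(uint64_t id, uint32_t width, uint32_t height, bool fromSharedHandle) {
        textures_[id] = Texture{width, height,
                                std::vector<bool>(size_t(width) * height, fromSharedHandle)};
    }

    // Marks the rectangle [x, x + w) x [y, y + h) as written (clear, copy, draw...).
    void OnWrite(uint64_t id, uint32_t x, uint32_t y, uint32_t w, uint32_t h) {
        Texture& t = textures_.at(id);
        for (uint32_t row = y; row < y + h && row < t.height; ++row) {
            for (uint32_t col = x; col < x + w && col < t.width; ++col) {
                t.init[size_t(row) * t.width + col] = true;
            }
        }
    }

    // True if a read at (x, y) touches a texel nothing has written yet.
    bool IsUninitializedRead(uint64_t id, uint32_t x, uint32_t y) const {
        const Texture& t = textures_.at(id);
        assert(x < t.width && y < t.height);
        return !t.init[size_t(y) * t.width + x];
    }

private:
    struct Texture {
        uint32_t width;
        uint32_t height;
        std::vector<bool> init;  // one flag per texel, row-major
    };
    std::unordered_map<uint64_t, Texture> textures_;
};
```

Partially written textures are where per-texel granularity pays off: a read from the unwritten region is flagged even though the resource as a whole has been touched, whereas a whole-resource flag would have been set by the first write.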

4 participants