Post-Process example #5648

Open
wants to merge 37 commits into
base: master

enzofrancescaHM

Description:
Minimal example to show the possibility of implementing Post-Processing in A-Frame.
NOTE: for Post-Processing to also work in VR mode, supermedium/three.js#20 must be implemented in the supermedium fork of three.js.
Changes proposed:
- index.html, running a simple scene with simple geometries with and without emission
- bloom.js, a minimal implementation of the Bloom effect

@vincentfretin
Contributor

Where do all those 1087 lines of code come from?
You should be able to import the postprocessing library with an importmap. I advise you to base your example on the new importmap example https://github.com/aframevr/aframe/blob/master/examples/boilerplate/importmap/index.html and write a small bloom component of a few lines.
The bloom effect integration in r3f is just this https://github.com/pmndrs/react-postprocessing/blob/master/src/effects/Bloom.tsx

@enzofrancescaHM
Author

> Where do all those 1087 lines of code come from? You should be able to import the postprocessing library with an importmap. I advise you to base your example on the new importmap example https://github.com/aframevr/aframe/blob/master/examples/boilerplate/importmap/index.html and write a small bloom component of a few lines. The bloom effect integration in r3f is just this https://github.com/pmndrs/react-postprocessing/blob/master/src/effects/Bloom.tsx

Yes, thanks for pointing that out, I completely missed the new feature of A-Frame to use importmap. Sorry for that, I've reduced bloom.js to the bare minimum to work.

@vincentfretin
Contributor

> Yes, thanks for pointing that out, I completely missed the new feature of A-Frame to use importmap

That new importmap example has only existed for 2 days ;) I'm glad I did it so we can have a simpler example here without copying all the code.

@vincentfretin
Contributor

Didn't we agree in supermedium/three.js#20 to use https://github.com/pmndrs/postprocessing, which is more performant than the three/addons effect composer?

@dmarcos
Member

dmarcos commented Jan 30, 2025

> Didn't we agree in supermedium/three.js#20 to use https://github.com/pmndrs/postprocessing, which is more performant than the three/addons effect composer?

Whatever yields 90fps with the simplest code and the fewest dependencies.

@enzofrancescaHM
Author

> Didn't we agree in supermedium/three.js#20 to use https://github.com/pmndrs/postprocessing, which is more performant than the three/addons effect composer?

pmndrs/postprocessing does not work in VR at the moment. I have opened tickets there, but so far no one is working on them.

@dmarcos
Member

dmarcos commented Jan 31, 2025

I merged the THREE changes and updated A-Frame, so this should work on top of master.

Can you put the examples under showcase and rename to post-processing: showcase/post-processing?

Thanks so much for all the effort

this.scene = this.el.object3D;
this.renderer = this.el.renderer;
this.camera = this.el.camera;
this.composer = new EffectComposer(this.renderer);
Contributor

Instead of recreating EffectComposer, RenderPass and UnrealBloomPass each update, they can be created once in init and only updated here.
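
A minimal sketch of that suggestion, not the code in this PR. To keep it self-contained, the three.js classes are passed in rather than imported: `createBloomComponent` and its `deps` parameter are hypothetical names, and the schema defaults simply mirror the attribute values used in this PR's index.html. It also assumes the scene's renderer and camera are already set when `init` runs. In a real page the returned object would be handed to `AFRAME.registerComponent`.

```javascript
// Hypothetical factory: `deps` carries EffectComposer, RenderPass,
// UnrealBloomPass and Vector2 so the sketch needs no imports of its own.
function createBloomComponent(deps) {
  const { EffectComposer, RenderPass, UnrealBloomPass, Vector2 } = deps;
  return {
    schema: {
      threshold: { type: 'number', default: 1.0 },
      strength: { type: 'number', default: 0.6 },
      radius: { type: 'number', default: 1.0 }
    },

    // Runs once: allocate the composer and both passes here.
    // Assumption: renderer and camera are available at this point.
    init: function () {
      const sceneEl = this.el;
      const size = sceneEl.renderer.getSize(new Vector2());
      this.composer = new EffectComposer(sceneEl.renderer);
      this.renderPass = new RenderPass(sceneEl.object3D, sceneEl.camera);
      this.bloomPass = new UnrealBloomPass(
        size, this.data.strength, this.data.radius, this.data.threshold);
      this.composer.addPass(this.renderPass);
      this.composer.addPass(this.bloomPass);
    },

    // Runs on every attribute change: only copy values onto the existing pass.
    update: function () {
      this.bloomPass.threshold = this.data.threshold;
      this.bloomPass.strength = this.data.strength;
      this.bloomPass.radius = this.data.radius;
    }
  };
}
```

With this split, changing `strength` or `radius` at runtime touches three properties instead of reallocating render targets.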

},
bind: function () {
const render = this.renderer.render;
const system = this;
Contributor

Nitpick: this isn't a system, probably better to name it self.

* Unreal Bloom Effect
* Implementation for A-Frame
* Code modified from Akbartus's UnrealBloomPass.js
* https://github.com/akbartus/A-Frame-Component-Postprocessing
Contributor

Given that the source is MIT licensed, the original copyright and license notice should be included.

"imports": {
"aframe": "../../dist/aframe-master.module.min.js",
"three": "https://cdn.jsdelivr.net/npm/[email protected]/build/three.module.js",
"three/addons/": "https://cdn.jsdelivr.net/npm/[email protected]/examples/jsm/"
Contributor

It should be

          "three": "../../../super-three-package/build/three.module.js",
          "three/addons/": "../../../super-three-package/examples/jsm/",

like https://github.com/aframevr/aframe/blob/master/examples/boilerplate/importmap/index.html
so it uses the three version from node_modules.

shadow="type: pcfsoft; autoUpdate: true"
background="color:black;"
renderer="anisotropy:4; stencil:true; alpha:false; colorManagement:true; exposure:1.0;"
bloomm="threshold: 1.0; strength: 0.6; radius: 1; exposure: 1.0">
Contributor

Does the example still work? There are two m's here.

@enzofrancescaHM
Author

Another 2 cents:
- A similar scene I've just built on PlayCanvas: https://playcanv.as/b/b2b8b290 (similar performance on Quest 3, if not worse)
- A generic bloom scene in Wonderland Engine: https://wonderlandengine.com/showcase/postprocessing/ (VR is completely broken)

@dmarcos
Member

dmarcos commented Feb 5, 2025

@vincentfretin Thanks so much for confirming the numbers

@enzofrancescaHM Thanks for the patience and all the hard work. This is pretty close. Can we incorporate your car example to this PR (2-3 lights up to you)?

We can improve after merge. I’m super happy to have something we can iterate over.

@enzofrancescaHM
Author

OK, done. CSS is local, model is in the model folder of A-Frame Examples

@dmarcos
Member

dmarcos commented Feb 5, 2025

> OK, done. CSS is local, model is in the model folder of A-Frame Examples

Thanks. Can you submit the model to the assets repo? https://github.com/aframevr/assets/tree/master/examples Create a directory for the example under ‘examples’.

@enzofrancescaHM
Author

done.

@dmarcos
Member

dmarcos commented Feb 5, 2025

The car is available now via CDN:

https://cdn.aframe.io/examples/post-processing/fancy-car.glb

@dmarcos
Member

dmarcos commented Feb 5, 2025

For model credit and other instructions we use a "standard" info panel:

https://aframe.io/aframe/examples/showcase/comicbook/

You can just import the info-message component and use it on the scene

@@ -158,6 +158,7 @@ <h2>Examples</h2>
<li><a href="mixed-reality/anchor/">Anchor (Mixed Reality)</a></li>
<li><a href="mixed-reality/real-world-meshing/">Real World Meshing (Mixed Reality)</a></li>
<li><a href="boilerplate/importmap/">Importmap (import teapot geometry from three/addons)</a></li>
<li><a href="showcase/post-processing/">Post-Processing (bloom effect)</a></li>
Member

No need to add (bloom effect) for simplicity

@mrxz
Contributor

mrxz commented Feb 5, 2025

Here's a proof-of-concept of a Bloom effect mostly hitting 90fps on a Quest 2: https://thrilling-alkaline-headline.glitch.me/
Note that the output is not visually identical to the UnrealBloomPass and there are some additional caveats.

This is not meant to further complicate or delay this PR. Just curious if it was doable and with what trade-offs, as that might help determine if we could/should pursue it. Even with the current performance implications this PR can be merged IMHO. In fact, the UnrealBloomPass of Three.js is already used quite often with 8thwall. Even though the implementation isn't optimal for mobile GPUs, it tends to work well enough for small, focussed experiences for handheld AR.

As @dmarcos said, once merged we can iterate over it. If post-processing ever becomes a core feature, then we should be a lot more careful.

@dmarcos
Member

dmarcos commented Feb 5, 2025

> Here's a proof-of-concept of a Bloom effect mostly hitting 90fps on a Quest 2: https://thrilling-alkaline-headline.glitch.me/ Note that the output is not visually identical to the UnrealBloomPass and there are some additional caveats.

> This is not meant to further complicate or delay this PR. Just curious if it was doable and with what trade-offs, as that might help determine if we could/should pursue it. Even with the current performance implications this PR can be merged IMHO. In fact, the UnrealBloomPass of Three.js is already used quite often with 8thwall. Even though the implementation isn't optimal for mobile GPUs, it tends to work well enough for small, focussed experiences for handheld AR.

> As @dmarcos said, once merged we can iterate over it. If post-processing ever becomes a core feature, then we should be a lot more careful.

Any improvements you can suggest on this PR for Quest2? Two paths for Quest 2 and 3 is acceptable. Thanks!

@enzofrancescaHM
Author

Wow, amazing work @mrxz ! Yes, as you were saying the effect seems a little bit cheaper than the one on this PR, but it is super acceptable and nice. Maybe two paths with different levels of quality / performance are the right way. And, I agree that we should merge and then iterate over it.

@dmarcos
Member

dmarcos commented Feb 5, 2025

> Wow, amazing work @mrxz ! Yes, as you were saying the effect seems a little bit cheaper than the one on this PR, but it is super acceptable and nice. Maybe two paths with different levels of quality / performance are the right way. And, I agree that we should merge and then iterate over it.

Haven't tried in VR. Is the "more expensive" one noticeably better visually? If yes two paths is fine. If not maybe we can just have the cheaper one.

@enzofrancescaHM
Author

[Image: Screenshot 2025-02-05 at 21:16:36]

@mrxz
Contributor

mrxz commented Feb 5, 2025

The UnrealBloomPass uses multiple different Gaussian blur radii and combines them. Instead I only approximate one (large) blur and composite it, similar to what PlayCanvas is doing. The larger the intended bloom radius, the more noticeable the difference will be.

But it's actually possible to get closer while still (though barely) maintaining 90fps on Quest 2. See: https://thrilling-alkaline-headline.glitch.me/index2.html

[Images: side-by-side comparison of three renders labelled "This PR", "Old", and "New"]

But it becomes questionable how useful a post processing effect is that eats up practically all your rendering budget.

> Any improvements you can suggest on this PR for Quest2?

The biggest gains are in reducing the render targets used while rendering, but this is mostly caused by the internal implementation of the EffectComposer and UnrealBloomPass. So not much that can be done without modifying them directly or 'forking' them. Not sure if that's the best approach for an example. In that sense it might be worth looking into pmndrs/postprocessing as they do combine passes where possible.

One thing that can be done is optimizing the car asset using gltf-transform, as it does cause more draw calls than needed. But that isn't directly related to the bloom effect, of course.

@enzofrancescaHM
Author

I think your new iteration looks very good! Yes, it is very, very similar to the one in this PR.

@dmarcos
Member

dmarcos commented Feb 5, 2025

How much of the rendering budget is used in Quest3? How much is reasonable? Any target we should aim at?

@mrxz
Contributor

mrxz commented Feb 6, 2025

> How much of the rendering budget is used in Quest3? How much is reasonable? Any target we should aim at?

Depends on too many factors to give a concrete answer on what is reasonable. In general I think users just want to "slap on bloom". In that sense the implementation in this PR is limiting for Quest 3. It pushes the GPU to the highest clock level (4) and GPU utilization is consistently >90%. You'll probably have a hard time adding much more to the scene, making it only really suitable for relatively small experiences. Although users could experiment with targeting 72fps and/or use viewport scaling (https://immersive-web.github.io/webxr/#dom-xrview-requestviewportscale).

With https://thrilling-alkaline-headline.glitch.me/index2.html the Quest 3 remains at GPU level 2 in my testing. Though worth pointing out that due to the way it works the bloom effect isn't a fixed-cost and does actually scale with the overall scene complexity. But at the very least there is clearly headroom.

It also depends on what you want from the bloom effect. The current demo scene is set up with a fairly large bloom effect. If someone only cares about a subtle glow around lights/neon-signage, a cheaper blur method is probably good enough and less demanding.

@dmarcos
Member

dmarcos commented Feb 6, 2025

> > How much of the rendering budget is used in Quest3? How much is reasonable? Any target we should aim at?

> Depends on too many factors to give a concrete answer on what is reasonable. In general I think users just want to "slap on bloom". In that sense the implementation in this PR is limiting for Quest 3. It pushes the GPU to the highest clock level (4) and GPU utilization is consistently >90%. You'll probably have a hard time adding much more to the scene, making it only really suitable for relatively small experiences. Although users could experiment with targeting 72fps and/or use viewport scaling (https://immersive-web.github.io/webxr/#dom-xrview-requestviewportscale).

> With https://thrilling-alkaline-headline.glitch.me/index2.html the Quest 3 remains at GPU level 2 in my testing. Though worth pointing out that due to the way it works the bloom effect isn't a fixed-cost and does actually scale with the overall scene complexity. But at the very least there is clearly headroom.

> It also depends on what you want from the bloom effect. The current demo scene is set up with a fairly large bloom effect. If someone only cares about a subtle glow around lights/neon-signage, a cheaper blur method is probably good enough and less demanding.

Thanks. We should probably limit to Quest3 and up then. People slapping bloom (without understanding the cost) representing 90% of post processing demand is the reason why I resisted. It’s good this PR is just an example. Some friction to use it but enables those that want to experiment.

@enzofrancescaHM
Author

I agree, this PR opens the way to Post-Processing in VR in A-Frame; the implementation is not finished here, and it is good to leave room for experimenting and improving. With this PR merged we can also push other repos to contribute, e.g. pmndrs/postprocessing. At the moment it does not work in VR, but yesterday I managed to make it work, so maybe I can open some discussions over there to have more tools to use.

@cabanier
Contributor

cabanier commented Feb 6, 2025

> > How much of the rendering budget is used in Quest3? How much is reasonable? Any target we should aim at?

> Depends on too many factors to give a concrete answer on what is reasonable. In general I think users just want to "slap on bloom". In that sense the implementation in this PR is limiting for Quest 3. It pushes the GPU to the highest clock level (4) and GPU utilization is consistently >90%. You'll probably have a hard time adding much more to the scene, making it only really suitable for relatively small experiences. Although users could experiment with targeting 72fps and/or use viewport scaling (https://immersive-web.github.io/webxr/#dom-xrview-requestviewportscale).

FYI Quest Browser 36.5 added support for dynamic viewport scaling so you could ask to render fewer pixels if you see the framerate dip. Does the current approach allow for dynamic resizing of render targets?
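
A rough sketch of what such a feedback loop might look like, assuming the app measures its own frame time. `nextViewportScale`, `applyViewportScale`, the 0.5 floor, the 0.05 step, and the 80% headroom threshold are all hypothetical choices made for illustration; only `XRView.requestViewportScale()` comes from the WebXR spec linked above, and it is feature-detected because not every browser implements it.

```javascript
// Hypothetical policy: shrink the viewport when the last frame ran over
// budget, and grow it back slowly when there is clear headroom.
function nextViewportScale(currentScale, frameMs, budgetMs) {
  const MIN_SCALE = 0.5; // assumed floor, not taken from the thread
  const STEP = 0.05;     // assumed step size
  let scale = currentScale;
  if (frameMs > budgetMs) {
    scale -= STEP;
  } else if (frameMs < budgetMs * 0.8) {
    scale += STEP;
  }
  return Math.min(1, Math.max(MIN_SCALE, scale));
}

// Requests the scale on each XRView of a viewer pose and returns how many
// views accepted the request. Views without the method are skipped.
function applyViewportScale(views, scale) {
  let applied = 0;
  for (const view of views) {
    if (typeof view.requestViewportScale === 'function') {
      view.requestViewportScale(scale);
      applied++;
    }
  }
  return applied;
}

// At 90fps the per-frame budget is roughly 1000 / 90 = 11.1 ms.
console.log(nextViewportScale(1.0, 13.0, 1000 / 90).toFixed(2)); // 0.95
```

Any render targets used by the post-processing passes would have to follow the scaled viewport size, which is exactly the open question raised here.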

> It also depends on what you want from the bloom effect. The current demo scene is set up with a fairly large bloom effect. If someone only cares about a subtle glow around lights/neon-signage, a cheaper blur method is probably good enough and less demanding.

Simply activating post-processing will still cause all those flushes.
The solution is very similar to what you (@mrxz ) did for reflections: you need to pre-process the render targets ahead of time and then have 2 passes (1 scene render + 1 postprocess). That should fix a lot of the problems.

@mrxz
Contributor

mrxz commented Feb 6, 2025

> The solution is very similar to what you (@mrxz ) did for reflections: you need to pre-process the render targets ahead of time and then have 2 passes (1 scene render + 1 postprocess). That should fix a lot of the problems.

I don't see how you could bring a bloom effect down into a single pass. You will need to perform a blur, which is going to require either ping-pong between two targets or consecutive downsampling and upsampling steps. Unless you do the full kernel per fragment, but that is prohibitively expensive.

@cabanier Did you try this demo https://thrilling-alkaline-headline.glitch.me/index2.html?
It's a different approach where the bloom is rendered ahead of time and overlaid when forward rendering the scene. It performs surprisingly well (stable 90fps on Quest 3, ~90fps on Quest 2), despite the many intermediate passes for down/up-sampling.

ovrgpuprofiler trace
Surface 0    | 3360x1760 | color 32bit, depth 24bit, stencil 0 bit, MSAA 4, Mode: 0 (Direct)    | 1   3360x1760 bins ( 1   rendered) |  0.00 ms | 1   stages : Render : 0.002ms
Surface 1    | 1680x880  | color 32bit, depth 24bit, stencil 0 bit, MSAA 1, Mode: 1 (HwBinning) | 4   864x448 bins ( 4   rendered) |  0.81 ms | 14  stages : Binning : 0.152ms Render : 0.588ms StoreColor : 0.012ms Blit : 0.005ms StoreDepthStencil : 0.014ms
Surface 2    | 840 x440  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   864x448 bins ( 1   rendered) |  0.13 ms | 2   stages : Render : 0.124ms StoreColor : 0.002ms
Surface 3    | 420 x220  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   480x224 bins ( 1   rendered) |  0.04 ms | 2   stages : Render : 0.034ms StoreColor : 0.003ms
Surface 4    | 210 x110  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   288x128 bins ( 1   rendered) |  0.02 ms | 2   stages : Render : 0.015ms StoreColor : 0.002ms
Surface 5    | 105 x55   | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   192x64  bins ( 1   rendered) |  0.02 ms | 2   stages : Render : 0.011ms StoreColor : 0.003ms
Surface 6    | 52  x27   | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   96 x32  bins ( 1   rendered) |  0.01 ms | 2   stages : Render : 0.008ms StoreColor : 0.002ms
Surface 7    | 26  x13   | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   96 x32  bins ( 1   rendered) |  0.01 ms | 2   stages : Render : 0.008ms StoreColor : 0.003ms
Surface 8    | 52  x27   | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   96 x32  bins ( 1   rendered) |  0.01 ms | 2   stages : Render : 0.008ms StoreColor : 0.002ms
Surface 9    | 105 x55   | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   192x64  bins ( 1   rendered) |  0.01 ms | 2   stages : Render : 0.009ms StoreColor : 0.003ms
Surface 10   | 210 x110  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   288x128 bins ( 1   rendered) |  0.02 ms | 2   stages : Render : 0.012ms StoreColor : 0.003ms
Surface 11   | 420 x220  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   480x224 bins ( 1   rendered) |  0.03 ms | 2   stages : Render : 0.025ms StoreColor : 0.003ms
Surface 12   | 840 x440  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   864x448 bins ( 1   rendered) |  0.08 ms | 2   stages : Render : 0.079ms StoreColor : 0.003ms
Surface 13   | 640 x335  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   672x352 bins ( 1   rendered) |  0.06 ms | 2   stages : Render : 0.05ms StoreColor : 0.003ms
Surface 14   | 105 x55   | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   192x64  bins ( 1   rendered) |  0.01 ms | 2   stages : Render : 0.009ms StoreColor : 0.002ms
Surface 15   | 210 x110  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   288x128 bins ( 1   rendered) |  0.02 ms | 2   stages : Render : 0.012ms StoreColor : 0.002ms
Surface 16   | 420 x220  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   480x224 bins ( 1   rendered) |  0.03 ms | 2   stages : Render : 0.026ms StoreColor : 0.002ms
Surface 17   | 840 x440  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   864x448 bins ( 1   rendered) |  0.08 ms | 2   stages : Render : 0.079ms StoreColor : 0.003ms
Surface 18   | 640 x335  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   672x352 bins ( 1   rendered) |  0.06 ms | 2   stages : Render : 0.05ms StoreColor : 0.003ms
Surface 19   | 210 x110  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   288x128 bins ( 1   rendered) |  0.02 ms | 2   stages : Render : 0.012ms StoreColor : 0.002ms
Surface 20   | 420 x220  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   480x224 bins ( 1   rendered) |  0.03 ms | 2   stages : Render : 0.026ms StoreColor : 0.003ms
Surface 21   | 840 x440  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   864x448 bins ( 1   rendered) |  0.08 ms | 2   stages : Render : 0.079ms StoreColor : 0.003ms
Surface 22   | 640 x335  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   672x352 bins ( 1   rendered) |  0.06 ms | 2   stages : Render : 0.05ms StoreColor : 0.003ms
Surface 23   | 420 x220  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   480x224 bins ( 1   rendered) |  0.03 ms | 2   stages : Render : 0.026ms StoreColor : 0.002ms
Surface 24   | 840 x440  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   864x448 bins ( 1   rendered) |  0.08 ms | 2   stages : Render : 0.079ms StoreColor : 0.003ms
Surface 25   | 640 x335  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   672x352 bins ( 1   rendered) |  0.05 ms | 2   stages : Render : 0.05ms StoreColor : 0.002ms
Surface 26   | 840 x440  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   864x448 bins ( 1   rendered) |  0.08 ms | 2   stages : Render : 0.078ms StoreColor : 0.003ms
Surface 27   | 640 x335  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   672x352 bins ( 1   rendered) |  0.06 ms | 2   stages : Render : 0.05ms StoreColor : 0.002ms
Surface 28   | 3360x1760 | color 32bit, depth 24bit, stencil 0 bit, MSAA 4, Mode: 1 (HwBinning) | 72  288x320 bins ( 38  rendered) |  5.20 ms | 82  stages : Binning : 0.775ms Render : 1.777ms StoreColor : 0.422ms Blit : 0.005ms Preempt : 1.854ms
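
The sizes of Surfaces 1 to 7 in the trace above are consistent with a downsampling chain that halves the target and rounds down at each step. The sketch below reproduces those sizes under that assumption; `mipChainSizes` is a hypothetical helper, not code from the demo, and it does not try to explain the 640x335 surfaces.

```javascript
// Repeatedly halve a render-target size, rounding down at each level.
function mipChainSizes(width, height, levels) {
  const sizes = [];
  let w = width;
  let h = height;
  for (let i = 0; i < levels; i++) {
    w = Math.max(1, Math.floor(w / 2));
    h = Math.max(1, Math.floor(h / 2));
    sizes.push([w, h]);
  }
  return sizes;
}

// Start from the 1680x880 surface and take six levels, as in the trace.
const chain = mipChainSizes(1680, 880, 6);
console.log(chain.map(([w, h]) => w + 'x' + h).join(' -> '));
// 840x440 -> 420x220 -> 210x110 -> 105x55 -> 52x27 -> 26x13

// Total pixels written by the downsample steps, relative to the source.
const sourcePixels = 1680 * 880;
const chainPixels = chain.reduce((sum, [w, h]) => sum + w * h, 0);
console.log((chainPixels / sourcePixels).toFixed(3)); // 0.333
```

Because each level has a quarter of the pixels of the previous one, the whole chain touches only about a third as many pixels as the 1680x880 source, which fits the small per-surface times reported.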

@cabanier
Contributor

cabanier commented Feb 6, 2025

> > The solution is very similar to what you (@mrxz ) did for reflections: you need to pre-process the render targets ahead of time and then have 2 passes (1 scene render + 1 postprocess). That should fix a lot of the problems.

> I don't see how you could bring a bloom effect down into a single pass. You will need to perform a blur, which is going to require either ping-pong between two targets or consecutive downsampling and upsampling steps. Unless you do the full kernel per fragment, but that is prohibitively expensive.

yes, bloom still needs a separate pass.

Looking at your trace, you can see that we bound to the swapchain texture, but then immediately switch:
Surface 0 | 3360x1760 | color 32bit, depth 24bit, stencil 0 bit, MSAA 4, Mode: 0 (Direct) | 1 3360x1760 bins ( 1 rendered) | 0.00 ms | 1 stages : Render : 0.002ms
After this, I'm unsure if foveation is still working

The second pass is likely rendering the scene (with no MSAA and only half the resolution?) and since it's so simple, it renders quickly. (My proposal to foveate any texture would likely make little difference in this case)
Surface 1 | 1680x880 | color 32bit, depth 24bit, stencil 0 bit, MSAA 1, Mode: 1 (HwBinning) | 4 864x448 bins ( 4 rendered) | 0.81 ms | 14 stages : Binning : 0.152ms Render : 0.588ms StoreColor : 0.012ms Blit : 0.005ms StoreDepthStencil : 0.014ms
You're also storing depth for no reason. Maybe you can discard it to save some time?

Do you know what all these other small items are that are rendered each frame? If they don't change, maybe they can be cached.

At the very end, there is an expensive pass which is likely unavoidable. I suspect that foveation would still help.

That being said, this trace is looking a LOT better than the one I captured 2 days ago!

@mrxz
Contributor

mrxz commented Feb 6, 2025

@cabanier Thanks for your insights

> After this, I'm unsure if foveation is still working

Visually foveation is definitely still active. There are no draw/clear calls issued at this stage so that might explain why. Either way, after avoiding this switch it didn't really impact the trace much:

ovrgpuprofiler trace
Surface 0    | 1680x880  | color 32bit, depth 24bit, stencil 0 bit, MSAA 1, Mode: 1 (HwBinning) | 4   864x448 bins ( 4   rendered) |  0.83 ms | 14  stages : Binning : 0.137ms Render : 0.629ms StoreColor : 0.012ms Blit : 0.005ms StoreDepthStencil : 0.011ms
Surface 1    | 840 x440  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   864x448 bins ( 1   rendered) |  0.14 ms | 2   stages : Render : 0.133ms StoreColor : 0.003ms
Surface 2    | 420 x220  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   480x224 bins ( 1   rendered) |  0.04 ms | 2   stages : Render : 0.033ms StoreColor : 0.003ms
Surface 3    | 210 x110  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   288x128 bins ( 1   rendered) |  0.02 ms | 2   stages : Render : 0.015ms StoreColor : 0.003ms
Surface 4    | 105 x55   | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   192x64  bins ( 1   rendered) |  0.02 ms | 2   stages : Render : 0.011ms StoreColor : 0.003ms
Surface 5    | 52  x27   | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   96 x32  bins ( 1   rendered) |  0.01 ms | 2   stages : Render : 0.009ms StoreColor : 0.002ms
Surface 6    | 26  x13   | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   96 x32  bins ( 1   rendered) |  0.01 ms | 2   stages : Render : 0.008ms StoreColor : 0.003ms
Surface 7    | 52  x27   | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   96 x32  bins ( 1   rendered) |  0.01 ms | 2   stages : Render : 0.008ms StoreColor : 0.003ms
Surface 8    | 105 x55   | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   192x64  bins ( 1   rendered) |  0.01 ms | 2   stages : Render : 0.009ms StoreColor : 0.002ms
Surface 9    | 210 x110  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   288x128 bins ( 1   rendered) |  0.02 ms | 2   stages : Render : 0.013ms StoreColor : 0.003ms
Surface 10   | 420 x220  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   480x224 bins ( 1   rendered) |  0.03 ms | 2   stages : Render : 0.025ms StoreColor : 0.002ms
Surface 11   | 840 x440  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   864x448 bins ( 1   rendered) |  0.09 ms | 2   stages : Render : 0.08ms StoreColor : 0.003ms
Surface 12   | 640 x335  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   672x352 bins ( 1   rendered) |  0.06 ms | 2   stages : Render : 0.05ms StoreColor : 0.003ms
Surface 13   | 105 x55   | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   192x64  bins ( 1   rendered) |  0.01 ms | 2   stages : Render : 0.009ms StoreColor : 0.002ms
Surface 14   | 210 x110  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   288x128 bins ( 1   rendered) |  0.02 ms | 2   stages : Render : 0.012ms StoreColor : 0.002ms
Surface 15   | 420 x220  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   480x224 bins ( 1   rendered) |  0.03 ms | 2   stages : Render : 0.026ms StoreColor : 0.002ms
Surface 16   | 840 x440  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   864x448 bins ( 1   rendered) |  0.08 ms | 2   stages : Render : 0.079ms StoreColor : 0.003ms
Surface 17   | 640 x335  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   672x352 bins ( 1   rendered) |  0.06 ms | 2   stages : Render : 0.05ms StoreColor : 0.003ms
Surface 18   | 210 x110  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   288x128 bins ( 1   rendered) |  0.02 ms | 2   stages : Render : 0.012ms StoreColor : 0.002ms
Surface 19   | 420 x220  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   480x224 bins ( 1   rendered) |  0.03 ms | 2   stages : Render : 0.025ms StoreColor : 0.002ms
Surface 20   | 840 x440  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   864x448 bins ( 1   rendered) |  0.08 ms | 2   stages : Render : 0.079ms StoreColor : 0.003ms
Surface 21   | 640 x335  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   672x352 bins ( 1   rendered) |  0.60 ms | 3   stages : Render : 0.05ms StoreColor : 0.002ms Preempt : 0.488ms
Surface 22   | 420 x220  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   480x224 bins ( 1   rendered) |  0.04 ms | 2   stages : Render : 0.031ms StoreColor : 0.003ms
Surface 23   | 840 x440  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   864x448 bins ( 1   rendered) |  0.09 ms | 2   stages : Render : 0.08ms StoreColor : 0.003ms
Surface 24   | 640 x335  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   672x352 bins ( 1   rendered) |  0.06 ms | 2   stages : Render : 0.051ms StoreColor : 0.003ms
Surface 25   | 840 x440  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   864x448 bins ( 1   rendered) |  0.09 ms | 2   stages : Render : 0.081ms StoreColor : 0.003ms
Surface 26   | 640 x335  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   672x352 bins ( 1   rendered) |  0.06 ms | 2   stages : Render : 0.051ms StoreColor : 0.003ms
Surface 27   | 3360x1760 | color 32bit, depth 24bit, stencil 0 bit, MSAA 4, Mode: 1 (HwBinning) | 72  288x320 bins ( 44  rendered) |  4.62 ms | 93  stages : Binning : 0.685ms Render : 2.201ms StoreColor : 0.389ms Blit : 0.005ms Preempt : 0.976ms
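
To compare traces like the two in this thread, the per-surface time column can be totalled with a few lines of script. This is only a sketch: `totalSurfaceMs` is a hypothetical helper, and the regular expression assumes the pipe-separated layout shown above, with the time printed as `x.xx ms` between two `|` characters. The sample input is three lines copied from the trace above, not the full capture.

```javascript
// Sums the "x.xx ms" column of every line that starts with "Surface".
function totalSurfaceMs(traceText) {
  let total = 0;
  for (const line of traceText.split('\n')) {
    if (!line.startsWith('Surface')) continue;
    // The per-stage timings ("0.137ms") have no space and no pipes around
    // them, so only the per-surface column matches this pattern.
    const match = line.match(/\|\s*([\d.]+) ms\s*\|/);
    if (match) total += parseFloat(match[1]);
  }
  return total;
}

const sample = [
  'Surface 0    | 1680x880  | color 32bit, depth 24bit, stencil 0 bit, MSAA 1, Mode: 1 (HwBinning) | 4   864x448 bins ( 4   rendered) |  0.83 ms | 14  stages : Binning : 0.137ms Render : 0.629ms StoreColor : 0.012ms Blit : 0.005ms StoreDepthStencil : 0.011ms',
  'Surface 1    | 840 x440  | color 32bit, depth 0 bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 1   864x448 bins ( 1   rendered) |  0.14 ms | 2   stages : Render : 0.133ms StoreColor : 0.003ms',
  'Surface 27   | 3360x1760 | color 32bit, depth 24bit, stencil 0 bit, MSAA 4, Mode: 1 (HwBinning) | 72  288x320 bins ( 44  rendered) |  4.62 ms | 93  stages : Binning : 0.685ms Render : 2.201ms StoreColor : 0.389ms Blit : 0.005ms Preempt : 0.976ms'
].join('\n');

console.log(totalSurfaceMs(sample).toFixed(2)); // 5.59
```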

> You're also storing depth for no reason. Maybe you can discard it to save some time?

I was aware, just didn't invalidate it yet as Three.js doesn't expose a convenient way to do so.

> Do you know what all these other small items are that are rendered each frame? If they don't change, maybe they can be cached.

These are the blurring passes. That's what I meant with not seeing how you could bring the bloom effect down to one pass. Currently 5 different blur radii are combined, so there's definitely some saving possible at the expense of image quality. But even if we'd only want a subtle short length bloom, you'd at the very least have something like "main render" + "horizontal blur pass" + "vertical blur pass" + "bloom pass".

The only way I know to avoid this would be to do the full blur kernel for each pixel, but that can't possibly be more performant. Though if there is some technique I'm missing, I'd love to know. Any references to bloom implementations for Quest would also be appreciated.
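
To illustrate the trade-off, here is a small CPU-side sketch (plain arrays, not shader code, and not the implementation in this PR). It assumes a Gaussian kernel and clamp-to-edge sampling; all function names are made up for the example. It shows that a horizontal pass followed by a vertical pass gives the same result as the full 2D kernel, while needing far fewer samples per pixel.

```javascript
// Normalized 1D Gaussian weights for taps at offsets -radius..radius.
function gaussianWeights(radius, sigma) {
  const weights = [];
  let sum = 0;
  for (let i = -radius; i <= radius; i++) {
    const w = Math.exp(-(i * i) / (2 * sigma * sigma));
    weights.push(w);
    sum += w;
  }
  return weights.map((w) => w / sum);
}

const clamp = (v, max) => Math.min(Math.max(v, 0), max);

// Blur a 2D array along one axis: (dx, dy) = (1, 0) is horizontal, (0, 1) is vertical.
function blur1D(img, weights, dx, dy) {
  const h = img.length;
  const w = img[0].length;
  const r = (weights.length - 1) / 2;
  return img.map((row, y) =>
    row.map((_, x) => {
      let acc = 0;
      for (let k = -r; k <= r; k++) {
        acc += weights[k + r] * img[clamp(y + k * dy, h - 1)][clamp(x + k * dx, w - 1)];
      }
      return acc;
    })
  );
}

// Same blur as a single pass that evaluates the full 2D kernel per pixel.
function blur2D(img, weights) {
  const h = img.length;
  const w = img[0].length;
  const r = (weights.length - 1) / 2;
  return img.map((row, y) =>
    row.map((_, x) => {
      let acc = 0;
      for (let ky = -r; ky <= r; ky++) {
        for (let kx = -r; kx <= r; kx++) {
          acc += weights[ky + r] * weights[kx + r] * img[clamp(y + ky, h - 1)][clamp(x + kx, w - 1)];
        }
      }
      return acc;
    })
  );
}

// Samples needed per output pixel for each approach.
function tapsPerPixel(radius) {
  const n = 2 * radius + 1;
  return { full: n * n, separable: 2 * n };
}
```

For a radius of 5 that is 121 samples per pixel for the single pass versus 22 for the two separable passes, which is why the extra render passes are usually still the cheaper option.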

At the very end, there is an expensive pass which is likely unavoidable. I suspect that foveation would still help.

This is indeed unavoidable, though a cheaper bloom based on a single blur radius can still save almost 1 ms on this pass. That said, I'm not sure how meaningful these time measurements are when the GPU level stays at 2 (it isn't fixed, I'm just not seeing it go up under this load).

Foveation is already active at this point as mentioned above. When I disable the foveation the render time does indeed shoot up. Interestingly enough the output using logcat | grep VrApi always reports Fov=0 even when it's clearly on. Is this expected? Can reproduce it with https://threejs.org/examples/?q=teleport#webxr_vr_teleport as well. The lines on the walls in this example make it easy to verify visually that it's active.

That being said, this trace is looking a LOT better than the one I captured 2 days ago!

Your previous capture was essentially just the stock EffectComposer + UnrealBloomPass, whereas this is a different approach entirely. Strictly speaking it isn't even post-processing, but it seems serviceable for the bloom effect. Sadly that also means that these results don't translate to other potential post-processing effects.

@cabanier
Contributor

cabanier commented Feb 6, 2025

You're also storing depth for no reason. Maybe you can discard it to save some time?

I was aware, just didn't invalidate it yet as Three.js doesn't expose a convenient way to do so.

It's one of the things that will be fixed in the three.js redesign. As long as you clear the swapchain backed texture, you should still get foveation.
Is there a reason to render the scene without MSAA?

Do you know what all these other small items are that are rendered each frame? If they don't change, maybe they can be cached.

These are the blurring passes. That's what I meant with not seeing how you could bring the bloom effect down to one pass. Currently 5 different blur radii are combined, so there's definitely some saving possible at the expense of image quality. But even if we'd only want a subtle short length bloom, you'd at the very least have something like "main render" + "horizontal blur pass" + "vertical blur pass" + "bloom pass".

OK, since those passes are very fast, it doesn't look like a problem.

Foveation is already active at this point as mentioned above. When I disable the foveation the render time does indeed shoot up. Interestingly enough the output using logcat | grep VrApi always reports Fov=0 even when it's clearly on. Is this expected? Can reproduce it with https://threejs.org/examples/?q=teleport#webxr_vr_teleport as well. The lines on the walls in this example make it easy to verify visually that it's active.

Yes, that is expected. Because of the out-of-process nature of the Chromium renderer, we can't use the system API calls that set up foveation so it's not recorded in the tool.

That being said, this trace is looking a LOT better than the one I captured 2 days ago!

Your previous capture was essentially just the stock EffectComposer + UnrealBloomPass, whereas this is a different approach entirely. Strictly speaking it isn't even post-processing, but it seems serviceable for the bloom effect. Sadly that also means that these results don't translate to other potential post-processing effects.

Great! The rendering of the scene itself would still benefit from foveation but we'll have to wait for that API to land.

@mrxz
Contributor

mrxz commented Feb 7, 2025

Is there a reason to render the scene without MSAA?

The main forward pass is the last one, which is using MSAA x4 as can be seen from the trace. No MSAA is used for the bloom input as it will get blurred anyway (blur is a great AA technique 😉) and resolving an HDR buffer still leaves aliasing artifacts around bright areas. So I don't see a reason to use MSAA there.

Great! The rendering of the scene itself would still benefit from foveation but we'll have to wait for that API to land.

Indeed, looking forward to trying that out. Also WEBGL_shader_pixel_local_storage could be interesting here. Currently blending happens in sRGB, which is all sorts of wrong; it just happens to look okay.
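
To make the sRGB point concrete, here is a small single-channel sketch using the standard sRGB transfer functions. It assumes the bloom is combined with the scene by additive blending; the function names are made up for the example and this is not code from the PR.

```javascript
// Standard sRGB decode: encoded value in [0, 1] -> linear light.
function srgbToLinear(c) {
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Standard sRGB encode: linear light in [0, 1] -> encoded value.
function linearToSrgb(c) {
  return c <= 0.0031308 ? c * 12.92 : 1.055 * Math.pow(c, 1 / 2.4) - 0.055;
}

// Additive blend done directly on the encoded values, which is what happens
// when blending into an sRGB-encoded target without hardware sRGB conversion.
function addInSrgb(sceneSrgb, bloomSrgb) {
  return Math.min(1, sceneSrgb + bloomSrgb);
}

// Physically meaningful version: decode, add in linear light, re-encode.
function addInLinear(sceneSrgb, bloomSrgb) {
  const sum = srgbToLinear(sceneSrgb) + srgbToLinear(bloomSrgb);
  return linearToSrgb(Math.min(1, sum));
}
```

With both inputs at 0.5, `addInSrgb` already saturates at 1.0, while `addInLinear` lands at roughly 0.69. Blending on encoded values therefore over-brightens the result, which for a glow effect can pass as intentional.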
