Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perf/various optimizations #1128

Merged
merged 14 commits into from
Sep 19, 2023
Merged

Conversation

cmeissl
Copy link
Collaborator

@cmeissl cmeissl commented Sep 10, 2023

WIP: Some experiments for getting the render time further down.

Based on #1122

Quick compare with 2 outputs, one empty the other with glmark:
Screenshot from 2023-09-14 20-58-09

Copy link
Member

@Drakulix Drakulix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish this was split into more commits, because I feel like we are getting into muddy waters here.

A couple of things are obviously a good idea.
That includes making the damage snapshot copy-on-write (already neatly separated) or adding has_dmabuf_format (not so separated).

Others need more testing (preferrably in different compositors with different workloads) to convince me. E.g. cosmic-comp probably has a lot more small RenderElements, than anvil for various UI elements.

  • Using Vecs in places of Maps. I can see how some of these only have a very small amount of elements, which might make this more efficient.
  • Using Fxhash instead of std' HashMaps. I haven't found any good characteristics on when fxhash might be faster and why this applies here.

I also would like to point out, that this depends on the rust-version used, as stdlib collections are still frequently optimized. So there might be reason to use these optimizations, if you are stuck with older versions, and less reason with newer ones. Profiling runs should thus also be tagged with the rust-version used to build.

@cmeissl
Copy link
Collaborator Author

cmeissl commented Sep 11, 2023

Sure, as always I will split it into mutliple commits. I want to be able to profile each change independently.
I expect that after splitting we can merge some obvious optimizations directly, while others may lead to nothing and may be dropped.

Candidates for merge are:

  • DamageSnasphot COW
  • has_dmabuf_format
  • rectangle subtract many (after more testing)
  • render_output_with (this can save quite some time, bind/unbind is rather expensive)

Possible micro-optimizations:

  • replacing a few hash maps with vec (only where we can assume a small set, which is true for the planes)

Lower priority:

  • The FxHash stuff. This definitely needs a bigger test-set. But we use HashMaps/Sets in quite a few places, so it is something I definitely want to explore.

@cmeissl cmeissl force-pushed the perf/various_optimizations branch 3 times, most recently from fe91013 to d8f418a Compare September 11, 2023 20:42
@codecov-commenter
Copy link

codecov-commenter commented Sep 11, 2023

Codecov Report

Patch coverage: 55.55% and project coverage change: +0.09% 🎉

Comparison is base (691bb28) 22.88% compared to head (cebab8d) 22.98%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1128      +/-   ##
==========================================
+ Coverage   22.88%   22.98%   +0.09%     
==========================================
  Files         143      143              
  Lines       23050    23002      -48     
==========================================
+ Hits         5275     5286      +11     
+ Misses      17775    17716      -59     
Flag Coverage Δ
wlcs-buffer 20.10% <55.12%> (+0.10%) ⬆️
wlcs-core 19.75% <54.27%> (+0.09%) ⬆️
wlcs-output 8.32% <29.48%> (+0.22%) ⬆️
wlcs-pointer-input 21.75% <53.84%> (+0.08%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
src/backend/renderer/utils/wayland.rs 56.84% <0.00%> (+3.74%) ⬆️
src/backend/renderer/mod.rs 17.61% <20.00%> (+0.16%) ⬆️
src/utils/geometry.rs 52.13% <21.68%> (+0.48%) ⬆️
src/desktop/space/wayland/mod.rs 67.74% <50.00%> (ø)
src/desktop/space/wayland/window.rs 43.18% <66.66%> (-3.81%) ⬇️
src/backend/renderer/damage.rs 66.11% <80.80%> (+11.89%) ⬆️
src/backend/renderer/utils/mod.rs 57.53% <83.33%> (+0.29%) ⬆️
anvil/src/shell/element.rs 29.92% <100.00%> (-0.79%) ⬇️
src/desktop/space/mod.rs 48.14% <100.00%> (-2.38%) ⬇️

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@cmeissl
Copy link
Collaborator Author

cmeissl commented Sep 14, 2023

Okay, so everything should be nicely split up in smallish commits now.
I also removed the alternative hashing for now.

Copy link
Member

@Drakulix Drakulix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every single optimization seems very sensible to me and should be an obvious improvement.

Lets wait for a few test runs, but I am pretty certain we can merge this as is.

@YaLTeR
Copy link
Contributor

YaLTeR commented Sep 15, 2023

Screenshot from 2023-09-15 22-10-51

Yellow = this PR, red = master.

Not an exact benchmark here; I opened a few alacritties and weston-presentation-shm, then switched to Firefox and scrolled that around a bit.

this implements clone on write for the DamageBag/
DamageSnapshot which makes getting a snapshot
of the bag a cheap clone.
multi-gpu has to test if a specific format is supported
using `dmabuf_formats` for this is rather slow
introduce a fast-path for checking if a format
is supported
this introduces a new function `render_output_with`
which can lazily bind the provided buffer.
in case nothing will get rendered binding is completely
skipped.
looking up the path is quite expensive
and might show wrong results in a trace
in most cases we will only receive a single
instance per element. we can reduce
allocations by using smallvec for storing the instances
...where the element count can be predicted
@cmeissl cmeissl force-pushed the perf/various_optimizations branch from e8c8c79 to cebab8d Compare September 17, 2023 10:40
@cmeissl
Copy link
Collaborator Author

cmeissl commented Sep 19, 2023

Okay, also did another trace:

image

So imo this is good to get merged

@cmeissl cmeissl marked this pull request as ready for review September 19, 2023 06:56
@Drakulix Drakulix merged commit 9856e93 into Smithay:master Sep 19, 2023
36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants