Perf/various optimizations #1128

cmeissl · 2023-09-10T18:57:03Z

WIP: Some experiments for bringing the render time further down.

Based on #1122

Quick comparison with two outputs, one empty and the other running glmark:

Drakulix

I wish this was split into more commits, because I feel like we are getting into muddy waters here.

A couple of things are obviously a good idea.
That includes making the damage snapshot copy-on-write (already neatly separated) or adding has_dmabuf_format (not so separated).

Others need more testing (preferably in different compositors with different workloads) to convince me. E.g. cosmic-comp probably has a lot more small RenderElements than anvil, for its various UI elements.

- Using Vecs in place of Maps. I can see how some of these only ever hold a very small number of elements, which might make this more efficient.
- Using FxHash instead of std's HashMaps. I haven't found any good characterization of when FxHash might be faster, or why that would apply here.

I would also like to point out that this depends on the Rust version used, as the stdlib collections are still frequently optimized. So there might be reason to use these optimizations if you are stuck with older versions, and less reason with newer ones. Profiling runs should thus also be tagged with the Rust version used to build.

cmeissl · 2023-09-11T13:44:23Z

Sure, as always I will split it into multiple commits. I want to be able to profile each change independently.
I expect that after splitting we can merge some obvious optimizations directly, while others may lead to nothing and may be dropped.

Candidates for merge are:

- DamageSnapshot CoW
- has_dmabuf_format
- rectangle subtract many (after more testing)
- render_output_with (this can save quite some time; bind/unbind is rather expensive)

Possible micro-optimizations:

- replacing a few hash maps with Vec (only where we can assume a small set, which is true for the planes)
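As a rough illustration of the "Vec instead of HashMap for small sets" idea, here is a minimal sketch of a Vec-backed map with linear lookup. `SmallMap` is a hypothetical name, not a Smithay type; whether it actually beats a `HashMap` depends on the element count and key type, so it should be profiled as discussed above.

```rust
/// Hypothetical Vec-backed map for collections known to stay small
/// (e.g. one entry per plane). Lookups are a linear scan, which for a
/// handful of entries avoids hashing and hash-table allocation overhead.
pub struct SmallMap<K, V> {
    entries: Vec<(K, V)>,
}

impl<K: PartialEq, V> SmallMap<K, V> {
    pub fn new() -> Self {
        SmallMap { entries: Vec::new() }
    }

    /// Inserts or replaces the value for `key`, returning the previous value.
    pub fn insert(&mut self, key: K, value: V) -> Option<V> {
        for (k, v) in self.entries.iter_mut() {
            if *k == key {
                return Some(std::mem::replace(v, value));
            }
        }
        self.entries.push((key, value));
        None
    }

    /// Linear search; O(n), but n is assumed to be tiny.
    pub fn get(&self, key: &K) -> Option<&V> {
        self.entries.iter().find(|(k, _)| k == key).map(|(_, v)| v)
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```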

Lower priority:

The FxHash stuff. This definitely needs a bigger test-set. But we use HashMaps/Sets in quite a few places, so it is something I definitely want to explore.

codecov-commenter · 2023-09-11T21:05:39Z

Codecov Report

Patch coverage: 55.55% and project coverage change: +0.09% 🎉

Comparison is base (691bb28) 22.88% compared to head (cebab8d) 22.98%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1128      +/-   ##
==========================================
+ Coverage   22.88%   22.98%   +0.09%     
==========================================
  Files         143      143              
  Lines       23050    23002      -48     
==========================================
+ Hits         5275     5286      +11     
+ Misses      17775    17716      -59

| Flag | Coverage Δ |
|---|---|
| wlcs-buffer | `20.10% <55.12%> (+0.10%)` ⬆️ |
| wlcs-core | `19.75% <54.27%> (+0.09%)` ⬆️ |
| wlcs-output | `8.32% <29.48%> (+0.22%)` ⬆️ |
| wlcs-pointer-input | `21.75% <53.84%> (+0.08%)` ⬆️ |

Flags with carried forward coverage won't be shown.

| Files Changed | Coverage Δ |
|---|---|
| src/backend/renderer/utils/wayland.rs | `56.84% <0.00%> (+3.74%)` ⬆️ |
| src/backend/renderer/mod.rs | `17.61% <20.00%> (+0.16%)` ⬆️ |
| src/utils/geometry.rs | `52.13% <21.68%> (+0.48%)` ⬆️ |
| src/desktop/space/wayland/mod.rs | `67.74% <50.00%> (ø)` |
| src/desktop/space/wayland/window.rs | `43.18% <66.66%> (-3.81%)` ⬇️ |
| src/backend/renderer/damage.rs | `66.11% <80.80%> (+11.89%)` ⬆️ |
| src/backend/renderer/utils/mod.rs | `57.53% <83.33%> (+0.29%)` ⬆️ |
| anvil/src/shell/element.rs | `29.92% <100.00%> (-0.79%)` ⬇️ |
| src/desktop/space/mod.rs | `48.14% <100.00%> (-2.38%)` ⬇️ |

... and 1 file with indirect coverage changes

cmeissl · 2023-09-14T18:59:33Z

Okay, so everything should be nicely split up in smallish commits now.
I also removed the alternative hashing for now.

Drakulix

Every single optimization seems very sensible to me and should be an obvious improvement.

Let's wait for a few test runs, but I am pretty certain we can merge this as is.

YaLTeR · 2023-09-15T18:12:16Z

Yellow = this PR, red = master.

Not an exact benchmark here; I opened a few alacritties and weston-presentation-shm, then switched to Firefox and scrolled that around a bit.

cmeissl · 2023-09-19T06:56:41Z

Okay, also did another trace:

So IMO this is good to merge.

Drakulix reviewed Sep 11, 2023

cmeissl force-pushed the perf/various_optimizations branch 3 times, most recently from fe91013 to d8f418a, September 11, 2023 20:42

Drakulix approved these changes Sep 15, 2023

cmeissl added 14 commits September 17, 2023 12:40

renderer: make the damage snapshot cow

db05664

This implements clone-on-write for the DamageBag/DamageSnapshot, which makes taking a snapshot of the bag a cheap clone.
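A minimal sketch of the clone-on-write idea, using `Arc::make_mut` from the standard library. `DamageBag` and `DamageSnapshot` here are simplified, hypothetical stand-ins that store damage as plain tuples; they are not Smithay's actual types, fields, or methods.

```rust
use std::collections::VecDeque;
use std::sync::Arc;

/// Hypothetical damage rectangle: (x, y, width, height).
pub type Rect = (i32, i32, i32, i32);

/// Keeps the damage history behind an `Arc`, so a snapshot only bumps a
/// reference count instead of deep-copying every rectangle.
#[derive(Debug, Clone)]
pub struct DamageBag {
    history: Arc<VecDeque<Vec<Rect>>>,
    limit: usize,
}

#[derive(Debug, Clone)]
pub struct DamageSnapshot {
    history: Arc<VecDeque<Vec<Rect>>>,
}

impl DamageBag {
    pub fn new(limit: usize) -> Self {
        DamageBag { history: Arc::new(VecDeque::new()), limit }
    }

    /// Cheap: clones the `Arc`, not the history it points to.
    pub fn snapshot(&self) -> DamageSnapshot {
        DamageSnapshot { history: Arc::clone(&self.history) }
    }

    /// If a snapshot still shares the history, `make_mut` clones it once
    /// before mutating; otherwise the data is mutated in place.
    pub fn add(&mut self, damage: Vec<Rect>) {
        let history = Arc::make_mut(&mut self.history);
        history.push_front(damage);
        history.truncate(self.limit);
    }
}

impl DamageSnapshot {
    pub fn len(&self) -> usize {
        self.history.len()
    }

    /// True while the snapshot and the bag still point at the same data.
    pub fn shares_with(&self, bag: &DamageBag) -> bool {
        Arc::ptr_eq(&self.history, &bag.history)
    }
}
```

The snapshot stays valid and unchanged after the bag is modified; the copy is only paid for when a mutation actually happens while a snapshot is alive.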

drm: early release empty frames

7e784d5

renderer: fast check if format is supported

b5f64e3

Multi-GPU has to test whether a specific format is supported. Using `dmabuf_formats` for this is rather slow, so introduce a fast path for checking whether a format is supported.
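The difference between the slow and fast path can be sketched as follows, assuming the supported formats are kept in a `HashSet`. The `Format` and `Renderer` types and the exact signatures are hypothetical; only the names `dmabuf_formats` and `has_dmabuf_format` come from this PR, and Smithay's real implementation may look different.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a DRM fourcc code plus format modifier.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Format {
    pub code: u32,
    pub modifier: u64,
}

pub struct Renderer {
    formats: HashSet<Format>,
}

impl Renderer {
    pub fn new(formats: Vec<Format>) -> Self {
        Renderer { formats: formats.into_iter().collect() }
    }

    /// Slow path: allocates and fills a whole list that the caller then
    /// has to search linearly.
    pub fn dmabuf_formats(&self) -> Vec<Format> {
        self.formats.iter().copied().collect()
    }

    /// Fast path: a single hash lookup with no allocation.
    pub fn has_dmabuf_format(&self, format: Format) -> bool {
        self.formats.contains(&format)
    }
}
```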

desktop: reduce severity of output update log messages

e2e353a

utils: subtract rects (many)

180332b
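The commit above has no description, so the following is only a generic sketch of subtracting many rectangles from one: each subtraction splits a piece into at most four non-overlapping bands, and two buffers are reused instead of allocating per step. `Rect`, `subtract`, and `subtract_many` are hypothetical names, not Smithay's `Rectangle` API.

```rust
/// Hypothetical axis-aligned rectangle.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Rect {
    pub x: i32,
    pub y: i32,
    pub w: i32,
    pub h: i32,
}

impl Rect {
    fn right(&self) -> i32 { self.x + self.w }
    fn bottom(&self) -> i32 { self.y + self.h }

    fn intersection(&self, other: &Rect) -> Option<Rect> {
        let (x, y) = (self.x.max(other.x), self.y.max(other.y));
        let (r, b) = (self.right().min(other.right()), self.bottom().min(other.bottom()));
        if r > x && b > y { Some(Rect { x, y, w: r - x, h: b - y }) } else { None }
    }

    /// Pushes the parts of `self` not covered by `other` into `out`.
    fn subtract(&self, other: &Rect, out: &mut Vec<Rect>) {
        let i = match self.intersection(other) {
            Some(i) => i,
            None => {
                out.push(*self);
                return;
            }
        };
        // Full-width bands above and below the intersection.
        if i.y > self.y {
            out.push(Rect { x: self.x, y: self.y, w: self.w, h: i.y - self.y });
        }
        if i.bottom() < self.bottom() {
            out.push(Rect { x: self.x, y: i.bottom(), w: self.w, h: self.bottom() - i.bottom() });
        }
        // Left and right pieces, limited to the intersection's rows.
        if i.x > self.x {
            out.push(Rect { x: self.x, y: i.y, w: i.x - self.x, h: i.h });
        }
        if i.right() < self.right() {
            out.push(Rect { x: i.right(), y: i.y, w: self.right() - i.right(), h: i.h });
        }
    }
}

/// Subtracts every rectangle in `others` from `rect`.
pub fn subtract_many(rect: Rect, others: &[Rect]) -> Vec<Rect> {
    let mut current = vec![rect];
    let mut next = Vec::new();
    for other in others {
        next.clear();
        for piece in &current {
            piece.subtract(other, &mut next);
        }
        std::mem::swap(&mut current, &mut next);
        if current.is_empty() {
            break; // fully covered, nothing left to subtract from
        }
    }
    current
}
```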

renderer: allow to lazy bind buffer

af7f9b6

This introduces a new function `render_output_with` which can lazily bind the provided buffer. If nothing will be rendered, binding is skipped completely.
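The lazy-binding idea can be sketched by passing the buffer as a closure that is only called once we know there is something to draw. Only the name `render_output_with` comes from this PR; `CountingRenderer`, the damage representation, and the signature are hypothetical and do not mirror Smithay's API.

```rust
/// Hypothetical renderer that counts how often it binds a target buffer.
pub struct CountingRenderer {
    pub binds: u32,
}

impl CountingRenderer {
    fn bind(&mut self, _buffer: &str) {
        // Stands in for an expensive bind (e.g. setting up a framebuffer).
        self.binds += 1;
    }
}

/// Renders `damage` into a buffer that is only obtained and bound on demand.
/// Returns whether anything was rendered.
pub fn render_output_with<F>(
    renderer: &mut CountingRenderer,
    damage: &[(i32, i32, i32, i32)],
    get_buffer: F,
) -> bool
where
    F: FnOnce() -> String,
{
    if damage.is_empty() {
        // Nothing to draw: skip acquiring and binding the buffer entirely.
        return false;
    }
    let buffer = get_buffer();
    renderer.bind(&buffer);
    // ... draw the elements covering `damage` here ...
    true
}
```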

drm: skip direct scan-out on cursor plane

b9f0154

anvil: don't include drm path

41d4372

Looking up the path is quite expensive and might show misleading results in a trace.

desktop: include refresh in profiling

8f0ce4c

anvil: exclude mem profiling by default

e9bf55e

drm: get rid of cloning the fb cache

18013d8

renderer/drm: reduce instance allocation

47ff97f

In most cases we only receive a single instance per element, so we can reduce allocations by using smallvec for storing the instances.
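The commit uses the `smallvec` crate; to stay dependency-free, this sketch swaps in a plain enum that shows the same idea: keep the common single-instance case inline and only allocate a `Vec` once a second instance arrives. `Instances` is a hypothetical name, not a Smithay type.

```rust
/// Hypothetical per-element instance storage with an inline fast path.
#[derive(Debug, Clone, PartialEq)]
pub enum Instances<T> {
    Empty,
    One(T),       // stored inline, no heap allocation
    Many(Vec<T>), // heap allocation only for two or more instances
}

impl<T> Instances<T> {
    pub fn push(&mut self, item: T) {
        *self = match std::mem::replace(self, Instances::Empty) {
            Instances::Empty => Instances::One(item),
            Instances::One(first) => Instances::Many(vec![first, item]),
            Instances::Many(mut v) => {
                v.push(item);
                Instances::Many(v)
            }
        };
    }

    pub fn as_slice(&self) -> &[T] {
        match self {
            Instances::Empty => &[],
            Instances::One(item) => std::slice::from_ref(item),
            Instances::Many(v) => v.as_slice(),
        }
    }

    pub fn is_heap_allocated(&self) -> bool {
        matches!(self, Instances::Many(_))
    }
}
```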

drm: pre-allocate internal maps...

ce8611d

...where the element count can be predicted
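When the element count is known up front, `HashMap::with_capacity` allocates once instead of growing and rehashing while the map is filled. The function name and the per-plane `bool` state below are hypothetical, chosen only to illustrate the pattern.

```rust
use std::collections::HashMap;

/// Hypothetical per-plane lookup table. The number of planes is known
/// before filling, so the map is sized once up front.
pub fn plane_states(plane_ids: &[u32]) -> HashMap<u32, bool> {
    let mut states = HashMap::with_capacity(plane_ids.len());
    for &id in plane_ids {
        // `false` = plane not yet claimed by an element (illustrative state).
        states.insert(id, false);
    }
    states
}
```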

drm: reduce duplicated geometry queries

cebab8d

cmeissl force-pushed the perf/various_optimizations branch from e8c8c79 to cebab8d, September 17, 2023 10:40

cmeissl marked this pull request as ready for review September 19, 2023 06:56

Drakulix merged commit 9856e93 into Smithay:master Sep 19, 2023
36 checks passed
