Perf/various optimizations #1128
I wish this was split into more commits, because I feel like we are getting into muddy waters here.
A couple of things are obviously a good idea.
That includes making the damage snapshot copy-on-write (already neatly separated) or adding `has_dmabuf_format` (not so separated).
Others need more testing (preferably in different compositors with different workloads) to convince me. E.g. cosmic-comp probably has a lot more small RenderElements for various UI elements than anvil does.
- Using `Vec`s in place of `Map`s. I can see how some of these only hold a very small number of elements, which might make this more efficient.
- Using FxHash instead of std's `HashMap`s. I haven't found any good characterization of when fxhash might be faster and why this applies here.
I would also like to point out that this depends on the Rust version used, as the stdlib collections are still frequently optimized. So there might be reason to use these optimizations if you are stuck with older versions, and less reason with newer ones. Profiling runs should thus also be tagged with the Rust version used to build.
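To make the `Vec`-instead-of-`Map` point concrete, here is a minimal sketch of a map backed by a `Vec` of pairs. `VecMap` is a hypothetical name for illustration and is not taken from the PR. Lookups are a linear scan, which can beat hashing for a handful of entries, but that is exactly the kind of claim that needs profiling against the real workload.

```rust
/// Hypothetical map backed by a `Vec` of `(key, value)` pairs.
/// Lookups scan linearly, so this only makes sense for very small collections.
#[derive(Debug)]
pub struct VecMap<K, V> {
    entries: Vec<(K, V)>,
}

impl<K: PartialEq, V> VecMap<K, V> {
    pub fn new() -> Self {
        Self { entries: Vec::new() }
    }

    /// Inserts a value and returns the previous value for `key`, if any.
    pub fn insert(&mut self, key: K, value: V) -> Option<V> {
        for (k, v) in self.entries.iter_mut() {
            if *k == key {
                return Some(std::mem::replace(v, value));
            }
        }
        self.entries.push((key, value));
        None
    }

    /// Returns a reference to the value stored for `key`, if any.
    pub fn get(&self, key: &K) -> Option<&V> {
        self.entries.iter().find(|(k, _)| k == key).map(|(_, v)| v)
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

No hashing and no bucket array are involved, so the trade-off is O(n) lookups in exchange for a single contiguous allocation.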
Sure, as always I will split it into multiple commits. I want to be able to profile each change independently. Candidates for merge are:
Possible micro-optimizations:
Lower priority:
Force-pushed: fe91013 to d8f418a
Codecov Report
Patch coverage:

Coverage Diff

| | master | #1128 | +/- |
|---|---|---|---|
| Coverage | 22.88% | 22.98% | +0.09% |
| Files | 143 | 143 | |
| Lines | 23050 | 23002 | -48 |
| Hits | 5275 | 5286 | +11 |
| Misses | 17775 | 17716 | -59 |
Okay, so everything should be nicely split up in smallish commits now.
Every single optimization seems very sensible to me and should be an obvious improvement.
Let's wait for a few test runs, but I am pretty certain we can merge this as is.
this implements clone-on-write for the DamageBag/DamageSnapshot, which makes getting a snapshot of the bag a cheap clone.
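A minimal sketch of the clone-on-write idea, assuming the storage is shared through an `Arc`. The `Bag`, `Snapshot` and `Rect` types below are simplified stand-ins and do not reproduce the real DamageBag/DamageSnapshot.

```rust
use std::sync::Arc;

/// Hypothetical rectangle type standing in for a damage region.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Rect {
    pub x: i32,
    pub y: i32,
    pub w: i32,
    pub h: i32,
}

/// Immutable view of the damage at one point in time.
#[derive(Debug, Clone)]
pub struct Snapshot {
    damage: Arc<Vec<Rect>>,
}

impl Snapshot {
    pub fn damage(&self) -> &[Rect] {
        &self.damage
    }
}

/// Mutable damage collection that shares its storage with its snapshots.
#[derive(Debug)]
pub struct Bag {
    damage: Arc<Vec<Rect>>,
}

impl Bag {
    pub fn new() -> Self {
        Self { damage: Arc::new(Vec::new()) }
    }

    /// Adds damage. `Arc::make_mut` clones the vector only if a snapshot
    /// still shares it; otherwise it mutates in place.
    pub fn add(&mut self, rect: Rect) {
        Arc::make_mut(&mut self.damage).push(rect);
    }

    /// Taking a snapshot only bumps a reference count.
    pub fn snapshot(&self) -> Snapshot {
        Snapshot { damage: Arc::clone(&self.damage) }
    }

    /// True while the bag and the snapshot point at the same allocation.
    pub fn shares_storage_with(&self, snapshot: &Snapshot) -> bool {
        Arc::ptr_eq(&self.damage, &snapshot.damage)
    }
}
```

The cost of the copy moves from every snapshot to the first write after a snapshot was taken, which pays off when snapshots are far more frequent than writes.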
multi-gpu has to test whether a specific format is supported; using `dmabuf_formats` for this is rather slow, so introduce a fast path for checking if a format is supported
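The fast path can be pictured as a trait method with a slow default that implementors override. This is only a sketch: `FormatSource`, `Format` and `CachedRenderer` are hypothetical names, and the slow path here is assumed to build a fresh collection on every call.

```rust
use std::collections::HashSet;

/// Hypothetical format description (fourcc code plus modifier).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Format {
    pub code: u32,
    pub modifier: u64,
}

/// Hypothetical trait, not the real renderer trait.
pub trait FormatSource {
    /// Slow path: returns a freshly built list of all supported formats.
    fn formats(&self) -> Vec<Format>;

    /// Default falls back to the slow path; implementors can override it.
    fn has_format(&self, format: Format) -> bool {
        self.formats().contains(&format)
    }
}

/// Renderer stand-in that keeps its formats in a set.
pub struct CachedRenderer {
    formats: HashSet<Format>,
}

impl CachedRenderer {
    pub fn new(formats: impl IntoIterator<Item = Format>) -> Self {
        Self { formats: formats.into_iter().collect() }
    }
}

impl FormatSource for CachedRenderer {
    fn formats(&self) -> Vec<Format> {
        self.formats.iter().copied().collect()
    }

    /// Fast path: a single set lookup, no intermediate allocation.
    fn has_format(&self, format: Format) -> bool {
        self.formats.contains(&format)
    }
}
```

Keeping a default implementation means existing implementors keep working and only the hot ones need the override.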
this introduces a new function `render_output_with`, which can lazily bind the provided buffer. in case nothing gets rendered, binding is skipped completely.
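The lazy binding can be sketched with a closure that is only called once there is something to draw. `render_lazily`, `Target` and `Rect` are hypothetical names for illustration; the signature of `render_output_with` is not shown in this conversation and is not reproduced here.

```rust
/// Hypothetical damage rectangle: (x, y, width, height).
pub type Rect = (i32, i32, i32, i32);

/// Hypothetical render target; binding it is assumed to be the expensive step.
pub struct Target {
    pub id: u32,
}

/// Placeholder for the actual drawing work.
fn draw(_target: &Target, _rect: &Rect) {}

/// Draws `damage` into the target produced by `bind` and returns the number
/// of rectangles drawn. `bind` is not called when there is no damage.
pub fn render_lazily<F>(damage: &[Rect], bind: F) -> usize
where
    F: FnOnce() -> Target,
{
    if damage.is_empty() {
        // Nothing to render, so the bind is skipped entirely.
        return 0;
    }
    let target = bind();
    let mut drawn = 0;
    for rect in damage {
        draw(&target, rect);
        drawn += 1;
    }
    drawn
}
```

Taking `FnOnce` lets the caller move the buffer into the closure, so no extra state has to outlive the call.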
looking up the path is quite expensive and might show wrong results in a trace
in most cases we will only receive a single instance per element, so we can reduce allocations by using smallvec for storing the instances
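The commit uses the smallvec crate. To keep the example free of dependencies, the sketch below swaps in a hand-rolled enum that shows the same idea: a single instance is stored inline and the heap is only touched once a second one arrives. `Instances` is a hypothetical type, not something from the PR.

```rust
/// Std-only stand-in for a small vector with an inline capacity of one.
#[derive(Debug)]
pub enum Instances<T> {
    Empty,
    /// The common case: one instance, stored inline without a heap allocation.
    One(T),
    /// Two or more instances spill to a heap-allocated `Vec`.
    Many(Vec<T>),
}

impl<T> Instances<T> {
    pub fn new() -> Self {
        Instances::Empty
    }

    pub fn push(&mut self, item: T) {
        // Temporarily take ownership of the current state to rebuild it.
        match std::mem::replace(self, Instances::Empty) {
            Instances::Empty => *self = Instances::One(item),
            Instances::One(first) => *self = Instances::Many(vec![first, item]),
            Instances::Many(mut items) => {
                items.push(item);
                *self = Instances::Many(items);
            }
        }
    }

    pub fn as_slice(&self) -> &[T] {
        match self {
            Instances::Empty => &[],
            Instances::One(item) => std::slice::from_ref(item),
            Instances::Many(items) => items.as_slice(),
        }
    }

    /// True once the contents have moved to the heap.
    pub fn spilled(&self) -> bool {
        matches!(self, Instances::Many(_))
    }
}
```

A real small-vector type generalizes this to any inline capacity and offers the full `Vec`-like API.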
...where the element count can be predicted
Force-pushed: e8c8c79 to cebab8d
WIP: Some experiments for getting the render time further down.
Based on #1122
Quick compare with 2 outputs, one empty the other with glmark: