Faster alpha blending #123
…th better tolerances than libwebp blending!
…tests with better tolerances than libwebp blending!" This reverts commit 68ea852.
Ah, the tests fail on big endian. Well, that's gonna be fun to debug. Thanks, C.
I've fixed the endianness bug (good thing we have big-endian CI!), so this is now ready for review.
Senpai has noticed my PR! ❤️
I just realized that this PR updated some of the reference images without editing CREDITS.md which documents how they were generated. Specifically, they were previously produced by
Oh, yes, that is correct. I did update the animated image sample because it differed very slightly.
Switch to a direct translation of libwebp alpha blending, plus a tweak suggested by @awxkee in #119 to improve the precision of this approximation.
It's not an exact match of the f64 method, but it is more accurate than libwebp itself, and without losing performance compared to the libwebp method (at least on my machine). The inaccuracies in both approximations show up only on very low alpha values, which makes them imperceptible.
This PR includes a test verifying this approximation against the floating-point reference.
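For readers unfamiliar with this kind of check, here is a minimal sketch of comparing an integer blend against an `f64` reference. It is not the code from this PR or from libwebp: `div255_round`, `blend_int`, `blend_f64` and `max_channel_error` are hypothetical names, and the integer path is a plain rounding approximation chosen for illustration. It assumes non-premultiplied "over" compositing of a single color channel.

```rust
/// Hypothetical helper: divide by 255, rounding to nearest.
fn div255_round(x: u32) -> u32 {
    (x + 127) / 255
}

/// Integer "over" blend of one non-premultiplied channel (illustrative only).
/// Returns (blended channel, blended alpha).
fn blend_int(src_c: u8, src_a: u8, dst_c: u8, dst_a: u8) -> (u8, u8) {
    let (sc, sa, dc, da) = (src_c as u32, src_a as u32, dst_c as u32, dst_a as u32);
    // Weight of the destination once the source covers part of it.
    let dst_weight = div255_round(da * (255 - sa));
    let out_a = sa + dst_weight;
    if out_a == 0 {
        return (0, 0); // fully transparent result
    }
    // Weighted average of the two channels, rounded to nearest.
    let out_c = (sc * sa + dc * dst_weight + out_a / 2) / out_a;
    (out_c as u8, out_a as u8)
}

/// Floating-point reference for the same operation.
fn blend_f64(src_c: u8, src_a: u8, dst_c: u8, dst_a: u8) -> (u8, u8) {
    let sa = src_a as f64 / 255.0;
    let da = dst_a as f64 / 255.0;
    let out_a = sa + da * (1.0 - sa);
    if out_a == 0.0 {
        return (0, 0);
    }
    let out_c = (src_c as f64 * sa + dst_c as f64 * da * (1.0 - sa)) / out_a;
    (out_c.round() as u8, (out_a * 255.0).round() as u8)
}

/// Sweep every alpha pair over a few channel values and report the
/// largest channel difference between the two implementations.
fn max_channel_error() -> u8 {
    let mut worst = 0u8;
    for src_a in 0..=255u8 {
        for dst_a in 0..=255u8 {
            for &(s, d) in &[(0u8, 255u8), (255, 0), (10, 200), (128, 127)] {
                let (a, _) = blend_int(s, src_a, d, dst_a);
                let (b, _) = blend_f64(s, src_a, d, dst_a);
                worst = worst.max(a.abs_diff(b));
            }
        }
    }
    worst
}
```

A test can then assert that `max_channel_error()` stays under a chosen tolerance. In a sketch like this the largest deviations come from the rounding of the destination weight, which matters most when the combined alpha is small.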
libwebp is BSD licensed, which is not a viral license. Judging by the comments in the repo, this is also not the first time code has been adapted from libwebp.
There's a case to be made for refactoring away the C-isms and also experimenting with `u16` instead of `u32` for intermediate values so that the outer loop could blend two pixels at once, but that would require much more work, and this is already a dramatic performance improvement on the sample from #119. Combined with #122 this reduces the end-to-end decoding time by 27%.