Optimization: color_blend() variable range now determined by overloading #4245

DedeHai · 2024-11-02T10:41:49Z

Removing the bool saves on code size and makes the function a tiny bit faster. Also this is a cleaner solution IMHO.

Tested the affected FX, did not notice any change.

changes got lost along the way to PR...

wled00/fcn_declare.h

- previous version did not return C2 when blend=255
- inlining 0 and max check may be faster in some cases and uses just 40 extra bytes
- 8bit and 16bit versions both call the 16bit base function
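As a rough illustration of the overloading idea (not the actual WLED code): the type of the blend argument alone can select the range, with both overloads forwarding to one 16-bit base and handling the 0 and max cases inline. The names mixChannel16 and blendSketch, the single-channel signature, and the base math are assumptions made for this sketch.

```cpp
#include <cstdint>

// Hypothetical 16-bit base: linear mix of one channel, blend in 0..65535.
// On its own it never quite reaches b (blend = 0xFFFF gives b - 1 for b > 0).
constexpr uint8_t mixChannel16(uint8_t a, uint8_t b, uint16_t blend) {
  return static_cast<uint8_t>(
      (uint32_t(a) * (65535u - blend) + uint32_t(b) * blend) >> 16);
}

// 16-bit overload: the 0 and max checks are done inline before calling the base.
constexpr uint8_t blendSketch(uint8_t a, uint8_t b, uint16_t blend) {
  return blend == 0 ? a : (blend == 0xFFFF ? b : mixChannel16(a, b, blend));
}

// 8-bit overload: has its own max value (0xFF) and reuses the 16-bit base.
constexpr uint8_t blendSketch(uint8_t a, uint8_t b, uint8_t blend) {
  return blend == 0 ? a
       : (blend == 0xFF ? b
                        : mixChannel16(a, b, static_cast<uint16_t>(blend << 8)));
}
```

With these two overloads a plain int literal as the third argument is ambiguous, so callers have to pass a value that is already uint8_t or uint16_t.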

softhack007 · 2024-11-15T14:43:49Z

@DedeHai I've played a bit with the color_blend function, and I found some improvements

converting 8bit blend to 16bit:

(uint16_t)b << 8: with the shift only, the maximum "blend16" we can reach is 0xFF00, so the early return if (blend == blendmax) return color2; is never taken.

To use the full range, we could use ((uint16_t)b << 8) | b.
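A minimal sketch comparing the two conversions discussed above. The helper names expandShiftOnly and expandFullRange are made up for this illustration and do not appear in WLED.

```cpp
#include <cstdint>

// Shift only: 0xFF maps to 0xFF00, so 0xFFFF can never be produced.
constexpr uint16_t expandShiftOnly(uint8_t b) {
  return static_cast<uint16_t>(b << 8);
}

// Shift and replicate the byte: 0x00 -> 0x0000 and 0xFF -> 0xFFFF.
constexpr uint16_t expandFullRange(uint8_t b) {
  return static_cast<uint16_t>((b << 8) | b);
}
```

Only the second variant lets a comparison against 0xFFFF succeed for an 8-bit input of 255.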

improved blend accuracy

The blend logic seems to be based on some very old FastLED code. FastLED changed its blend8() function about 3 years ago to avoid discontinuities and rounding errors. The new math is explained here and seems to be numerically more stable:

https://github.com/FastLED/FastLED/blob/9176e0ff9a67365c7bd4393da52be76b6ce4a5b3/src/lib8tion/math8.h#L565

For the 8bit case, we can use the same logic - I've tested it and indeed it's more accurate than the code in 0_15.

   uint8_t r3 = (((r1 << 8) | r2) + (r2 * blend) - (r1*blend) ) >> 8;

For the 16bit blend it might be trickier, because the | r2 term (which solves the rounding problems) would have to become either | r2<<8 or | r2<<8 | r2.
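The 8-bit formula above can be wrapped in a small per-channel helper to check its end points. This is only a sketch; the function name blendChannel8 is an assumption, and the expression is the one quoted in the comment.

```cpp
#include <cstdint>

// Per-channel blend using the expression from the comment above.
// Algebraically this is (c1 * (256 - blend) + c2 * (1 + blend)) >> 8,
// which never exceeds 255 * 257 = 65535 and so fits in 16 bits.
constexpr uint8_t blendChannel8(uint8_t c1, uint8_t c2, uint8_t blend) {
  return static_cast<uint8_t>(
      (((c1 << 8) | c2) + (c2 * blend) - (c1 * blend)) >> 8);
}
```

Both end points are exact without extra checks: blend = 0 yields c1 and blend = 255 yields c2.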

DedeHai · 2024-11-15T17:19:28Z

with the shift only, the max "blend16" we can achieve is 0xFF00

this was fixed in the latest commit by adding a check in the inline function.

I need to take a closer look at the optimization; maybe this new version can be done the way I did with color_add() in #4138 (calculating two colors in one go).

- efficient color blend calculation in few operations possible
- omitting min / max checks makes it faster on average
- using 8bit for the "blend" variable does not significantly influence the resulting color; transition points are slightly shifted but yield very good results (better than the original 16bit version using the old FastLED math with improper rounding)
- updated drawCircle and drawLine to use 8bit directly instead of 16bit with a shift
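One possible way to combine the 8-bit blend with the "two colors in one go" idea is sketched below. It is an assumption about how such a routine could look, not the code from this PR: the names blendLanes and blendPacked and the packed 32-bit layout with four 8-bit channels are chosen for illustration only.

```cpp
#include <cstdint>

// Blend two channels at once. Each 8-bit channel sits in its own 16-bit lane
// (mask 0x00FF00FF), and a lane result is at most 255 * 257 = 65535, so it
// never carries into the neighbouring lane.
constexpr uint32_t blendLanes(uint32_t lanes1, uint32_t lanes2, uint8_t blend) {
  return lanes1 * (256u - blend) + lanes2 * (1u + blend);
}

// Blend all four channels of a packed 32-bit color with two multiplies per
// color, and without any min / max checks.
constexpr uint32_t blendPacked(uint32_t c1, uint32_t c2, uint8_t blend) {
  return ((blendLanes(c1 & 0x00FF00FFu, c2 & 0x00FF00FFu, blend) >> 8) & 0x00FF00FFu)
       | (blendLanes((c1 >> 8) & 0x00FF00FFu, (c2 >> 8) & 0x00FF00FFu, blend) & 0xFF00FF00u);
}
```

The low pair is shifted back down after blending, while the high pair is simply masked, because its results already sit in the upper byte of each lane.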

DedeHai · 2024-11-21T17:54:33Z

I made some further optimizations based on the new FastLED version you proposed. In my tests on the ripple FX, your suggested variant was a tiny bit faster; my updated version uses less code, so I am guessing the compiler did some inlining to speed up drawCircle() in some way.
The new variant, which ditches the 16bit blend and goes to 8bit, gives almost identical results; only the rounding is slightly different. When plotting the output of the functions, the 8bit variant is actually not any worse, at least in the cases I have taken a closer look at. When checking the ripple FX, which is the only one using the blend function apart from transitions, I could not see any difference, but the new code is slightly faster.
@softhack007 Can you please check if this works as expected in your test cases?

Optimized color_blend: blend variable range now determined by ovleroa…

95e0e4a

DedeHai added the "optimization" label (re-working an existing feature to be faster, or use less memory) Nov 2, 2024

bugfix

ff30ff7

DedeHai added this to the 0.15.1 candidate milestone Nov 2, 2024

DedeHai mentioned this pull request Nov 2, 2024

Added integer based sin()/cos() functions, changed all trig functions to wled_math #4181

Open

DedeHai changed the title ~~Optimization: color_blend() variable range now determined by ovleroding~~ Optimization: color_blend() variable range now determined by overloading Nov 2, 2024

softhack007 reviewed Nov 9, 2024

distinguish between 16 and 8bit, also fixed max value for 8bit

633d0c6

Merge remote-tracking branch 'upstream/0_15' into color_blend_optimiz…

876697b

…ation
