re: our recent short discussion on Discord https://discord.com/channels/143867839282020352/960223751599976479/1355633184347390195
The conversion transform uses 256-entry lookup tables for inverse gamma and 1024 entries for gamma, accompanied by linear interpolation. This is fine when working with 8-bit input and output pixel formats, but I'm now using FP32 for all intermediate processing. That involves conversion from UI8 Companded to FP32 Companded (`c * (1.0f / 255.0f)`) with my own code, followed by lots of compositing and blending code, and then gamma conversion to linear using PhotoSauce. So it can already benefit from higher precision; I'm not just going straight from UI8 Companded to FP32 Linear.
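For context, here's a minimal sketch of that flow, assuming normalized [0, 1] FP32 values; the names are purely illustrative, and the conversion function is a stand-in for PhotoSauce's LUT-based transform rather than a call into its actual API:

```csharp
// Illustrative only: UI8 Companded -> FP32 Companded -> (blend) -> FP32 Linear.
// SrgbToLinear stands in for the LUT-based conversion discussed in this issue.
static float SrgbToLinear(float c) =>
    c <= 0.04045f ? c / 12.92f : MathF.Pow((c + 0.055f) / 1.055f, 2.4f);

static void Process(ReadOnlySpan<byte> src, Span<float> dstLinear)
{
    for (int i = 0; i < src.Length; i++)
    {
        float companded = src[i] * (1.0f / 255.0f);  // UI8 Companded -> FP32 Companded
        // ...lots of compositing/blending happens here, still in companded space...
        dstLinear[i] = SrgbToLinear(companded);      // FP32 Companded -> FP32 Linear
    }
}
```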
The reason for doing blending in companded space is that this is how things have always worked in Paint.NET; changing it would go against current user expectations and would change the way multi-layered images appear (a big no-no). Blending in linear space will come later, but it must be optional/configurable via some kind of opt-in or opt-out mechanism: my rendering code can already do this very easily, but the UI/UX and the .PDN file format are not yet ready for it.
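To make the companded-vs-linear distinction concrete, here is a hedged sketch (not Paint.NET's actual blend code; the helper names are just illustrative) of why a 50/50 blend of black and white comes out differently in the two spaces:

```csharp
// Illustrative only: the same 50/50 blend done in companded vs. linear space.
static float SrgbToLinear(float c) =>
    c <= 0.04045f ? c / 12.92f : MathF.Pow((c + 0.055f) / 1.055f, 2.4f);

static float LinearToSrgb(float l) =>
    l <= 0.0031308f ? l * 12.92f : 1.055f * MathF.Pow(l, 1.0f / 2.4f) - 0.055f;

float a = 0.0f, b = 1.0f;              // companded (sRGB-encoded) black and white

float blendCompanded = (a + b) * 0.5f;                                        // 0.5
float blendLinear = LinearToSrgb((SrgbToLinear(a) + SrgbToLinear(b)) * 0.5f); // ~0.735
```

The companded result is the mid-gray that existing documents are built around; the linear-space result is noticeably lighter, which is why silently switching the default isn't an option.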
I'm also slowly moving towards high bit-depth (HBD) formats (e.g. RGBA UI16) and HDR formats (RGBA FP16), where increased precision in gamma conversion will definitely be helpful. I'm already using RGBA FP16 scRGB (linear) when copying pixel data over to the GPU for presentation purposes via Direct2D and Advanced Color (which enables breaking out of the sRGB box you're otherwise limited to).
I did a few quick tests where I changed `LookupTables.GammaScale` from 1023 to 4095, and `.InverseGammaScale` from `byte.MaxValue` to 4095, and everything seemed to work fine. That gives 12 bits of precision and consumes ~16KB per lookup table (4096 float entries). There is probably a minor, maybe even measurable, performance difference here, but the tables should still easily fit in L2 cache (e.g. 256KB on any modern CPU, Skylake+), so I don't expect a big performance hit. Expanding the LUTs to 16 bits would be even better, but would likely have minimal practical benefit and not be worth the additional performance cost.
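For clarity, this is the mental model behind the experiment. The sketch below is a generic scaled LUT with linear interpolation (the class and method names are mine, not PhotoSauce's actual `LookupTables` internals); it just shows how the scale constant trades memory for interpolation error:

```csharp
using System;

// Illustrative only: a scaled companded->linear LUT with linear interpolation.
static class GammaLut
{
    // GammaScale = 1023 gives 1024 entries (~4KB); 4095 gives 4096 entries (~16KB)
    // and quarters the interpolation step size.
    const int GammaScale = 4095;

    static readonly float[] Table = BuildTable(GammaScale);

    static float[] BuildTable(int scale)
    {
        var table = new float[scale + 1];
        for (int i = 0; i <= scale; i++)
        {
            float c = i / (float)scale;
            table[i] = c <= 0.04045f ? c / 12.92f : MathF.Pow((c + 0.055f) / 1.055f, 2.4f);
        }
        return table;
    }

    public static float ToLinear(float c)
    {
        float f = c * GammaScale;                  // c is a companded value in [0, 1]
        int i = Math.Min((int)f, GammaScale - 1);
        float frac = f - i;
        return Table[i] * (1 - frac) + Table[i + 1] * frac; // lerp between adjacent entries
    }
}
```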
Is there a benchmark I can run to test the performance of this experiment? Do the unit tests cover this (or should I add some)? Are there any other sharp edges or concerns you can think of? I would love to land this for Paint.NET v5.2, which I'm currently working on and hope to ship later this year, maybe around the same time .NET 10 lands.