
Increase precision of gamma lookup tables from 8- or 10-bits, possibly to 12-bits #183

@rickbrew

Description


re: our recent short discussion on Discord https://discord.com/channels/143867839282020352/960223751599976479/1355633184347390195

The conversion transform uses a 256-entry lookup table for inverse gamma and a 1024-entry table for gamma, combined with linear interpolation. This is fine when working with 8-bit input and output pixel formats, but I'm now using FP32 for all intermediate processing. This involves conversion from UI8 Companded to FP32 Companded (c * 255.0f) with my own code, followed by lots of compositing and blending code, and then gamma conversion to linear using PhotoSauce. So it can already benefit from higher precision; I'm not just going straight from UI8 Companded to FP32 Linear.
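To make the precision question concrete, here's a small Python sketch (not PhotoSauce's implementation; `srgb_to_linear`, `build_lut`, and friends are hypothetical names) that builds a linearly-interpolated sRGB-decode LUT at different sizes and measures the worst-case error against the exact transfer function:

```python
def srgb_to_linear(c):
    """Exact sRGB decode for a companded value c in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def build_lut(scale):
    """LUT with scale+1 entries covering [0, 1] (mirrors a GammaScale-style layout)."""
    return [srgb_to_linear(i / scale) for i in range(scale + 1)]

def lookup_lerp(lut, scale, c):
    """Sample the LUT with linear interpolation between adjacent entries."""
    x = c * scale
    i = min(int(x), scale - 1)
    f = x - i
    return lut[i] * (1.0 - f) + lut[i + 1] * f

def max_error(scale, samples=10000):
    """Worst-case absolute error of the lerp'd LUT over a dense sweep of [0, 1]."""
    lut = build_lut(scale)
    return max(abs(lookup_lerp(lut, scale, i / samples) - srgb_to_linear(i / samples))
               for i in range(samples + 1))

# Lerp error shrinks roughly quadratically with table size,
# so a 4096-entry table is far more accurate than a 256-entry one.
print(max_error(255), max_error(4095))
```

The quadratic convergence is also why going beyond 12 bits buys little: each doubling of the table cuts the already-tiny interpolation error by roughly 4x.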

The reason for doing blending in companded space is that this is how things have always worked in Paint.NET, and changing it would go against current user expectations and would change the way multi-layered images appear (a big no-no). Doing blending in linear space will come later but must be optional/configurable via some kind of opt-in or opt-out mechanism. My rendering code can already do this very easily but the UI/UX and .PDN file format are not yet ready, in other words.
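For readers unfamiliar with why companded-space vs linear-space blending changes how images appear, here's a minimal sketch (assuming the standard sRGB transfer function; the helper names are mine, not Paint.NET's) showing that a 50/50 blend of a full-intensity channel with black gives a visibly different result in the two spaces:

```python
def srgb_decode(c):
    """sRGB companded -> linear."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def srgb_encode(l):
    """Linear -> sRGB companded."""
    return 12.92 * l if l <= 0.0031308 else 1.055 * l ** (1 / 2.4) - 0.055

# 50/50 blend of channel value 1.0 with 0.0:
companded = 0.5 * 1.0 + 0.5 * 0.0                          # blend companded values directly
linear = srgb_encode(0.5 * srgb_decode(1.0) + 0.5 * srgb_decode(0.0))  # blend in linear light

print(companded, linear)  # the linear-space result is noticeably brighter
```

Since every existing .PDN document was composed under companded-space blending, silently switching the math would change the rendered output of saved files, hence the opt-in requirement.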

I'm also slowly moving towards high bit-depth (HBD) formats (e.g. RGBA UI16) and HDR formats (RGBA FP16), where increased precision in gamma conversion will definitely be helpful. I'm already using RGBA FP16 scRGB (linear) when copying pixel data over to the GPU for presentation purposes via Direct2D and Advanced Color (which enables breaking out of the sRGB box you're otherwise limited to).

I did a few quick tests where I changed LookupTables.GammaScale from 1023 to 4095 and .InverseGammaScale from byte.MaxValue to 4095, and everything seemed to work fine. This gives 12 bits of precision and consumes ~16KB per lookup table. There is probably a minor, maybe even measurable, performance difference here, but the tables should still easily fit in L2 cache (e.g. 256KB on any modern CPU, Skylake+). I don't expect a big performance hit, in other words. Expanding the LUTs to 16 bits would be even better, but would likely have minimal practical benefit and not be worth the additional performance cost.
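The size arithmetic behind those numbers, as a quick sketch (`lut_bytes` is my name, not a PhotoSauce API; assuming FP32 table entries):

```python
def lut_bytes(scale, entry_bytes=4):
    """Size of a LUT whose index runs 0..scale, with FP32 entries by default."""
    return (scale + 1) * entry_bytes

print(lut_bytes(1023))   # current 10-bit gamma table: 4 KB
print(lut_bytes(4095))   # proposed 12-bit tables: 16 KB each
print(lut_bytes(65535))  # 16-bit tables: 256 KB, i.e. an entire Skylake-class L2 per table
```

At 16 KB each, a gamma and inverse-gamma table together occupy 32 KB, leaving plenty of L2 headroom; at 16 bits, a single table would already consume the whole 256 KB, which is where the cache-pressure cost stops being negligible.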

Is there a benchmark I can run to test the performance of this experiment? Do the unit tests care about this? (or should I add some?) Are there any other sharp edges or concerns you can think of? I would love to land this for my v5.2 release that I'm currently working on and hope to release later this year, maybe around the same time .NET 10 lands.
