@lgritz lgritz commented Nov 22, 2025

This test started failing sporadically recently, producing output images that looked plausible but differed slightly from the checked-in reference images for this test. It failed only about 5% of the time, so a CI run would sometimes pass, sometimes fail in just one job variation, and very occasionally fail in two. Rerunning the failed job would generally succeed, but would sometimes fail again.

I'll cut to the chase: we traced all of the failures to GitHub runners that identified themselves as "Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz". When a subsequent run of the same job variation succeeded, it was always because it had landed on a runner that identified itself as "AMD EPYC 7763 64-Core Processor".
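To make this kind of correlation easier to spot, each CI job could log the CPU model it landed on. This is a minimal sketch, assuming a Linux runner where `/proc/cpuinfo` is readable. The helper names `cpu_model_from_cpuinfo` and `runner_cpu_model` are hypothetical and are not part of this PR.

```python
import platform
from pathlib import Path


def cpu_model_from_cpuinfo(text):
    """Return the first 'model name' value from /proc/cpuinfo-style text, or None."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            return value.strip()
    return None


def runner_cpu_model(cpuinfo_path="/proc/cpuinfo"):
    """Best-effort CPU model string for the machine running the job."""
    path = Path(cpuinfo_path)
    if path.is_file():
        model = cpu_model_from_cpuinfo(path.read_text(errors="replace"))
        if model:
            return model
    # Fallback for hosts without /proc/cpuinfo; may be less specific.
    return platform.processor() or "unknown"


if __name__ == "__main__":
    # Printing this at the start of a test job makes it possible to
    # correlate failures with the hardware the runner happened to use.
    print("Runner CPU:", runner_cpu_model())
```

With that line in every job log, failing and passing runs can be grouped by CPU model rather than compared by hand.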

We presume that some minor HW or SW difference (perhaps a library with an ISA-specific code path) results in LSB-level errors that conspire to sample different light paths on a minority of pixels, leading to a different pattern of specular "fireflies".
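As a purely illustrative sketch of how an LSB-level difference can change which path gets sampled (this is not the actual cause, which we have not pinned down): floating-point addition is not associative, so two code paths that sum the same terms in a different order can disagree in the last bit, and a comparison against a random number can then go the other way. The function names below are hypothetical.

```python
def summed_left_to_right(a, b, c):
    """One evaluation order, e.g. a scalar code path."""
    return (a + b) + c


def summed_right_to_left(a, b, c):
    """Another evaluation order, e.g. a vectorized code path."""
    return a + (b + c)


def takes_specular_branch(u, probability):
    """Stochastic branch selection: take the branch when u < probability."""
    return u < probability


# The same three terms, added in two different orders.
p1 = summed_left_to_right(0.1, 0.2, 0.3)
p2 = summed_right_to_left(0.1, 0.2, 0.3)

# A random sample that happens to fall between the two results.
u = 0.6

print("probabilities equal:", p1 == p2)
print("branch taken:", takes_specular_branch(u, p1), takes_specular_branch(u, p2))
```

Once a single such decision differs, the rest of that light path diverges completely, which is consistent with a few pixels gaining or losing a firefly while the rest of the image matches.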

The expedient fix herein is simply to commit yet another reference image that is considered to be a "pass".
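The idea of accepting any one of several reference images can be sketched as follows. This is a simplified illustration, not the test script itself: images are flat lists of float pixel values, and the names `fraction_over_threshold`, `matches_any_reference`, `fail_thresh` and `fail_fraction`, along with their default values, are assumptions made for the example.

```python
def fraction_over_threshold(output, reference, fail_thresh):
    """Fraction of pixel values whose absolute difference exceeds fail_thresh."""
    if len(output) != len(reference):
        raise ValueError("images must have the same number of values")
    if not output:
        return 0.0
    bad = sum(1 for o, r in zip(output, reference) if abs(o - r) > fail_thresh)
    return bad / len(output)


def matches_any_reference(output, references, fail_thresh=0.004, fail_fraction=0.0):
    """Pass if the output is close enough to at least one reference image."""
    return any(
        fraction_over_threshold(output, ref, fail_thresh) <= fail_fraction
        for ref in references
    )


if __name__ == "__main__":
    ref_a = [0.0] * 100          # original reference
    ref_b = list(ref_a)
    ref_b[5] = 1.0               # same image with one extra "firefly"
    output = list(ref_b)         # what the differing runner produced

    print(matches_any_reference(output, [ref_a]))          # only the old reference
    print(matches_any_reference(output, [ref_a, ref_b]))   # with the added reference
```

Adding a reference keeps the thresholds tight for each accepted variant, rather than loosening them enough to hide a genuine regression.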

Along for the ride, discovered during my debugging:

  • Rename the label of one job variant, which was incorrect.
  • Add comments to the test thresholding parameters (comments ported from the analogous script in OIIO, making it easier to understand next time one of us is figuring out this code).

Signed-off-by: Larry Gritz <[email protected]>
@lgritz lgritz requested a review from fpsunflower November 22, 2025 00:34
@lgritz lgritz added the build / testing / port / CI Affecting the build system, tests, platform support, porting, or continuous integration. label Nov 22, 2025
@lgritz lgritz merged commit 190c3f0 into AcademySoftwareFoundation:main Nov 24, 2025
27 checks passed