OpenCL race condition with TailStrategy::ShiftInwards? #5430

Bastacyclop · 2020-11-04T08:07:13Z

When generating tiled OpenCL code, Halide emits a kernel that I believe contains a race condition: multiple concurrent threads may write to the same output element. I do not see how this is legal under the OpenCL specification, and I believe atomic stores would be needed for correctness. Or am I mistaken? The program does consistently produce the correct output on my machine.

Small example:

output(x) = input(x) + 2 * input(x+1) + input(x+2);
output.gpu_tile(x, xi, 32); // defaults to TailStrategy::ShiftInwards

Intermediate statements extract:

  let t15 = (output.extent.0 + 31)/32
  let t16 = output.extent.0/32
  let t18 = (output.extent.0 + output.min.0) - input.min.0
  let t17 = output.min.0 - input.min.0
  gpu_block<OpenCL> (output.s0.x.x.__block_id_x, 0, t15) {
   gpu_thread<OpenCL> (.__thread_id_x, 0, 32) {
    if (output.s0.x.x.__block_id_x < t16) {
     let t13 = ((output.s0.x.x.__block_id_x*32) + t17) + .__thread_id_x
     output[(output.s0.x.x.__block_id_x*32) + .__thread_id_x] = input[t13 + 2] + (input[t13] + (input[t13 + 1]*2.000000f))
    } else {
     let t14 = .__thread_id_x + t18
     output[(.__thread_id_x + output.extent.0) + -32] = input[t14 + -30] + (input[t14 + -32] + (input[t14 + -31]*2.000000f))
    }
   }
  }

OpenCL kernel extract:

__kernel void kernel_output_s0_x_x___block_id_x(
 __address_space__input const float *restrict _input,
 __address_space__output float *restrict _output,
 // [...]
 __local int16* __shared)
{
 int _output_s0_x_x___block_id_x = get_group_id(0);
 int ___thread_id_x = get_local_id(0);
 bool _0 = _output_s0_x_x___block_id_x < _t16;
 if (_0)
 {
  int _1 = _output_s0_x_x___block_id_x * 32;
  int _13 = _1 + ___thread_id_x;
  // [...]
  _output[_13] = _12;
 } // if _0
 else
 {
  // [...]
  int _25 = ___thread_id_x + _output_extent_0;
  int _26 = _25 + -32;
  _output[_26] = _24;
 } // if _0 else
} // kernel kernel_output_s0_x_x___block_id_x
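
To make the overlap concrete, the index arithmetic from the two extracts above can be replayed on the host. This is only a minimal sketch in plain C++, not Halide-generated code: the helper names `shift_inwards_write_counts` and `count_overlapping` are hypothetical, and the loop simply mirrors the `t15`/`t16` branch structure shown in the intermediate statements.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of Halide): counts how many threads store to
// each output index under the ShiftInwards index mapping from the extract.
std::vector<int> shift_inwards_write_counts(int extent, int tile) {
    assert(tile > 0 && extent >= tile);  // the shifted tail needs one full tile
    std::vector<int> writes(extent, 0);
    const int num_blocks = (extent + tile - 1) / tile;  // t15 in the extract
    const int full_blocks = extent / tile;              // t16 in the extract
    for (int block = 0; block < num_blocks; ++block) {
        for (int thread = 0; thread < tile; ++thread) {
            const int idx = (block < full_blocks)
                                ? block * tile + thread    // full tile
                                : thread + extent - tile;  // tail, shifted inwards
            ++writes[idx];
        }
    }
    return writes;
}

// Number of output elements that are stored by more than one thread.
int count_overlapping(const std::vector<int>& writes) {
    int n = 0;
    for (int w : writes) {
        if (w > 1) ++n;
    }
    return n;
}
```

Assuming an extent of 70 (an arbitrary value chosen for illustration) and the tile size of 32 from the example, `count_overlapping(shift_inwards_write_counts(70, 32))` yields 26: the tail block covers indices 38 through 69, so indices 38 through 63 are also written by the second full block. When the extent is a multiple of 32, no index is written twice.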

Answered by abadams

abadams (Maintainer) · 2020-11-04T16:51:53Z

Yes, Halide can generate race conditions of this specific type: two threads race to store the same value to the same memory location, and then there's a full memory barrier before any thread tries to read that location.

It's the most benign possible race condition I can think of, but it's still technically UB in many contexts. If it's a problem, I would say just use TailStrategy::GuardWithIf instead. We have seen it cause non-determinism in the past: when the tail case gets compiled as a separate piece of code, floating-point optimizations can shake out differently, and then the race is between different values, both presumably "correct" ones according to -ffast-math wild-west rules.
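
For reference, the tail strategy can be passed explicitly in the schedule, e.g. `output.gpu_tile(x, xi, 32, TailStrategy::GuardWithIf);`. The sketch below is a plain C++ host-side model of the resulting index mapping, not Halide output: the helper name `guard_with_if_write_counts` is hypothetical, and it assumes the guard is a simple bounds check on the unshifted index, which is how GuardWithIf is documented to behave.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of Halide): counts stores per output index
// when every thread keeps its unshifted index and out-of-range threads are
// masked off by an if, as TailStrategy::GuardWithIf is documented to do.
std::vector<int> guard_with_if_write_counts(int extent, int tile) {
    assert(tile > 0 && extent >= 0);
    std::vector<int> writes(extent, 0);
    const int num_blocks = (extent + tile - 1) / tile;  // round up
    for (int block = 0; block < num_blocks; ++block) {
        for (int thread = 0; thread < tile; ++thread) {
            const int idx = block * tile + thread;
            if (idx < extent) {  // the guard: tail threads past the end do nothing
                ++writes[idx];
            }
        }
    }
    return writes;
}
```

Under this model every output element is stored exactly once for any extent, so the write-write race disappears. The cost is a branch in every thread and idle threads in the tail block.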
