Commit 66fca06

[OpenMP] Fix num_iters in __kmpc_*_loop DeviceRTL functions (#133435)
This patch removes the addition of 1 to the number of iterations when calling the following DeviceRTL functions:

- `__kmpc_distribute_for_static_loop*`
- `__kmpc_distribute_static_loop*`
- `__kmpc_for_static_loop*`

Calls to these functions are currently produced only by the OMPIRBuilder from flang, which already passes the correct number of iterations. Because the runtime added 1 to the received `num_iters` value, worksharing could execute one iteration too many and produce incorrect results. This impacts flang OpenMP offloading of `do`, `distribute` and `distribute parallel do` constructs.

Requiring the application to pass `tripcount - 1` as the argument would also be surprising, so rather than updating flang I think it makes more sense to update the runtime.
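To make the off-by-one concrete: the Fortran standard defines the iteration count of `do i = lb, ub, step` as max(INT((ub - lb + step) / step), 0), and that trip count is what the runtime should receive as `num_iters`. The C++ sketch below is a minimal serial model under stated assumptions: `model_static_loop` stands in for a `__kmpc_for_static_loop*` entry point by calling the outlined body `fn(iv, arg)` once per 0-based logical iteration in [0, num_iters), and `fortran_trip_count`, `record_iv` and `run_do_loop` are hypothetical helpers, not flang or DeviceRTL APIs. The `extra` parameter reproduces the old `num_iters + 1` behaviour.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical serial model of a __kmpc_for_static_loop-style entry point:
// the outlined loop body `fn` runs once per logical iteration in
// [0, num_iters). This is NOT the DeviceRTL code; it only models the contract
// that `num_iters` is the exact trip count.
template <typename Ty>
void model_static_loop(void (*fn)(Ty, void *), void *arg, Ty num_iters) {
  for (Ty iv = 0; iv < num_iters; ++iv)
    fn(iv, arg);
}

// Fortran DO iteration count for `do i = lb, ub, step`:
// max((ub - lb + step) / step, 0), integer division truncating toward zero.
inline std::int64_t fortran_trip_count(std::int64_t lb, std::int64_t ub,
                                       std::int64_t step) {
  assert(step != 0 && "Fortran requires a nonzero DO step");
  std::int64_t n = (ub - lb + step) / step;
  return n > 0 ? n : 0;
}

// Hypothetical loop body: records which logical iterations ran.
inline void record_iv(std::int64_t iv, void *arg) {
  static_cast<std::vector<std::int64_t> *>(arg)->push_back(iv);
}

// Runs `do i = lb, ub, step` through the model and returns the values of `i`
// that were visited. `extra` = 0 models the fixed runtime, 1 the old `+ 1`.
inline std::vector<std::int64_t> run_do_loop(std::int64_t lb, std::int64_t ub,
                                             std::int64_t step,
                                             std::int64_t extra) {
  std::vector<std::int64_t> ivs;
  model_static_loop<std::int64_t>(record_iv, &ivs,
                                  fortran_trip_count(lb, ub, step) + extra);
  std::vector<std::int64_t> indices;
  for (std::int64_t iv : ivs)
    indices.push_back(lb + iv * step);  // map logical iv back to `i`
  return indices;
}
```

In this model, `run_do_loop(1, 10, 1, 1)` visits `i = 11`, one past the loop bound, and `run_do_loop(5, 1, 1, 1)` runs the body once (with `i = 5`) for a loop that should not execute at all.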
1 parent: e17d864


1 file changed (+3 -3 lines)

offload/DeviceRTL/src/Workshare.cpp (+3 -3)
@@ -911,19 +911,19 @@ template <typename Ty> class StaticLoopChunker {
       IdentTy *loc, void (*fn)(TY, void *), void *arg, TY num_iters,          \
       TY num_threads, TY block_chunk, TY thread_chunk) {                      \
     ompx::StaticLoopChunker<TY>::DistributeFor(                               \
-        loc, fn, arg, num_iters + 1, num_threads, block_chunk, thread_chunk); \
+        loc, fn, arg, num_iters, num_threads, block_chunk, thread_chunk);     \
   }                                                                            \
   [[gnu::flatten, clang::always_inline]] void                                  \
   __kmpc_distribute_static_loop##BW(IdentTy *loc, void (*fn)(TY, void *),     \
                                     void *arg, TY num_iters,                   \
                                     TY block_chunk) {                          \
-    ompx::StaticLoopChunker<TY>::Distribute(loc, fn, arg, num_iters + 1,      \
+    ompx::StaticLoopChunker<TY>::Distribute(loc, fn, arg, num_iters,          \
                                             block_chunk);                      \
   }                                                                            \
   [[gnu::flatten, clang::always_inline]] void __kmpc_for_static_loop##BW(      \
       IdentTy *loc, void (*fn)(TY, void *), void *arg, TY num_iters,          \
       TY num_threads, TY thread_chunk) {                                      \
-    ompx::StaticLoopChunker<TY>::For(loc, fn, arg, num_iters + 1, num_threads, \
+    ompx::StaticLoopChunker<TY>::For(loc, fn, arg, num_iters, num_threads,    \
                                      thread_chunk);                             \
   }
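With `num_iters` now taken as the exact trip count, the chunker only has to split [0, num_iters) among threads. The C++ sketch below is a hypothetical serial simulation of a static schedule with a chunk size, roughly what the `num_threads` and `thread_chunk` parameters of `__kmpc_for_static_loop*` suggest. `simulate_static_chunks` and `covers_exactly` are illustrative names; the real `StaticLoopChunker` logic (including how blocks and `block_chunk` are handled) is not part of this diff and is not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of static chunked worksharing: each of `num_threads`
// threads takes chunks of `thread_chunk` consecutive logical iterations in
// round-robin order. Returns every iv that would execute, grouped by thread.
// This is an illustration only, not the StaticLoopChunker implementation.
inline std::vector<std::int64_t> simulate_static_chunks(
    std::int64_t num_iters, std::int64_t num_threads,
    std::int64_t thread_chunk) {
  assert(num_threads > 0 && thread_chunk > 0);
  std::vector<std::int64_t> executed;
  for (std::int64_t tid = 0; tid < num_threads; ++tid) {
    // Start at this thread's first chunk, then skip the chunks of all threads.
    for (std::int64_t start = tid * thread_chunk; start < num_iters;
         start += num_threads * thread_chunk) {
      std::int64_t end = std::min(start + thread_chunk, num_iters);
      for (std::int64_t iv = start; iv < end; ++iv)
        executed.push_back(iv);
    }
  }
  return executed;
}

// True if every iv in [0, trip_count) ran exactly once and nothing else ran.
inline bool covers_exactly(std::vector<std::int64_t> executed,
                           std::int64_t trip_count) {
  std::sort(executed.begin(), executed.end());
  if (static_cast<std::int64_t>(executed.size()) != trip_count)
    return false;
  for (std::int64_t i = 0; i < trip_count; ++i)
    if (executed[static_cast<std::size_t>(i)] != i)
      return false;
  return true;
}
```

Passing the trip count directly, as in `simulate_static_chunks(10, 4, 2)`, covers iterations 0 to 9 exactly once; passing `10 + 1`, as the old runtime effectively did, makes one thread also execute iteration 10, which does not exist in the source loop.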
