Description
Motivation
As of today, working on distributed parallelism in Fortran mostly means using MPI or coarrays, and one has to decide early on which to rely on. I would like to propose that stdlib wrap certain basic reduction operators so that they can be backed by either of them, selected through C preprocessing, such that other procedures from stdlib could profit from such a wrapper.
I'll try to give a picture with a very simple example: computing the norm2 of a 1D array distributed across processes/images. This operation requires a parallel sum reduction before taking the square root:
<kind> :: x(:) !> in SPMD style, each process/image holds a partial portion of the array
<kind> :: local_sum, global_sum
...
local_sum = dot_product( x, x ) !> this sum is incomplete with respect to the distributed data
#if defined(STDLIB_WITH_MPI)
call MPI_Allreduce(local_sum, global_sum, 1, MPI_<kind>, MPI_SUM, MPI_COMM_WORLD, ierr)
#elif defined(STDLIB_WITH_COARRAY)
global_sum = local_sum
call co_sum( global_sum )
#endif
...
norm2 = sqrt( global_sum )

If stdlib proposed wrappers for these reduction operators, it would become possible to make some of its functionalities also work transparently on distributed frameworks. The idea could consist of having a module stdlib_distributed or stdlib_coarray (to promote coarray-like syntax?) and then:
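With such a wrapper available, the norm2 example above could be written once and run unchanged in serial, MPI, or coarray builds. A minimal sketch of the call site (stdlib_co_sum is the proposed interface; the dp kind and the distributed layout are assumptions for illustration):

```fortran
function distributed_norm2(x) result(nrm)
    !> x holds this image's/process's portion of the global array
    real(dp), intent(in) :: x(:)
    real(dp) :: nrm, s
    s = dot_product(x, x)   !> local partial sum
    call stdlib_co_sum(s)   !> no-op serially; MPI_Allreduce or co_sum otherwise
    nrm = sqrt(s)
end function distributed_norm2
```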
module stdlib_<name_to_choose>
interface stdlib_co_sum
    module procedure :: stdlib_co_sum_<kind>
    ...
end interface
contains
subroutine stdlib_co_sum_<kind>( A, result_image, stat, errmsg )
    <kind>, intent(inout) :: A(..)
    integer, intent(in), optional :: result_image
    integer, intent(out), optional :: stat
    character(*), intent(inout), optional :: errmsg
#if defined(STDLIB_WITH_MPI)
    integer :: ierr
#endif
    ...
    select rank(A)
    rank(0)
#if defined(STDLIB_WITH_MPI)
        !> reduce in place to avoid a temporary; a scalar has count 1
        call MPI_Allreduce(MPI_IN_PLACE, A, 1, MPI_<kind>, MPI_SUM, MPI_COMM_WORLD, ierr)
#elif defined(STDLIB_WITH_COARRAY)
        call co_sum(A, result_image, stat, errmsg)
#endif
    rank(1)
#if defined(STDLIB_WITH_MPI)
        call MPI_Allreduce(MPI_IN_PLACE, A, size( A ), MPI_<kind>, MPI_SUM, MPI_COMM_WORLD, ierr)
#elif defined(STDLIB_WITH_COARRAY)
        call co_sum(A, result_image, stat, errmsg)
#endif
    rank(2)
#if defined(STDLIB_WITH_MPI)
        call MPI_Allreduce(MPI_IN_PLACE, A, size( A ), MPI_<kind>, MPI_SUM, MPI_COMM_WORLD, ierr)
#elif defined(STDLIB_WITH_COARRAY)
        call co_sum(A, result_image, stat, errmsg)
#endif
    ...
    end select
end subroutine
end module

This way, if one links against neither library, the kernels do nothing and return the value unchanged. If linked, one can rely on stdlib as an intermediate wrapper.
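The repetition over the <kind> placeholders above would not need to be written by hand: stdlib already generates its kind-generic interfaces with fypp. A sketch of how the specific procedures could be expanded, assuming stdlib's usual REAL_KINDS_TYPES macro from common.fypp:

```fortran
#:for k1, t1 in REAL_KINDS_TYPES
    subroutine stdlib_co_sum_${k1}$( A, result_image, stat, errmsg )
        ${t1}$, intent(inout) :: A(..)
        integer, intent(in), optional :: result_image
        integer, intent(out), optional :: stat
        character(*), intent(inout), optional :: errmsg
        !> select rank / preprocessor branches as sketched above
        ...
    end subroutine stdlib_co_sum_${k1}$
#:endfor
```

The same loop could cover the integer and complex kinds with the corresponding fypp macros.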
I haven't fully thought this through, but I would like to open it for discussion.
Prior Art
No response
Additional Information
No response