Enable distributed parallelism #1048

@jalvesz

Description

Motivation

As of today, working on distributed parallelism in Fortran mostly means using either MPI or coarrays, and one has to decide early on which one to rely on. I would like to propose that stdlib wrap certain basic reduction operators so that, through C preprocessing, they can rely on either backend; other procedures from stdlib could then profit from such a wrapper.

I'll try to give a picture with a very simple example: computing the norm2 of a 1D array distributed across processes/images. This operation requires a parallel sum reduction before taking the square root:

<kind> :: x(:) !> in SPMD, each process/image holds a partial portion of the array
<kind> :: local_sum, global_sum
...
local_sum = dot_product( x, x ) !> this sum is incomplete with respect to the distributed data
#if defined(STDLIB_WITH_MPI)
call MPI_Allreduce(local_sum, global_sum, 1, MPI_<kind>, MPI_SUM, MPI_COMM_WORLD, ierr)
#elif defined(STDLIB_WITH_COARRAY)
global_sum = local_sum
call co_sum( global_sum )
#endif
...
norm2 = sqrt( global_sum )
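
With such a wrapper in place, this example would collapse to a single backend-agnostic call (stdlib_co_sum being the hypothetical wrapper proposed below):

local_sum = dot_product( x, x )
call stdlib_co_sum( local_sum ) !> MPI_Allreduce, co_sum, or a no-op, depending on the build
norm2 = sqrt( local_sum )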

If stdlib offered wrappers for these reduction operators, it would be possible to make some of its functionalities also work transparently on distributed frameworks. The idea could consist of having a module stdlib_distributed or stdlib_coarray (to promote coarray-like syntax?) and then:

module stdlib_<name_to_choose>
#if defined(STDLIB_WITH_MPI)
use mpi_f08 !> modern MPI bindings; buffers are assumed-type/assumed-rank
#endif
implicit none

interface stdlib_co_sum
    module procedure :: stdlib_co_sum_<kind>
    ...
end interface

contains

subroutine stdlib_co_sum_<kind>( A, result_image, stat, errmsg )
    <kind>, intent(inout) :: A(..)
    integer, intent(in), optional :: result_image
    integer, intent(out), optional :: stat
    character(*), intent(inout), optional :: errmsg
#if defined(STDLIB_WITH_MPI)
    integer :: ierr
#endif
    ...
    select rank(A)
    rank(0)
#if defined(STDLIB_WITH_MPI)
        !> in-place reduction: every rank ends up holding the global sum
        call MPI_Allreduce(MPI_IN_PLACE, A, 1, MPI_<kind>, MPI_SUM, MPI_COMM_WORLD, ierr)
#elif defined(STDLIB_WITH_COARRAY)
        call co_sum(A, result_image, stat, errmsg)
#endif
    rank(1)
#if defined(STDLIB_WITH_MPI)
        call MPI_Allreduce(MPI_IN_PLACE, A, size( A ), MPI_<kind>, MPI_SUM, MPI_COMM_WORLD, ierr)
#elif defined(STDLIB_WITH_COARRAY)
        call co_sum(A, result_image, stat, errmsg)
#endif
    rank(2)
#if defined(STDLIB_WITH_MPI)
        call MPI_Allreduce(MPI_IN_PLACE, A, size( A ), MPI_<kind>, MPI_SUM, MPI_COMM_WORLD, ierr)
#elif defined(STDLIB_WITH_COARRAY)
        call co_sum(A, result_image, stat, errmsg)
#endif
    ...
    end select
end subroutine

end module
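
As a side note, the <kind> placeholders map naturally onto stdlib's existing fypp templating; a minimal sketch of the expansion, assuming the REAL_KINDS_TYPES list from stdlib's common.fypp:

#:for k1, t1 in REAL_KINDS_TYPES
subroutine stdlib_co_sum_${k1}$( A, result_image, stat, errmsg )
    ${t1}$, intent(inout) :: A(..)
    ...
end subroutine
#:endfor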

This way, if one doesn't link against either backend, the kernels do nothing and the data is returned unchanged. If one of them is linked, then one can rely on stdlib as an intermediate wrapper.
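
To illustrate, downstream code would then read the same regardless of the backend chosen at build time (a hedged sketch reusing the placeholder names from above):

program demo_norm2
    use stdlib_<name_to_choose>
    implicit none
    real :: x(1000), s
    call random_number(x)   !> in an SPMD run, each process/image fills only its own slice
    s = dot_product( x, x )
    call stdlib_co_sum( s ) !> MPI_Allreduce, co_sum, or a serial no-op, depending on the build
    print *, 'norm2 = ', sqrt(s)
end program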

I haven't fully thought this through, but I would like to open it for discussion.

Prior Art

No response

Additional Information

No response
