Add UNDEFINED status and force copy methods that always transfer data #85

Open
wants to merge 9 commits into
base: main
102 changes: 79 additions & 23 deletions Readme.md
@@ -199,20 +199,70 @@ init_debug_value_jpim* to a custom value.
CALL FIELD_NEW(O, LBOUNDS=[1,1], UBOUNDS=[10,10])
```

## Asynchronism

This functionality is still being tested.

By default all data transfers are synchronous. So every call to the subroutines
GET\_HOST\_DATA, GET\_DEVICE\_DATA, SYNC\_HOST, SYNC\_DEVICE will stop the
program until the data are actually transferred. But sometimes it is possible to
interleave the data transfer with the computations. To do so you can add the
QUEUE parameter when calling the aforementioned subroutines. With this QUEUE
parameter the user specifies on which queue the data transfer should
happen, and the subroutines will return without waiting for the data transfer
to finish. It is up to the user to be sure the data transfer has been done before
actually using the data. This can be checked by using the
WAIT\_FOR\_ASYNC\_QUEUE subroutine.
## Data Transfers
There are two categories of data transfer methods in Field API. The *default API* consists of methods that internally keep track of the status of a field, and the *advanced API* consists of methods that rely on the user to
keep track of where the data is located. There is no benefit in using the advanced API if the default API can be used, and it is recommended to use the advanced API only when
the same result can't be achieved with the default API (e.g. for asynchronous data transfers).



### Default API
Each field defines eight type-bound procedures that are part of the default API and can be used to transfer data between the device and host. Four ``GET`` methods:
* ``SUBROUTINE GET_DEVICE_DATA_RDONLY (SELF, PPTR)``
* ``SUBROUTINE GET_DEVICE_DATA_RDWR (SELF, PPTR)``
* ``SUBROUTINE GET_HOST_DATA_RDONLY (SELF, PPTR)``
* ``SUBROUTINE GET_HOST_DATA_RDWR (SELF, PPTR)``

and four ``SYNC`` methods:
* ``SUBROUTINE SYNC_DEVICE_RDONLY (SELF)``
* ``SUBROUTINE SYNC_DEVICE_RDWR (SELF)``
* ``SUBROUTINE SYNC_HOST_RDONLY (SELF)``
* ``SUBROUTINE SYNC_HOST_RDWR (SELF)``

where ``DEVICE/HOST`` indicates the transfer direction and ``RDONLY/RDWR`` indicates the mode.
The difference between the ``GET`` and ``SYNC`` methods is their interface. The
``GET`` methods are called with a pointer argument that will be associated with the transferred data at its destination. The ``SYNC`` methods are called without any arguments and
only perform the data transfer and update the field's inner pointers.
Depending on the status of the field the data transfer methods in the default API may not
transfer data if it is already present and fresh on the intended destination location.
All data transfers with the methods in the default API are synchronous. So every call to the
procedures listed above will stop the program until the data transfer is completed.
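
The skip-if-fresh behaviour described above can be illustrated with a toy model. This sketch is not Field API code: the flag names `HOST_FRESH` and `DEVICE_FRESH`, the `SYNC_HOST` helper and the copy counter are all hypothetical, and the status transitions for the read-only and read-write modes are an assumption about how the two modes differ.

```fortran
PROGRAM STATUS_SKETCH
  IMPLICIT NONE
  ! Hypothetical status flags; Field API's real constants may differ.
  INTEGER, PARAMETER :: HOST_FRESH = 1, DEVICE_FRESH = 2
  INTEGER :: STATUS, NCOPIES

  STATUS = DEVICE_FRESH   ! pretend a kernel has just written the device copy
  NCOPIES = 0

  CALL SYNC_HOST (STATUS, NCOPIES, LWRITE=.FALSE.)  ! host is stale: a copy happens
  CALL SYNC_HOST (STATUS, NCOPIES, LWRITE=.FALSE.)  ! host already fresh: no copy
  PRINT *, 'copies after two read-only syncs:', NCOPIES

  CALL SYNC_HOST (STATUS, NCOPIES, LWRITE=.TRUE.)   ! no copy, but device becomes stale
  PRINT *, 'device still fresh after read-write sync:', IAND (STATUS, DEVICE_FRESH) /= 0

CONTAINS

  SUBROUTINE SYNC_HOST (STATUS, NCOPIES, LWRITE)
    INTEGER, INTENT(INOUT) :: STATUS, NCOPIES
    LOGICAL, INTENT(IN) :: LWRITE
    IF (IAND (STATUS, HOST_FRESH) == 0) THEN
      NCOPIES = NCOPIES + 1              ! stands in for the device-to-host copy
      STATUS = IOR (STATUS, HOST_FRESH)
    ENDIF
    ! In read-write mode the host may be modified, so only the host stays fresh
    IF (LWRITE) STATUS = HOST_FRESH
  END SUBROUTINE SYNC_HOST

END PROGRAM STATUS_SKETCH
```

In this model only the first call performs a copy, so the counter ends at one, and the read-write call leaves the device copy marked as stale.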

### Advanced API

**The advanced API cedes all responsibility for data synchronisation to the user and must therefore be used with caution.**

There are use cases where it is necessary to have fine-grained control over the data
transfers. For these use cases Field API provides an *advanced API* with
four type-bound procedures that will always trigger data copies.
The advanced API consists of the four subroutines:
* ``SUBROUTINE GET_HOST_DATA_FORCE(SELF, PPTR, QUEUE)``
* ``SUBROUTINE SYNC_HOST_FORCE(SELF, QUEUE)``
* ``SUBROUTINE GET_DEVICE_DATA_FORCE(SELF, PPTR, QUEUE)``
* ``SUBROUTINE SYNC_DEVICE_FORCE(SELF, QUEUE)``

A call to any of these routines will always transfer data between the host and
the device and set the internal status of the field to ``UNDEFINED``.
Furthermore, the routines above add an optional dummy argument ``QUEUE`` that
can be used to invoke asynchronous data transfers (see more below).

If any data transfer routine from the advanced API has been used,
then the user must explicitly set the status of the field before the data
transfer methods from the default API can be used.
To this end, there are two methods that can be used to set the internal status of the
field:
* ``FORCE_DEVICE_FRESH`` - should be called if the device data is up to date.
* ``FORCE_HOST_FRESH`` - should be called if the host data is up to date.
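
As a sketch of how the two APIs can be combined, the subroutine below forces a copy to the host on a queue, waits for it, and then hands status tracking back to the default API. The module names (`FIELD_MODULE`, `FIELD_ASYNC_MODULE`, `PARKIND1`), the field type `FIELD_2RB`, the kind `JPRB` and the argument form of `WAIT_FOR_ASYNC_QUEUE` are assumptions and should be checked against the installed library.

```fortran
SUBROUTINE READ_ON_HOST_AFTER_FORCE (FO)
  ! Module, type and kind names below are assumptions, not confirmed by this README.
  USE FIELD_MODULE, ONLY : FIELD_2RB
  USE FIELD_ASYNC_MODULE, ONLY : WAIT_FOR_ASYNC_QUEUE
  USE PARKIND1, ONLY : JPRB
  IMPLICIT NONE
  CLASS(FIELD_2RB), POINTER :: FO
  REAL(KIND=JPRB), POINTER :: PTR(:,:)

  ! Advanced API: always copies device -> host and leaves the status UNDEFINED
  CALL FO%SYNC_HOST_FORCE (QUEUE=2)

  ! Block until the transfer issued on queue 2 has completed
  CALL WAIT_FOR_ASYNC_QUEUE (2)

  ! The host copy is now up to date, so say so before using the default API
  CALL FO%FORCE_HOST_FRESH ()

  ! Default API: with the host marked fresh, no further copy should be needed
  CALL FO%GET_HOST_DATA_RDONLY (PTR)
END SUBROUTINE READ_ON_HOST_AFTER_FORCE
```

Calling ``FORCE_HOST_FRESH`` before the transfer has actually finished would mark stale host data as fresh, which is why the wait comes first.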


#### Asynchronous data transfers
Using the advanced API it is possible to transfer data asynchronously, i.e. to issue a non-blocking transfer.
To do so, add the optional ``QUEUE`` argument when calling the subroutines of the advanced API.
The ``QUEUE`` argument specifies on which queue the data transfer should
happen, and the subroutines return without waiting for the data transfer
to finish. It is up to the user to make sure that the data transfer has completed
before the data is accessed. This can be done by calling the
``WAIT_FOR_ASYNC_QUEUE`` subroutine.

```
SUBROUTINE SUB(MYTEST)
@@ -227,7 +277,7 @@ CALL FIELD_NEW(FO2, /1,0/, /10,10/)

!Do stuff with FO on GPUs
!Then transfer data to CPU
CALL FO%SYNC_HOST_RDONLY(QUEUE=2)
CALL FO%SYNC_HOST_FORCE(QUEUE=2)

!Do stuff with FO2 on GPUs
!We didn't have to wait for the data transfer of FO to finish
@@ -271,17 +321,23 @@ SUBROUTINE FIELD_RESIZE(SELF, ...)
SUBROUTINE FIELD_DELETE(SELF)
SUBROUTINE DELETE_DEVICE
FUNCTION GET_VIEW(SELF, BLOCK_INDEX, ZERO) RESULT(VIEW_PTR)
SUBROUTINE GET_DEVICE_DATA_RDONLY (SELF, PPTR, QUEUE)
SUBROUTINE GET_DEVICE_DATA_RDWR (SELF, PPTR, QUEUE)
SUBROUTINE GET_HOST_DATA_RDONLY (SELF, PPTR, QUEUE)
SUBROUTINE GET_HOST_DATA_RDWR (SELF, PPTR, QUEUE)
SUBROUTINE SYNC_HOST_RDWR (SELF, QUEUE)
SUBROUTINE SYNC_HOST_RDONLY (SELF, QUEUE)
SUBROUTINE SYNC_DEVICE_RDWR (SELF, QUEUE)
SUBROUTINE SYNC_DEVICE_RDONLY (SELF, QUEUE)
SUBROUTINE GET_DEVICE_DATA_RDONLY (SELF, PPTR)
SUBROUTINE GET_DEVICE_DATA_RDWR (SELF, PPTR)
SUBROUTINE GET_DEVICE_DATA_FORCE (SELF, PPTR, QUEUE) ! Disables Field API internal status tracking
SUBROUTINE GET_HOST_DATA_RDONLY (SELF, PPTR)
SUBROUTINE GET_HOST_DATA_RDWR (SELF, PPTR)
SUBROUTINE GET_HOST_DATA_FORCE (SELF, PPTR, QUEUE) ! Disables Field API internal status tracking
SUBROUTINE SYNC_HOST_RDWR (SELF)
SUBROUTINE SYNC_HOST_RDONLY (SELF)
SUBROUTINE SYNC_HOST_FORCE (SELF, QUEUE)
SUBROUTINE SYNC_DEVICE_RDWR (SELF)
SUBROUTINE SYNC_DEVICE_RDONLY (SELF)
SUBROUTINE SYNC_DEVICE_FORCE (SELF, QUEUE)
SUBROUTINE COPY_OBJECT (SELF, LDCREATED)
SUBROUTINE WIPE_OBJECT (SELF, LDDELETED)
SUBROUTINE GET_DIMS (SELF, LBOUNDS, UBOUNDS)
SUBROUTINE FORCE_DEVICE_FRESH(SELF)
SUBROUTINE FORCE_HOST_FRESH(SELF)
```

Utils:
15 changes: 8 additions & 7 deletions src/buffer/field_RANKSUFF_gang_module.fypp
@@ -98,17 +98,18 @@ CONTAINS
INTEGER (KIND=JPIM), INTENT(IN) :: MODE
${ft1.type}$, POINTER, INTENT(INOUT) :: PTR(${ft1.shape}$)
INTEGER (KIND=JPIM), OPTIONAL, INTENT(IN) :: QUEUE
INTEGER(KIND=JPIM) :: LBOUNDS(${ft1.rank}$)

IF (ASSOCIATED (SELF%PARENT)) THEN
IF (IAND (MODE, NWR) /= 0) THEN
CALL SELF%PARENT%SYNC_${what}$_RDWR (QUEUE)
ELSEIF (IAND (MODE, NRD) /= 0) THEN
CALL SELF%PARENT%SYNC_${what}$_RDONLY (QUEUE)
IF (SELF%GET_STATUS() /= UNDEFINED) THEN
IF (ASSOCIATED (SELF%PARENT)) THEN
IF (IAND (MODE, NWR) /= 0) THEN
CALL SELF%PARENT%SYNC_${what}$_RDWR ()
ELSEIF (IAND (MODE, NRD) /= 0) THEN
CALL SELF%PARENT%SYNC_${what}$_RDONLY ()
ENDIF
ENDIF
ENDIF

CALL SELF%${ftn1}$_WRAPPER%GET_${what}$_DATA (MODE, PTR, QUEUE)
CALL SELF%${ftn1}$_WRAPPER%GET_${what}$_POINTER(PTR)

END SUBROUTINE
