-
Notifications
You must be signed in to change notification settings - Fork 54
fix[next-dace]: Fix Memory Layout for CPU #2459
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix[next-dace]: Fix Memory Layout for CPU #2459
Conversation
| unit_strides_kind = ( | ||
| gtx_common.DimensionKind.HORIZONTAL if gpu else gtx_common.DimensionKind.VERTICAL | ||
| ) | ||
| unit_strides_kind = gtx_common.DimensionKind.HORIZONTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does that make sense? You cannot assume anything...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or is that just for transients? Then I would change the comment assume -> set or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two things here, first the name is bad and should be probably something else.
However the value selection is correct, one could even argue that it is probably the only one that make sense.
The reason for this is that the maximal numbers of blocks is different for each direction, because (for ICON) size(horizontal) >>> size(vertical) one would get launch errors otherwise.
…escription. If the leading kind is not known then it will not reorder strides nor the iteration order. However, for cetain reasons (launch errors) we have to set one for GPU in that case.
edopao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, only one refactoring suggestion.
src/gt4py/next/program_processors/runners/dace/transformations/auto_optimize.py
Show resolved
Hide resolved
edopao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM

Before the optimizer was assuming that the memory allocation for GPU and CPU was different, i.e. that in CPU the stride 1 dimension is associated with the vertical dimension while for GPU it is associated with the horizontal dimension. However, this is wrong and in both cases stride 1 is associated with the horizontal dimension.
This PR fixes this and now the loop order and the memory layout for transients assumes that stride 1 is associated to the horizontal dimension.
Note that the current implementation assumes that there is only one horizontal dimension.
TODO: