Contributor

@philip-paul-mueller philip-paul-mueller commented Jan 28, 2026

Previously, the optimizer assumed that the memory layout differed between GPU and CPU: on CPU the stride-1 dimension was associated with the vertical dimension, while on GPU it was associated with the horizontal dimension. This is wrong; in both cases stride 1 is associated with the horizontal dimension.
This PR fixes that: both the loop order and the memory layout of transients now assume that stride 1 is associated with the horizontal dimension.

Note that the current implementation assumes that there is only one horizontal dimension.
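To illustrate what "stride 1 is associated with the horizontal dimension" means for a transient, here is a minimal sketch. It is not the code of this PR: `DimensionKind` below is a stand-in for `gtx_common.DimensionKind`, and `strides_with_unit_kind` is a hypothetical helper. It assumes exactly one dimension of the unit-stride kind, matching the note above.

```python
from enum import Enum


class DimensionKind(Enum):
    """Stand-in for gtx_common.DimensionKind (illustration only)."""

    HORIZONTAL = "horizontal"
    VERTICAL = "vertical"


def strides_with_unit_kind(shape, kinds, unit_kind):
    """Return element strides so that the single `unit_kind` dimension has stride 1.

    Hypothetical helper: the remaining dimensions are laid out so that
    later dimensions vary faster than earlier ones.
    """
    unit_axes = [i for i, kind in enumerate(kinds) if kind == unit_kind]
    if len(unit_axes) != 1:
        raise ValueError("expected exactly one dimension of the unit-stride kind")
    unit_axis = unit_axes[0]

    # Axes ordered from fastest varying (stride 1) to slowest varying.
    order = [unit_axis] + [i for i in reversed(range(len(shape))) if i != unit_axis]

    strides = [0] * len(shape)
    running = 1
    for axis in order:
        strides[axis] = running
        running *= shape[axis]
    return tuple(strides)


# Hypothetical transient with 1000 horizontal points and 80 vertical levels.
shape = (1000, 80)
kinds = (DimensionKind.HORIZONTAL, DimensionKind.VERTICAL)
print(strides_with_unit_kind(shape, kinds, DimensionKind.HORIZONTAL))  # (1, 1000)
```

With the old CPU assumption (vertical as the unit-stride kind) the same helper would return `(80, 1)` instead.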

TODO:

@philip-paul-mueller philip-paul-mueller marked this pull request as ready for review January 28, 2026 08:02
- unit_strides_kind = (
-     gtx_common.DimensionKind.HORIZONTAL if gpu else gtx_common.DimensionKind.VERTICAL
- )
+ unit_strides_kind = gtx_common.DimensionKind.HORIZONTAL
Contributor

Why does that make sense? You cannot assume anything...

Contributor

Or is that just for transients? Then I would change the comment from "assume" to "set" or something similar.

Contributor Author

There are two things here. First, the name is bad and should probably be something else.
However, the value selection is correct; one could even argue that it is probably the only one that makes sense.
The reason is that the maximal number of blocks is different for each direction. Because (for ICON) size(horizontal) >>> size(vertical), one would get launch errors otherwise.
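A rough sketch of the launch-error argument, under stated assumptions: CUDA (compute capability 3.0 and newer) allows up to 2^31 - 1 blocks along the grid's x direction but only 65535 along y and z. The helper `grid_fits`, the block sizes, and the domain sizes below are hypothetical and only illustrate why the large horizontal dimension has to go on x.

```python
# CUDA grid limits for compute capability >= 3.0.
MAX_GRID_X = 2**31 - 1
MAX_GRID_Y = 65535


def grid_fits(extent_x, extent_y, block_x=32, block_y=1):
    """Check whether a 2D launch stays within the per-direction grid limits.

    Hypothetical helper; the block sizes are illustrative defaults.
    """
    grid_x = (extent_x + block_x - 1) // block_x  # ceiling division
    grid_y = (extent_y + block_y - 1) // block_y
    return grid_x <= MAX_GRID_X and grid_y <= MAX_GRID_Y


# Hypothetical sizes with size(horizontal) >>> size(vertical).
horizontal, vertical = 5_000_000, 80

print(grid_fits(horizontal, vertical))  # horizontal on x: True
print(grid_fits(vertical, horizontal))  # horizontal on y: False
```

In the second call the y direction would need 5,000,000 blocks, which exceeds the limit of 65535.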

@edopao edopao changed the title fix[dace-next]: Fix Memory Layout for CPU fix[next-dace]: Fix Memory Layout for CPU Jan 28, 2026
…escription.

If the leading kind is not known, then neither the strides nor the iteration order are reordered.
However, for certain reasons (launch errors) we have to set one for GPU in that case.
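The selection logic described above could look roughly like the sketch below. It is an illustration, not the code of this PR: `DimensionKind` is a stand-in for `gtx_common.DimensionKind`, `resolve_leading_kind` is a hypothetical name, and the choice of the horizontal kind as the GPU fallback is an assumption based on the review discussion.

```python
from enum import Enum
from typing import Optional


class DimensionKind(Enum):
    """Stand-in for gtx_common.DimensionKind (illustration only)."""

    HORIZONTAL = "horizontal"
    VERTICAL = "vertical"


def resolve_leading_kind(
    leading_kind: Optional[DimensionKind], gpu: bool
) -> Optional[DimensionKind]:
    """Return the kind used to reorder strides and iteration order.

    `None` means: leave strides and iteration order untouched.
    """
    if leading_kind is not None:
        return leading_kind
    # Assumption: on GPU a kind must always be chosen to avoid launch errors.
    return DimensionKind.HORIZONTAL if gpu else None
```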
Contributor

@edopao edopao left a comment

LGTM, only one refactoring suggestion.

@philip-paul-mueller
Contributor Author

There is some variability but I think there is nothing pathological going on.

bench_blueline_stencil_compute

Contributor

@edopao edopao left a comment

LGTM

3 participants