Skip to content

Conversation

@philip-paul-mueller
Copy link
Contributor

@philip-paul-mueller philip-paul-mueller commented Jan 23, 2026

This PR changes how Intra-Map dataflow optimization works. Specifically it adds TaskletFusion which, as its name suggests is able to merge Tasklets together.

TODO:

NOT WORKING: 5.7612245082855225
DOES NOT WORK: 5.77982020
…tion.

But it also has an additional simplify that was present when TF was run in stage 1, but not in the other version.

PERFORMANCE: 4.5106589s
@philip-paul-mueller philip-paul-mueller changed the title DO NOT MERGE: refactor[dace-next]: New Optimization Scheme in Intra-Map Optimization refactor[dace-next]: New Optimization Scheme in Intra-Map Optimization Jan 26, 2026
@philip-paul-mueller philip-paul-mueller marked this pull request as ready for review January 26, 2026 10:17
@philip-paul-mueller
Copy link
Contributor Author

It seems that this change introduces a 5% performance penalty in compute_advection_in_horizontal_momentum_equation().
To ensure that it is not indeterministic behaviour I lowered it twice, but it persisted.
Due to the importance of that kernel, we have to handle it in some way, maybe add an option compact_tasklets or so.

bench_blueline_stencil_compute

@philip-paul-mueller
Copy link
Contributor Author

It seems that there is a 5% performance penalty in compute_horizontal_mementum_equation. To make sure that it is not something indeterministic I lowered it twice.

bench_blueline_stencil_compute

@philip-paul-mueller
Copy link
Contributor Author

cscs-ci run

@edopao edopao changed the title refactor[dace-next]: New Optimization Scheme in Intra-Map Optimization refactor[next-dace]: New Optimization Scheme in Intra-Map Optimization Jan 29, 2026
@philip-paul-mueller
Copy link
Contributor Author

cscs-ci run

# things simpler or prevent it from doing certain, negative, things).
# TODO(phimuell): Restrict it to Tasklets only inside Maps.
# TODO(phimuell): Investigate more.
sdfg.apply_transformations_repeated(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add an option to disable this to avoid getting the regression in the other stencil as an interim solution until we understand the implications of this transformation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we should do it the problem is just that we have to coordinate this, because PR#2450 broke compatibility (the metric package in GT4Py was moved, thus we can not simply make a new release.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: The staging branch (C2SM/icon4py#1009) has the correct import of the metric package.

blocking_only_if_independent_nodes: Optional[bool],
scan_loop_unrolling: bool,
scan_loop_unrolling_factor: int,
compact_tasklets: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
compact_tasklets: bool,
fuse_tasklets: bool,

assume_pointwise: bool = True,
optimization_hooks: Optional[dict[GT4PyAutoOptHook, GT4PyAutoOptHookFun]] = None,
demote_fields: Optional[list[str]] = None,
compact_tasklets: bool = False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
compact_tasklets: bool = False,
fuse_tasklets: bool = False,

Copy link
Contributor

@edopao edopao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants