refactor[next-dace]: New Optimization Scheme in Intra-Map Optimization #2457

philip-paul-mueller · 2026-01-23T10:39:49Z

This PR changes how Intra-Map dataflow optimization works. Specifically, it adds TaskletFusion, which, as its name suggests, merges Tasklets together.
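To illustrate what "merging Tasklets" means, here is a minimal toy sketch in plain Python. It does not use DaCe or GT4Py: `ToyTasklet` and `fuse` are hypothetical names invented for this example, and real Tasklets carry connectors and memlets that are ignored here. The sketch only shows the idea of inlining a producer's expression into its consumer so that the intermediate value disappears.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTasklet:
    """Hypothetical stand-in for a Tasklet: named inputs, one output, one expression."""

    inputs: tuple[str, ...]
    output: str
    expr: str


def fuse(producer: ToyTasklet, consumer: ToyTasklet) -> ToyTasklet:
    """Inline `producer` into `consumer`, removing the intermediate value."""
    if producer.output not in consumer.inputs:
        raise ValueError("consumer does not read the producer's output")
    # Replace whole-word occurrences of the intermediate name by the
    # parenthesised producer expression.
    fused_expr = re.sub(
        rf"\b{re.escape(producer.output)}\b",
        lambda _match: f"({producer.expr})",
        consumer.expr,
    )
    # The fused tasklet reads the producer's inputs plus whatever the
    # consumer read besides the intermediate value (duplicates removed).
    remaining = tuple(name for name in consumer.inputs if name != producer.output)
    fused_inputs = tuple(dict.fromkeys(producer.inputs + remaining))
    return ToyTasklet(fused_inputs, consumer.output, fused_expr)


t1 = ToyTasklet(inputs=("a", "b"), output="tmp", expr="a + b")
t2 = ToyTasklet(inputs=("tmp", "c"), output="out", expr="tmp * c")
fused = fuse(t1, t2)
```

After fusion there is a single tasklet computing `out` directly from `a`, `b` and `c`. Whether this helps or hurts performance depends on what the compiler does with the larger expression, which is what the measurements in this PR are about.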

TODO:

Run ICON4Py CI: DO NOT MERGE: Test TaskletFusion in Auto Optimizer C2SM/icon4py#1016
Add an option to disable/enable it.
Import the option in ICON4Py (is implemented in the muphys staging branch).
Run Bluline (again)


philip-paul-mueller · 2026-01-26T10:40:42Z

It seems that this change introduces a 5% performance penalty in compute_advection_in_horizontal_momentum_equation().
To rule out non-deterministic behaviour I lowered it twice, but the penalty persisted.
Given the importance of that kernel, we have to handle this in some way, for example by adding an option such as compact_tasklets.
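A check like the one described above can be sketched as follows. This is only an illustration: the function names, the 3% threshold and the timing values are made up for the example and are not measurements from this PR. The median of repeated runs is used to reduce the influence of run-to-run noise.

```python
from statistics import median


def relative_slowdown(baseline_s: list[float], candidate_s: list[float]) -> float:
    """Relative change of the median runtime; positive means the candidate is slower."""
    base = median(baseline_s)
    cand = median(candidate_s)
    return (cand - base) / base


def is_regression(
    baseline_s: list[float], candidate_s: list[float], threshold: float = 0.03
) -> bool:
    """Flag a regression if the slowdown exceeds the (hypothetical) threshold."""
    return relative_slowdown(baseline_s, candidate_s) > threshold


# Illustrative timings in seconds, NOT taken from the PR.
baseline = [2.0, 2.0]
with_fusion = [2.1, 2.1]
slowdown = relative_slowdown(baseline, with_fusion)
```

With these made-up numbers the slowdown is about 5%, so `is_regression` would report it under the assumed threshold.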

philip-paul-mueller · 2026-01-26T12:26:31Z

It seems that there is a 5% performance penalty in compute_horizontal_momentum_equation. To make sure that it is not caused by non-deterministic behaviour I lowered it twice.

philip-paul-mueller · 2026-01-26T12:26:45Z

cscs-ci run

philip-paul-mueller · 2026-01-29T10:30:24Z

cscs-ci run

iomaganaris · 2026-01-29T11:58:07Z

src/gt4py/next/program_processors/runners/dace/transformations/auto_optimize.py

+    #   things simpler or prevent it from doing certain, negative, things).
+    # TODO(phimuell): Restrict it to Tasklets only inside Maps.
+    # TODO(phimuell): Investigate more.
+    sdfg.apply_transformations_repeated(


Can we add an option to disable this to avoid getting the regression in the other stencil as an interim solution until we understand the implications of this transformation?

Yes, we should do it. The problem is just that we have to coordinate this, because PR#2450 broke compatibility (the metric package in GT4Py was moved), so we cannot simply make a new release.

FYI: The staging branch (C2SM/icon4py#1009) has the correct import of the metric package.

By default it is off.

edopao · 2026-01-29T13:03:21Z

src/gt4py/next/program_processors/runners/dace/transformations/auto_optimize.py

    blocking_only_if_independent_nodes: Optional[bool],
    scan_loop_unrolling: bool,
    scan_loop_unrolling_factor: int,
+    compact_tasklets: bool,


Suggested change

-    compact_tasklets: bool,
+    fuse_tasklets: bool,

edopao · 2026-01-29T13:03:36Z

src/gt4py/next/program_processors/runners/dace/transformations/auto_optimize.py

    assume_pointwise: bool = True,
    optimization_hooks: Optional[dict[GT4PyAutoOptHook, GT4PyAutoOptHookFun]] = None,
    demote_fields: Optional[list[str]] = None,
+    compact_tasklets: bool = False,


Suggested change

-    compact_tasklets: bool = False,
+    fuse_tasklets: bool = False,
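The opt-in behaviour discussed in this thread (the flag defaults to off, so existing callers are unaffected) can be sketched as below. This is a toy model, not the code in auto_optimize.py: `toy_auto_optimize` and the pass names are hypothetical, and only the parameter name `fuse_tasklets` and its default of `False` come from the suggested change.

```python
def toy_auto_optimize(fuse_tasklets: bool = False) -> list[str]:
    """Return the names of the (hypothetical) passes that would be applied."""
    passes = ["simplify", "map_fusion"]  # placeholder pass names
    if fuse_tasklets:
        # Opt-in only: stays off by default until the regression is understood.
        passes.append("tasklet_fusion")
    return passes


default_passes = toy_auto_optimize()
opted_in_passes = toy_auto_optimize(fuse_tasklets=True)
```

Keeping the default at `False` means downstream code only picks up the new transformation once it explicitly passes `fuse_tasklets=True`.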

edopao

LGTM

philip-paul-mueller added 6 commits January 22, 2026 08:40

Let's try this order.

9cade71

NOT WORKING: 5.7612245082855225

Maybe this is better.

cba0996

DOES NOT WORK: 5.77982020

Maybe the simplify call was unneeded.

8a389ee

NOT WORKING: 5.89192s

This is nearer at the empirical version, let's try it.

244dc10

SEEMS WORKING: 4.57165s

This is a bit nicer than the previous version, i.e. it has an explana…

ac2c5ce

…tion. But it also has an additional simplify that was present when TF was run in stage 1, but not in the other version. PERFORMANCE: 4.5106589s

Merge commit 'ac2c5ce1175e4706a01e8a3aaa86d882c1dbdfdd' into better_i…

d1fac9c

…n_map_dataflow_optimization_order_public

philip-paul-mueller mentioned this pull request Jan 23, 2026

DO NOT MERGE: Experiment for TaskletFusion in Intra-Map Optimization #2454

Draft

This was referenced Jan 23, 2026

feat[next-dace]: Enable tasklet fusion in dataflow optimization #2452

Closed

Muphys: optimization hook for graupel program C2SM/icon4py#1001

Closed

Muphys: staging branch C2SM/icon4py#1009

Draft

philip-paul-mueller changed the title ~~DO NOT MERGE: refactor[dace-next]: New Optimization Scheme in Intra-Map Optimization~~ refactor[dace-next]: New Optimization Scheme in Intra-Map Optimization Jan 26, 2026

Updated the description.

d707717

philip-paul-mueller requested review from edopao, havogt and iomaganaris January 26, 2026 06:57

philip-paul-mueller marked this pull request as ready for review January 26, 2026 10:17

philip-paul-mueller mentioned this pull request Jan 29, 2026

DO NOT MERGE: Test TaskletFusion in Auto Optimizer C2SM/icon4py#1016

Draft

1 task

edopao changed the title ~~refactor[dace-next]: New Optimization Scheme in Intra-Map Optimization~~ refactor[next-dace]: New Optimization Scheme in Intra-Map Optimization Jan 29, 2026

edopao mentioned this pull request Jan 29, 2026

feat[next]: Muphys staging #2462

Draft

iomaganaris reviewed Jan 29, 2026

Added an option to disable TaskletFusion.

9d092c4

By default it is off.

philip-paul-mueller mentioned this pull request Jan 29, 2026

Update To New Version Of Intre Map Optimization C2SM/icon4py#1019

Merged

edopao reviewed Jan 29, 2026

Made the suggested renaming.

7a8f22b

edopao approved these changes Jan 29, 2026
