Fixing a (likely) bug in `MapFusion`. #1673

pratyai · 2024-10-04T13:22:09Z

When checking for the "array usage" criteria that can prevent map-fusion, check only within the current state. Otherwise, any "use" of the array globally (i.e., in the entire SDFG) will cancel the fusion.

tbennun · 2024-10-05T01:32:58Z

This should not be the correct behavior. Eliminating an access node that corresponds to a data container has to be done safely: if a data container is used in another state (via an access node with the same name), its memory must be preserved because it can be reused in a subsequent read (which is a case that happens often when conserving memory). If you want map fusion to work, you would need to use a unique array name that can be removed, or rename arrays that you know they will be overwritten later.

Checking the SDFG for other instances of the access node is the correct way to check for a safe removal of an access node:

>>> sdfg = dace.SDFG('test')
>>> state = sdfg.add_state()
>>> state2 = sdfg.add_state_after(state)
>>> state.add_access('A')
>>> state2.add_access('A')
# Checking access nodes:
>>> state.data_nodes()
[AccessNode (A)]
>>> sdfg.data_nodes()
[AccessNode (A), AccessNode (A)]

pratyai · 2024-10-05T19:50:31Z

I see that my test case (of applying fusion twice) has this problem: the corresponding transient access nodes in two copies have the same name. I was assuming that the transient access nodes are scoped only within the state, and states cannot rely on other states' transient access nodes.
Just to confirm, must the transient access nodes have globally unique names?

Even then, I think that MapFusion's current way sometimes unnecessarily chooses to avoid the fusion. Could we just not remove such access nodes and just proceed with the fusion where both versions would produce identical output? In other words, what if we allow the fusion without any access-node-removal, and then remove any redundant transient nodes?

[ By the way, I've been looking at this since only very recently, so chances are that I have some misunderstanding about access nodes. Apologies in advance, if that's indeed the case here. ]

pratyai · 2024-10-07T10:22:10Z

I've come up with the following example to demonstrate. I want the MapFusion to work on the following graph:

def f4():  
    g = dace.SDFG('f4')  
    g.add_array('A', [10], dace.float32)  
    g.add_array('B', [10], dace.float32)  
  
    st0 = g.add_state('st0')  
    # First map  
    wA_e, wA_x = st0.add_map('st0_wA', {'i': f"0:10"})  
    op0 = st0.add_tasklet('set_1', {}, {'Ai_out'}, 'Ai_out = 1')  
    st0.add_memlet_path(wA_e, op0, memlet=dace.Memlet())  
    wA = st0.add_write('A')  
    st0.add_memlet_path(op0, wA_x, wA, src_conn='Ai_out', memlet=dace.Memlet(data='A', subset='i'))  
    # Second map  
    wB_e, wB_x = st0.add_map('st0_wB', {'i': f"0:10"})  
    op1 = st0.add_tasklet('set_B', {'Ai_in'}, {'Bi_out'}, 'Bi_out = Ai_in')  
    st0.add_memlet_path(wA, wB_e, op1, dst_conn='Ai_in', memlet=dace.Memlet(data='A', subset='i'))  
    wB = st0.add_write('B')  
    st0.add_memlet_path(op1, wB_x, wB, src_conn='Bi_out', memlet=dace.Memlet(data='B', subset='i'))  
  
    st1 = g.add_state_after(st0, 'st1')  
    # First map  
    wA_e, wA_x = st1.add_map('st1_wA', {'i': f"0:10"})  
    op2 = st1.add_tasklet('set_2', {}, {'Ai_out'}, 'Ai_out = 2')  
    st1.add_memlet_path(wA_e, op2, memlet=dace.Memlet())  
    wA = st1.add_write('A')  
    st1.add_memlet_path(op2, wA_x, wA, src_conn='Ai_out', memlet=dace.Memlet(data='A', subset='i'))  
    g.validate()  
    return g

Which produces this SDFG:

This SDFG currently does not allow fusion inside state#1, since state#2 also contains an access to A. However, I would like to transform it to:

I claim that:

Fusion should not remove the access node A from state#1.
State#2 can only rely on the data in A after state#1 has completed in entirety. So, the fusion should still produce the same data in A and B at the end of state#1, but should be allowed to transform computation within it.

tbennun · 2024-10-10T14:45:16Z

@philip-paul-mueller related to what you are working on?

philip-paul-mueller · 2024-10-11T05:05:22Z

Yes. This problem should be solved by the new map fusion.

philip-paul-mueller · 2024-10-16T05:40:28Z

The error you have looks familiar, I saw it when I rewrote MapFusion.
Back then I had the feeling that subgraph fusion, the component that fails, highly depends on how the graph produced by MapFusion looks.
My solution was to tweak MapFusion in such a way to avoid some fusion operation during auto optimization.
This is not a nice solution, but it worked.

@pratyai
I would also advice you to look at my PR that reimplements MapFusion.
Since I have currently other, more urgent, stuff to do you are more than welcomed to work on it.
If you are interested contact me in Mattermost.

pratyai · 2024-10-16T08:21:44Z

@philip-paul-mueller while I generally like your PR at a high level (including the choice of two versions of MapFusion), I don't think right now I have sufficient expertise of DaCe to convincingly push this change myself -- especially since some architecture-level concerns (e.g. two map fusions) are not entirely resolved, as far as I can tell. Perhaps in a few weeks I would be able to do so (hopefully).

like: ```c++ cpp.reshape_strides(Range([(0, 4, 1), (0, 5, 1)]), None, None, [2, 3, 5]) ``` and crashes with an index error.

outside the loop. It seems to be a typo on `iedge`.

`nd` fields.

need to be tied to the class itself. Remove their unnecessary arguments.

+ Removing unnecessary imports.

map-fusion, check only within the current state. Otherwise, any "use" of the array _globally_ (i.e., in the entire SDFG) will cancel the fusion.

if they are used elsewhere.

pratyai mentioned this pull request Oct 16, 2024

SubgraphFusion crashes with IndexError: list index out of range error on certain graphs. #1687

Open

pratyai force-pushed the fix-map-fusion-two-state branch 3 times, most recently from c98af78 to 19525ef Compare October 16, 2024 13:54

pratyai added 18 commits October 17, 2024 17:21

Fix the issue with cpp codegen, where it currently cannot handle inputs

57edcba

like: ```c++ cpp.reshape_strides(Range([(0, 4, 1), (0, 5, 1)]), None, None, [2, 3, 5]) ``` and crashes with an index error.

Use integer division, fractions don't make sense here anyway.

aff5a9e

Add even more tests to cover strides and offsets.

a40ba15

Start with cleaning up the imports.

a942c5e

Since edge is a loop variable, we probably don't want to use that

c0c2519

outside the loop. It seems to be a typo on `iedge`.

Fix a bunch of docstrings and typehints.

0ee87f0

Remove a couple of redundant == Trues.

123952c

Since port_created is only a pair (or None), it does not have st and

f7bfa4a

`nd` fields.

Fix another typehint.

49719a7

Fixes for various minor PEP 8 complaints (which are otherwise no-op).

851f80a

Typehints and set literal in helpers.

a99bce4

Move various helper functions out of @staticmethod, since they don't

45d626e

need to be tied to the class itself. Remove their unnecessary arguments.

Breakout two pieces of fuse() function into their own helpers.

288e7c0

Finally, fix the bug in subgraph fusion that was causing index error.

2b5b851

Forgot to update the references to subgraph fusion, fixing now

c4b6ef6

+ Removing unnecessary imports.

Validating halfway in the fusion wasn't a good idea.

bbe8132

When checking for the "array usage" criteria that can prevent

c0906da

map-fusion, check only within the current state. Otherwise, any "use" of the array _globally_ (i.e., in the entire SDFG) will cancel the fusion.

Make sure that the intermediate transient access nodes are not removed

d13cfba

if they are used elsewhere.

pratyai force-pushed the fix-map-fusion-two-state branch from 19525ef to d13cfba Compare October 17, 2024 17:45

pratyai added the no-ci Do not run any CI or actions for this PR label Oct 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixing a (likely) bug in `MapFusion`. #1673

Fixing a (likely) bug in `MapFusion`. #1673

pratyai commented Oct 4, 2024

tbennun commented Oct 5, 2024

pratyai commented Oct 5, 2024

pratyai commented Oct 7, 2024

tbennun commented Oct 10, 2024

philip-paul-mueller commented Oct 11, 2024

philip-paul-mueller commented Oct 16, 2024

pratyai commented Oct 16, 2024 •

edited

Loading

Fixing a (likely) bug in MapFusion. #1673

Are you sure you want to change the base?

Fixing a (likely) bug in MapFusion. #1673

Conversation

pratyai commented Oct 4, 2024

tbennun commented Oct 5, 2024

pratyai commented Oct 5, 2024

pratyai commented Oct 7, 2024

tbennun commented Oct 10, 2024

philip-paul-mueller commented Oct 11, 2024

philip-paul-mueller commented Oct 16, 2024

pratyai commented Oct 16, 2024 • edited Loading

Fixing a (likely) bug in `MapFusion`. #1673

Fixing a (likely) bug in `MapFusion`. #1673

pratyai commented Oct 16, 2024 •

edited

Loading