
Conversation

@samuelsonric (Collaborator) commented Oct 2, 2025

Hi @GiggleLiu I hope you are well.

I just released an update to CliqueTrees.jl. You should notice that the MF and ND elimination algorithms are now much, much faster. In fact, they are so fast that a great deal of the running time comes from setting up the problem in the treewidth.jl file. This PR rewrites the file, speeding things up. It also fixes this printing bug.

I made a couple of decisions that I would like your advice on. At this point in the optimize_treewidth function, the contraction schedule code is n-ary. As you can see, we "binarize" the tree by calling code = _optimize_code(code, size_dict, GreedyMethod()). This behavior is triggered by a keyword argument binary, which defaults to true.
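
For concreteness, here is a minimal sketch of that binarization step through the public API (the toy einsum, the label sizes, and the variable names are illustrative, not code from this PR):

using OMEinsumContractionOrders

# a small flat (n-ary) einsum: three tensors contracted in a single node
code      = OMEinsumContractionOrders.EinCode([['i', 'j'], ['j', 'k'], ['k', 'l']], ['i', 'l'])
size_dict = Dict('i' => 2, 'j' => 3, 'k' => 4, 'l' => 5)

# GreedyMethod splits the n-ary node into pairwise contractions,
# returning a binary NestedEinsum
binary_code = optimize_code(code, size_dict, GreedyMethod())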

Do contraction trees have to be binary? If so, is this the "right" way to binarize one? In practice, it seems to preserve the time and space complexity, but I suspect this is not a guarantee. Also, if n-ary trees are acceptable, then we might want to expose the binary keyword from the function optimize_treewidth.

Sincerely,
Richard Samuelson

@samuelsonric (Collaborator, Author)

The failing test is just a hard-coded contraction tree!

@GiggleLiu (Member) commented Oct 3, 2025

Hey, glad to see new progress!

Do contraction trees have to be binary? If so, is this the "right" way to binarize one? In practice, it seems to preserve the time and space complexity, but I suspect this is not a guarantee. Also, if n-ary trees are acceptable, then we might want to expose the binary keyword from the function optimize_treewidth.

We favor the binary format for multiple reasons:

  1. Only binary contractions support BLAS speedup: we always convert a tensor contraction to a (batched) matrix multiplication first (a sketch follows this list). For non-binary contractions, the current contraction backend is very slow. We will switch to the cuTENSOR backend in the future, which has additional support for ternary operations: https://docs.nvidia.com/cuda/cutensor/latest/api/cutensor.html#contraction-operations. I would recommend not relying too much on that feature.
  2. Only binary trees can be used in the current version of slicing, a technique for reducing the memory cost of a contraction tree. A non-binary tree will trigger an error when slice_code is called on it. This is relatively easy to change: we could convert the code to binary form when feeding it into slice_code.
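
To illustrate point 1, here is a minimal sketch in plain Julia (made-up index names and sizes, not code from this repository) of how a pairwise contraction reduces to a single matrix multiplication:

# pairwise contraction C[i,l] = sum_{j,k} A[i,j,k] * B[j,k,l]
i, j, k, l = 4, 3, 5, 6
A = rand(i, j, k)
B = rand(j, k, l)

# fuse the shared indices (j, k) into one dimension on each operand;
# the whole contraction then becomes one BLAS matrix product
C = reshape(A, i, j * k) * reshape(B, j * k, l)

# naive loop contraction for comparison
Cref = [sum(A[a, b, c] * B[b, c, d] for b in 1:j, c in 1:k) for a in 1:i, d in 1:l]
@assert C ≈ Cref

A contraction over three or more tensors at once has no single matrix-product form like this, which is why the non-binary path is slower on the current backend.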

but I suspect this is not a guarantee

I think it is guaranteed. The complexity of a contraction step is determined by the number (and sizes) of the variables involved, and a binary contraction involves at most as many variables as a contraction over more than two tensors. But I admit this argument only covers the asymptotic complexity. What about the constant factor? Apparently not optimal: for the i,i,i-> type of broadcasting operation, for example, the ternary operation is cheaper. This pattern does not appear often in practice, so I usually ignore it.
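
A minimal sketch of that constant-factor point, in plain Julia rather than OMEinsum code (the vectors and sizes are made up): the ternary form makes a single pass over the data, while a binary tree materializes an intermediate and makes a second pass.

a, b, c = rand(10^6), rand(10^6), rand(10^6)

# ternary "i,i,i->": one pass, no temporary array
ternary = sum(a[i] * b[i] * c[i] for i in eachindex(a))

# binary tree: the first step allocates an intermediate, then a second pass
tmp = a .* b
binary = sum(tmp .* c)

@assert ternary ≈ binary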

The best API to produce a binary tree is:

# convert a tree with 1-2 children to a binary tree

You can see it is a very cheap method to call. However, it requires the input tree to have only unary and binary nodes. For n-ary nodes, we still need a cheap method to determine a contraction order. From this perspective, I think GreedyMethod on a NestedEinsum is the best choice at the moment. Do you see any performance issue when calling this method? I would be surprised if it were slow.

@GiggleLiu (Member) left a comment

Thank you for the PR. I agree with the key changes in it. Before merging, I hope the tests can be fixed. A comment on style: I personally favor shorter functions; they are easier to test and review.

@samuelsonric (Collaborator, Author) commented Oct 3, 2025

Some benchmarks to justify the change. The tensor network mc_2023_151.json has 18764 tensors and 6221 indices.

6221×18764 SparseArrays.SparseMatrixCSC{Int64, Int64} with 45064 stored entries:
⎡⠙⠒⠶⢶⣶⣶⣶⣶⣴⣤⣤⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢰⠐⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎤
⎢⠀⠀⠀⠀⠀⠉⠙⠛⠿⢿⣿⣿⣿⣷⣶⣤⣤⣀⡀⠀⠀⠀⠀⠀⠀⠀⠀⡞⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠙⠛⠻⠯⣭⣉⡉⠉⠉⠉⠉⠉⠉⠉⠉⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠉⠛⠲⠦⣤⣀⡀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠉⠙⠓⠧⢦⣄⣀⣀⣀⣀⣀⣀⣀⡀⣀⣀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠙⠓⠲⠦⣤⣀⣀⢸⡇⎥
⎣⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠙⠓⎦

before

julia> optimizer = HyperND(; dis=METISND(), width=50, imbalances=100:10:800);

julia> @time optimize_code(code, size_dict, optimizer);
  46.792407 seconds (583.91 M allocations: 34.268 GiB, 13.56% gc time)

after

julia> optimizer = HyperND(; dis=METISND(), width=50, imbalances=100:10:800);

julia> @time optimize_hyper_nd(optimizer, code, size_dict; binary=false);
  13.996466 seconds (26.68 M allocations: 9.982 GiB, 15.67% gc time)

julia> @time optimize_code(code, size_dict, optimizer);
  14.616734 seconds (33.42 M allocations: 11.690 GiB, 14.03% gc time, 0.00% compilation time)

For sufficiently large tensor networks, binarization triggers a StackOverflowError.

@GiggleLiu GiggleLiu merged commit 47278e9 into TensorBFS:master Oct 4, 2025
2 checks passed
@GiggleLiu (Member)

LGTM. Since no exported APIs changed, I tagged a patch version here: JuliaRegistries/General#139722

@samuelsonric (Collaborator, Author)

@GiggleLiu Hi, I forgot to mention: I changed the default setting of HyperND from imbalances=130:1:130 to imbalances=100:10:800, since the latter is more appropriate for your use case.

@GiggleLiu (Member)

@GiggleLiu Hi, I forgot to mention: I changed the default setting of HyperND from imbalances=130:1:130 to imbalances=100:10:800, since the latter is more appropriate for your use case.

No problem, thank you for the reminder.


Successfully merging this pull request may close this issue: Printing of HyperND() fails
