Reduced time & memory footprint for Tarjan's algorithm, fixed a bug where it was O(E^2) on star graphs. #1559

saolof · 2021-04-11T23:48:04Z

I worked on improving the performance of the strongly_connected_components function, and on fixing #1560 .

This PR eliminates several of the arrays that were used for auxiliary data using various tricks that were directly inspired by David J. Pearce's preprint over at https://homepages.ecs.vuw.ac.nz/~djp/files/IPL15-preprint.pdf , which reduces memory usage & allocations and improves cache efficiency. In particular, this led to a significant speedup for dense random graphs in benchmarking. I would also subjectively claim that this version of the algorithm is more easily readable than Tarjan's original version.

In addition to this, it adds a new stack to keep track of the iteration state for nodes with a large number of outneighbours, specifically to fix #1560 . Interestingly, benchmarking revealed that the size threshold above which saving the iteration state is worth doing is surprisingly large (on the order of a thousand outneighbours), though of course that still applies to many real-world graphs.
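To illustrate why saving the iteration state matters, here is a minimal sketch (not the code in this PR) that counts how many neighbour inspections an iterative DFS performs with and without resuming the scan. It works on plain adjacency lists rather than a LightGraphs graph; `count_neighbor_inspections`, `star_adjacency` and the `resume` keyword are hypothetical names made up for the example. Unlike the PR, it saves the state for every vertex instead of only for vertices above a size threshold.

```julia
# Iterative DFS over adjacency lists (assumed representation, not a LightGraphs graph).
# With resume=true the scan of each vertex's neighbours continues where it stopped;
# with resume=false it restarts from the first neighbour every time we return to the vertex.
function count_neighbor_inspections(adj::Vector{Vector{Int}}, root::Int; resume::Bool=true)
    visited = falses(length(adj))
    visited[root] = true
    vertex_stack = [root]
    state_stack = [1]          # saved iterator state (next index) for each stacked vertex
    inspections = 0
    while !isempty(vertex_stack)
        v = vertex_stack[end]
        start = resume ? state_stack[end] : 1
        next = iterate(adj[v], start)
        descended = false
        while next !== nothing
            w, state = next
            inspections += 1
            if !visited[w]
                visited[w] = true
                state_stack[end] = state   # remember where to resume in v's list
                push!(vertex_stack, w)
                push!(state_stack, 1)
                descended = true
                break
            end
            next = iterate(adj[v], state)
        end
        if !descended                      # all neighbours of v are done
            pop!(vertex_stack)
            pop!(state_stack)
        end
    end
    return inspections
end

# Star with centre 1 and edges 1 -> 2, ..., 1 -> n (hypothetical helper).
star_adjacency(n) = [i == 1 ? collect(2:n) : Int[] for i in 1:n]

println(count_neighbor_inspections(star_adjacency(1000)))
println(count_neighbor_inspections(star_adjacency(1000); resume=false))
```

On a star with n vertices the resuming variant inspects each edge once, while the restarting variant rescans the centre's list after every leaf, which is where the quadratic behaviour comes from.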

I also put up a gist that anyone can use for some quick benchmarks:
https://gist.github.com/saolof/7b5d26a41e6a34ff2b3e76d3fc5541da

Furthermore, since it also computes a component table at the same time, adding a flag to return the component table would allow saving some work in other functions that compute one for the SCCs such as https://github.com/JuliaGraphs/LightGraphs.jl/blob/2a644c2b15b444e7f32f73021ec276aa9fc8ba30/src/connectivity.jl#L541 and places where that is called such as when computing transitive closures.

I worked on improving the strongly_connected_components function in this graph. This PR eliminates several of the arrays that were used for auxiliary data using various tricks inspired by https://homepages.ecs.vuw.ac.nz/~djp/files/IPL15-preprint.pdf , which reduces memory usage & allocations and improves cache efficiency. This seems to lead to a speedup in every category of random graphs in benchmarking. I would also subjectively claim that this version of the algorithm is more easily readable than Tarjan's original version. Also put up a gist of a few alternatives I tried out: https://gist.github.com/saolof/7b5d26a41e6a34ff2b3e76d3fc5541da

spelled function name correctly

saolof

Ah. Forgot to change the name of the method back to strongly_connected_components after working on it on my own machine with lightgraphs loaded.

codecov · 2021-04-12T00:49:34Z

Codecov Report

Merging #1559 (09683df) into master (5d2b80a) will decrease coverage by 0.12%.
The diff coverage is 88.33%.

❗ Current head 09683df differs from pull request most recent head ebb392b. Consider uploading reports for the commit ebb392b to get more accurate results

@@            Coverage Diff             @@
##           master    #1559      +/-   ##
==========================================
- Coverage   99.44%   99.31%   -0.13%     
==========================================
  Files         106      106              
  Lines        5551     5568      +17     
==========================================
+ Hits         5520     5530      +10     
- Misses         31       38       +7

Counting downwards instead of upwards has the advantage that rindex becomes a lookup table for components, if we ever decide to return both. Also makes the algorithm invariant crystal clear.

saolof

Changed it so that the visitation count is downwards and component_count is upwards. This has the advantage that v is in components[rindex[v]] (so rindex is a lookup table for components) if we decide to add a flag to return rindex. Also makes the algorithm invariant clearer.
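A minimal sketch of the idea, assuming plain adjacency lists and a recursive DFS for brevity (the PR itself is iterative and works on `AbstractGraph`s). The function name `scc_downward` is hypothetical. Visitation indices start at the number of vertices and count down, component ids start at 1 and count up, and the index counter backtracks when a component is popped, so a finished vertex always holds a smaller `rindex` than any vertex still on the stack.

```julia
# Sketch only: Pearce-style SCC search with a single rindex array.
# rindex[v] == 0        -> v is unvisited
# rindex[v] = id so far -> v is finished and lies in components[rindex[v]]
# larger values         -> v is still on the stack (visitation index / lowlink)
function scc_downward(adj::Vector{Vector{Int}})
    n = length(adj)
    rindex = zeros(Int, n)
    components = Vector{Int}[]
    stack = Int[]
    count = Ref(n)                       # next visitation index, counting down

    function visit(v)
        rindex[v] = count[]
        count[] -= 1
        push!(stack, v)
        is_root = true
        for w in adj[v]
            iszero(rindex[w]) && visit(w)
            # A larger rindex means w is still on the stack and was reached earlier.
            if rindex[w] > rindex[v]
                rindex[v] = rindex[w]
                is_root = false
            end
        end
        is_root || return
        component_id = length(components) + 1
        component = Int[]
        while true                       # pop the whole component rooted at v
            w = pop!(stack)
            rindex[w] = component_id
            push!(component, w)
            count[] += 1                 # backtrack the visitation count
            w == v && break
        end
        push!(components, component)
        return
    end

    for s in 1:n
        iszero(rindex[s]) && visit(s)
    end
    return components, rindex
end

# 1 -> 2 -> 3 -> 1 forms one component, 4 <-> 5 another, with an edge 3 -> 4.
components, rindex = scc_downward([[2], [3], [1, 4], [5], [4]])
println(components)
println(rindex)
```

After the call, `v in components[rindex[v]]` holds for every vertex, which is what makes `rindex` usable as a lookup table.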

simonschoelly · 2021-04-12T12:50:02Z

Hi, thanks for your contribution - It's been a few years since I last looked at Tarjan's algorithm, so I will first have to figure out again how it works to do a complete code review. In the meantime, could you add the code you used for benchmarking and the results that you got as a comment to this PR?

simonschoelly · 2021-04-12T12:55:31Z

src/connectivity.jl

+is_unvisited(data::AbstractVector,v::Integer) = iszero(data[v])
+is_unvisited(data::Dict,v) = !haskey(data,v)
+
+@traitfn function strongly_connected_components(g::AG::IsDirected) where {T, AG <: AbstractGraph{T}}


It doesn't hurt that you removed the condition that T is a subtype of Integer but currently the AbstractGraph interface does not allow any other eltype than some Integer (and they are consecutive from 1 to nv(g)), so you probably do not need to use a Dict.

I see. The dict is only ever used as a fallback if T is not an Integer (for example, if someone abused the interface by inheriting from AbstractGraph with T = Symbol), otherwise the code still uses vectors by default and does assume that the nodes are consecutive. The extra lines with the fallbacks are optional.

In that case, I would add the {T <: Integer} back - I don't think we should handle the case where the interface is incorrectly implemented, as it is not clear where something could go wrong.

src/connectivity.jl

simonschoelly · 2021-04-12T13:03:08Z

src/connectivity.jl

+    component_count = 1  # Index of the current component being discovered.
+    # Invariant 1: count is always smaller than component_count.
+    # Invariant 2: if rindex[v] < component_count, then v is in components[rindex[v]].
+    # This trivially lets us tell if a node belongs to a previously discovered scc without any extra bits, just inequalities.


Suggested change

# This trivially lets us tell if a node belongs to a previously discovered scc without any extra bits, just inequalities.

# This trivially lets us tell if a vertex belongs to a previously discovered scc without any extra bits, just inequalities.

For consistency we should always use the term vertex, although they are synonyms in the literature.

simonschoelly · 2021-04-12T13:04:07Z

src/connectivity.jl

+is_unvisited(data::AbstractVector,v::Integer) = iszero(data[v])
+is_unvisited(data::Dict,v) = !haskey(data,v)
+
+@traitfn function strongly_connected_components(g::AG::IsDirected) where {T, AG <: AbstractGraph{T}}


You mention a paper that your algorithm is based on. Let's add a reference to that paper to the docstring of that function.

src/connectivity.jl

simonschoelly · 2021-04-12T13:12:20Z

src/connectivity.jl

+    # Invariant 2: if rindex[v] < component_count, then v is in components[rindex[v]].
+    # This trivially lets us tell if a node belongs to a previously discovered scc without any extra bits, just inequalities.
+
+    component_root = empty_graph_data(Bool,g)    


Suggested change

component_root = empty_graph_data(Bool,g)

is_component_root = empty_graph_data(Bool,g)

A Dict of type Dict{T, Bool} can usually be replaced by a Set{T}. In this case, you could also think about using a Vector{Bool} or a BitVector, although with these you will waste some space in case we only have a few strongly connected components.

Right, it's using a Vector{Bool} for now whenever T is an Integer. I'm planning to do some benchmarking with bitvector once I have a decent solution to make the loop nonquadratic.

The main issue with using anything that doesn't give you a pointer to a bool is that the boolean gets manipulated in a tight loop which is fast because of branch prediction/speculative execution. But of course adding a variable outside the loop and then storing its result is an option

simonschoelly · 2021-04-12T13:14:24Z

src/connectivity.jl

@@ -252,66 +253,63 @@ function strongly_connected_components end
                v = dfs_stack[end] #end is the most recently added item
                u = zero_t
                @inbounds for v_neighbor in outneighbors(g, v)
-                    if index[v_neighbor] == zero_t
-                        # unvisited neighbor found
+                    if is_unvisited(rindex,v_neighbor)


Suggested change

if is_unvisited(rindex,v_neighbor)

if is_unvisited(rindex, v_neighbor)

In general, I would suggest adding a space after each comma in argument lists, for better readability and consistency with the other code.

saolof · 2021-04-12T16:26:39Z

Okay, here's some benchmark results. The first one is the previous library function. The second is what I currently have in this PR. Third one saves the iteration state when vertices become large enough, at the cost of a few extra operations for small vertices. Should I commit the third version since it is still quite competitive but solves the Star graph problem?


testing function strongly_connected_components:
path_digraph(100)
Trial(11.700 μs)
random_regular_digraph(500,5)
Trial(35.200 μs)
 random_tournament_digraph(200)
Trial(88.000 μs)
 random_orientation_dag(complete_graph(100))
Trial(18.500 μs)
 star_digraph(10000)
Trial(31.387 ms)

testing function strongly_connected_components_downward:
path_digraph(100)
Trial(6.360 μs)
random_regular_digraph(500,5)
Trial(32.200 μs)
 random_tournament_digraph(200)
Trial(40.100 μs)
 random_orientation_dag(complete_graph(100))
Trial(14.500 μs)
 star_digraph(10000)
Trial(42.072 ms)

testing function strongly_connected_components_2:
path_digraph(100)
Trial(7.075 μs)
random_regular_digraph(500,5)
Trial(34.600 μs)
 random_tournament_digraph(200)
Trial(38.400 μs)
 random_orientation_dag(complete_graph(100))
Trial(15.600 μs)
 star_digraph(10000)
Trial(778.800 μs)

saolof · 2021-04-12T16:35:53Z

Ah, I hadn't reenabled @inbounds in the third version. Here's some benchmarks with slightly bigger inputs and @inbounds enabled:

testing function strongly_connected_components:
path_digraph(1000)
Trial(110.500 μs)
random_regular_digraph(5000,8)
Trial(537.400 μs)
 random_tournament_digraph(1000)
Trial(2.216 ms)
 random_orientation_dag(complete_graph(1000))
Trial(918.600 μs)
 star_digraph(10000)
Trial(30.834 ms)

testing function strongly_connected_components_downward:
path_digraph(1000)
Trial(61.500 μs)
random_regular_digraph(5000,8)
Trial(517.800 μs)
 random_tournament_digraph(1000)
Trial(815.900 μs)
 random_orientation_dag(complete_graph(1000))
Trial(894.300 μs)
 star_digraph(10000)
Trial(33.027 ms)

testing function strongly_connected_components_2:
path_digraph(1000)
Trial(65.500 μs)
random_regular_digraph(5000,8)
Trial(532.900 μs)
 random_tournament_digraph(1000)
Trial(834.200 μs)
 random_orientation_dag(complete_graph(1000))
Trial(909.300 μs)
 star_digraph(10000)
Trial(756.500 μs)

Fixed O(|E|^2) performance bug that used to be an issue for star graphs. Minimal change in performance for large random graphs, but significant speedup for graphs that have both lots of SCCs and high node orders.

saolof

Committed the version that fixes star graph performance. The algorithm should now be provably O(|V| + |E|) for all graphs. Also included the changes raised during the previous review, and added a general description of the algorithm at the top.

Co-authored-by: Simon Schoelly <[email protected]>

Previous commit removed the if in front of iszero somehow

saolof

Previous commit replacing the u== zero_t with iszero(u) removed the if somehow

Slightly simplified logic and removed the need for zero_t.

saolof

Slightly simplified the logic and removed the need for zero_t.

Trying to figure out what broke. Can elements of outneighbours be equal to nothing?

saolof

Debugging latest commit.

testing

saolof

testing

Set correct name on method

saolof · 2021-04-13T09:24:37Z

Okay, seems fixed. The only non-passing test in the last run is the diffusion simulation at https://github.com/JuliaGraphs/LightGraphs.jl/blob/master/test/traversals/diffusion.jl#L155 which, according to the comment there, occasionally gives false negatives.

Added comments.

saolof · 2021-04-13T10:43:17Z

src/connectivity.jl

+
+# Required to prevent quadratic time in star graphs without causing type instability. Returns the type of the state object returned by iterate that may be saved to a stack.
+neighbor_iter_statetype(::Type{AG}) where {AG <: AbstractGraph} = Any   # Analogous to eltype, but for the state of the iterator rather than the elements.
+neighbor_iter_statetype(::Type{AG}) where {AG <: LightGraphs.SimpleGraphs.AbstractSimpleGraph} = Int # Since outneighbours is an array.


This may be done more cleanly by making this a @traitfn and adding a trait like say RandomAccessNeighbours to denote that the outnodes can be accessed randomly and that state when iterate()-ing over them is an int.

We could also add a layer of dispatch to strongly connected components so we can feed the graph to the function that infers the state type, and let it attempt to iterate on the first node to get the correct type.
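A minimal sketch of the "iterate once to learn the state type" idea, assuming a generic iterable in place of `outneighbors(g, v)`. The helpers `infer_iter_statetype` and `make_state_stack` are hypothetical names, not part of LightGraphs or of this PR.

```julia
# Hypothetical helper: take one iteration step and look at the type of the returned state.
function infer_iter_statetype(itr)
    step = iterate(itr)
    step === nothing && return Any   # empty iterator: fall back to an untyped stack
    return typeof(step[2])
end

# Build a stack whose element type matches the iterator state, so that pushing and
# popping states stays concretely typed whenever the state type could be inferred.
make_state_stack(itr) = Vector{infer_iter_statetype(itr)}()

println(infer_iter_statetype([10, 20, 30]))   # arrays use an integer index as state
println(eltype(make_state_stack(1:3)))
```

Passing the resulting stack into an inner function acts as a function barrier, so the main loop can be compiled for the concrete state type.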

Added a dispatch to infer the types.

saolof

Made the fallback more generic by using Base.Iterators.approx_iter_type().

saolof · 2021-04-14T07:20:42Z

https://gist.github.com/saolof/7b5d26a41e6a34ff2b3e76d3fc5541da
Updated Gist with benchmarks. Here are some quick results for the latest version in the current commit vs the one currently in master, after tuning the large node cutoff parameter to ~1024 after a first round of benchmarks:

testing function strongly_connected_components:
path_digraph(10000)
  4.615 ms (20049 allocations: 2.21 MiB)
random_regular_digraph(50000,3)
  33.385 ms (5983 allocations: 4.12 MiB)
random_regular_digraph(50000,8)
  38.742 ms (99 allocations: 4.20 MiB)
random_regular_digraph(50000,200)
  92.663 ms (60 allocations: 4.19 MiB)
random_regular_digraph(50000,2000)
  515.990 ms (60 allocations: 4.19 MiB)
 random_tournament_digraph(10000)
  232.991 ms (51 allocations: 1014.59 KiB)
 random_orientation_dag(complete_graph(10000))
  69.689 ms (20031 allocations: 1.71 MiB)
 star_digraph(100000)
  3.515 s (200029 allocations: 16.59 MiB)
 ------------------------------
testing function strongly_connected_components_2:
path_digraph(10000)
  2.926 ms (10033 allocations: 1.50 MiB)
random_regular_digraph(50000,3)
  36.592 ms (3022 allocations: 3.27 MiB)
random_regular_digraph(50000,8)
  41.791 ms (77 allocations: 3.43 MiB)
random_regular_digraph(50000,200)
  78.061 ms (56 allocations: 3.43 MiB)
random_regular_digraph(50000,2000)
  207.933 ms (71 allocations: 5.43 MiB)
 random_tournament_digraph(10000)
  75.131 ms (61 allocations: 1.34 MiB)
 random_orientation_dag(complete_graph(10000))
  75.588 ms (10028 allocations: 1.25 MiB)
 star_digraph(100000)
  13.889 ms (100026 allocations: 12.01 MiB)

In general, the speedup seems to be particularly big whenever the weak connectivity of the digraph is large (i.e. depending on width vs height of the DFS forest).

saolof · 2021-04-14T11:48:27Z

Some more benchmarks with more test cases (updated gist with them):

testing function strongly_connected_components:
path_digraph(10000)
  3.778 ms (20049 allocations: 2.21 MiB)
random_regular_digraph(50000,3)
  29.751 ms (6063 allocations: 4.12 MiB)
random_regular_digraph(50000,8)
  34.748 ms (84 allocations: 4.20 MiB)
random_regular_digraph(50000,200)     
  88.052 ms (60 allocations: 4.19 MiB)
random_orientation_dag(random_regular_graph(50000,200))
  46.974 ms (100034 allocations: 8.30 MiB)
random_regular_digraph(50000,2000)
  516.017 ms (60 allocations: 4.19 MiB)
random_tournament_digraph(10000)
  239.401 ms (51 allocations: 1014.59 KiB)
random_orientation_dag(complete_graph(10000))
  69.948 ms (20031 allocations: 1.71 MiB)
star_digraph(100000)
  3.585 s (200029 allocations: 16.59 MiB)
 ------------------------------
testing function strongly_connected_components_2:
path_digraph(10000)
  2.274 ms (10033 allocations: 1.50 MiB)
random_regular_digraph(50000,3)
  32.453 ms (3062 allocations: 3.27 MiB)
random_regular_digraph(50000,8)
  37.415 ms (69 allocations: 3.43 MiB)
random_regular_digraph(50000,200)
  74.465 ms (56 allocations: 3.43 MiB)
random_orientation_dag(random_regular_graph(50000,200))
  41.914 ms (50027 allocations: 6.01 MiB)
random_regular_digraph(50000,2000)
  208.503 ms (71 allocations: 5.43 MiB)
random_tournament_digraph(10000)
  71.482 ms (61 allocations: 1.34 MiB)
random_orientation_dag(complete_graph(10000))
  71.993 ms (10028 allocations: 1.25 MiB)
star_digraph(100000)
  11.344 ms (100026 allocations: 12.01 MiB)

Changed everything to be T-valued. Partly to save space for small graphs represented using smaller types, and partly for correctness on machines where Int = Int32 and where the graph is large enough to require Int64s

sbromberger · 2021-05-13T11:39:09Z

I am very ok with this as long as it passes @simonschoelly 's muster. Thank you.

simonschoelly · 2021-05-17T22:17:44Z

I stopped reviewing this PR, as you were pushing quite a lot of commits, so I wanted to wait until you get that done. I will try to continue the review this week.

sbromberger · 2021-06-04T11:16:30Z

Hi all,

Thanks for the PR and for the comprehensive review! I'm tracking - let me know when it's ready for a final review and merge.

sbromberger · 2021-07-06T12:19:15Z

Hi all,

Just wanted to ping on this. Is it ready for final review?

saolof · 2021-07-06T13:19:51Z

Yes.

Any further additions I may want to suggest (possibly including making it stable by doing the output in the exterior loop instead of the interior one, or adding some flag to return the rootindex array) would be in a separate PR. This is just a performance improvement that strictly avoids making any user-visible changes other than improving performance in the expensive cases, while enabling later PRs by ensuring that the data structures used internally are also useful to return.

saolof added 2 commits April 11, 2021 19:45

Update connectivity.jl

1e9dd73

spelled function name correctly

saolof commented Apr 12, 2021

Update connectivity.jl

eaf1946

Counting downwards instead of upwards has the advantage that rindex becomes a lookup table for components, if we ever decide to return both. Also makes the algorithm invariant crystal clear.

saolof commented Apr 12, 2021

simonschoelly reviewed Apr 12, 2021

simonschoelly reviewed Apr 12, 2021

simonschoelly reviewed Apr 12, 2021

Update connectivity.jl

9308b9e

Fixed O(|E|^2) performance bug that used to be an issue for star graphs. Minimal change in performance for large random graphs, but significant speedup for graphs that have both lots of SCCs and high node orders.

saolof commented Apr 12, 2021

saolof and others added 2 commits April 12, 2021 17:09

Update src/connectivity.jl

705b925

Co-authored-by: Simon Schoelly <[email protected]>

Update connectivity.jl

3427221

Previous commit removed the if in front of iszero somehow

saolof commented Apr 12, 2021

Update connectivity.jl

156793a

Slightly simplified logic and removed the need for zero_t.