
Conversation

@MathisHammel
Contributor

Following up on #5260, it turns out our proposed implementation of prevent_overlapping is missing a small detail that degrades the quality of the result.

Indeed, the notes in the ForceAtlas2 paper mention "we also implemented a diminishing factor on the local speed (dividing it by 10)" when the feature is enabled, which is not present in our previous PR. Without this factor, overlapping nodes tend to be ejected very fast and the algorithm does not converge properly.

The implementation in this PR matches the corresponding function in Gephi, which differs slightly from what the paper describes.
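
For reference, here is a rough Python transcription of what that Gephi-style speed adjustment looks like, based on my reading of Gephi's ForceAtlas2 source; the variable names are illustrative, not cuGraph's actual identifiers, and the constants should be checked against the Gephi code:

```python
import math

def apply_displacement(x, y, dx, dy, speed, swinging, prevent_overlapping):
    """Move one node by its force vector (dx, dy), scaled by a local speed."""
    if prevent_overlapping:
        # Diminished local speed: the 0.1 factor from the paper's notes,
        # combined with a cap on the displacement length so that
        # overlapping nodes are not ejected too far in a single iteration.
        factor = 0.1 * speed / (1.0 + speed * math.sqrt(swinging))
        dist = math.hypot(dx, dy)
        if dist > 0.0:
            factor = min(factor * dist, 10.0) / dist
    else:
        # Regular local speed when overlap prevention is disabled.
        factor = speed / (1.0 + math.sqrt(speed * swinging))
    return x + dx * factor, y + dy * factor
```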

The render below was performed with cuGraph before and after applying the modification, which shows a significant improvement in the quality of overlap prevention:

[image: render comparison before/after the modification]

@MathisHammel MathisHammel requested a review from a team as a code owner October 2, 2025 12:13
@copy-pr-bot

copy-pr-bot bot commented Oct 2, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


@BradReesWork
Member

@MathisHammel Thanks for the PR and the great work.

I need to update the PR to point to branch-25.12

@BradReesWork BradReesWork changed the base branch from branch-25.10 to branch-25.12 October 2, 2025 17:42
@BradReesWork BradReesWork added the improvement and non-breaking labels Oct 2, 2025
@rapidsai rapidsai deleted a comment from copy-pr-bot bot Oct 2, 2025
@BradReesWork
Member

/okay to test 526de3c

Contributor

@eriknw eriknw left a comment


LGTM, thanks @MathisHammel!

@ChuckHastings
Collaborator

/ok to test bf37157

@ChuckHastings
Collaborator

@MathisHammel - Looks like your PR is failing one of the Python tests. The AI analysis of your failure:

The failure occurred in test_force_atlas2_noverlap[500-graph_file3-10.0-1100] with the assertion assert 4041 < 1100. This means the test expected fewer than 1100 overlapping vertex pairs, but 4041 were found.
 * The test in python/cugraph/cugraph/tests/layout/test_force_atlas2.py expects that, after running Force Atlas 2 with prevent_overlapping enabled, the number of overlapping vertex pairs should be "very low" (see the threshold max_overlaps).
 * For the netscience graph (graph_file3), the test failed because the overlap count was much higher than expected.
 * This could be due to changes in the layout algorithm, differences in graph structure, or insufficiently strict parameters.

I suspect your change produces results that differ from what the test code expects.
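
For context, the overlap metric in these tests is essentially a count of vertex pairs whose node circles intersect. A brute-force sketch of such a check, assuming an `(n, 2)` positions array and a uniform node radius (the actual test helper may differ):

```python
import numpy as np

def count_overlaps(pos: np.ndarray, radius: float) -> int:
    """Count unordered pairs of nodes whose circles of the given radius intersect.

    O(n^2) in the number of nodes, which is fine for the small test graphs
    involved here.
    """
    # Pairwise Euclidean distances between all node centers, shape (n, n).
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Two circles of equal radius overlap when their centers are closer
    # than 2 * radius; count each pair once via the strict upper triangle.
    return int(np.triu(dist < 2.0 * radius, k=1).sum())
```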

@MathisHammel
Contributor Author

Yup, I was investigating that just now. The thresholds for this family of tests were determined from 500 runs of the algorithm in its previous (incorrect) version; I'll need to recompute them properly, because I suspect the 0.1x speed factor requires more iterations to converge.

I can do that after next week; feel free to set this PR as draft/closed in the meantime.

@ChuckHastings
Collaborator

> I can do that after next week; feel free to set this PR as draft/closed in the meantime.

No need. The PR doesn't consume any CI resources unless we manually trigger things to rerun. We'll know it needs work because we discussed it and it's logged here in the PR.

@ChuckHastings
Collaborator

ChuckHastings commented Oct 7, 2025

/ok to test d1e7df7


@MathisHammel MathisHammel requested a review from a team as a code owner October 20, 2025 16:29
@MathisHammel
Contributor Author

MathisHammel commented Oct 20, 2025

@ChuckHastings Tests should now be fixed.

Adding the speed factor effectively reduced the speed of nodes by a factor of 10, which prevented FA2 from converging properly in the 500 iterations that were initially allocated. For the prevent_overlapping tests, I changed the number of iterations to 5000 to compensate for this.
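
For reference, here's roughly what a call with the new settings looks like; this is a sketch assuming the current Python signature of `cugraph.force_atlas2`, with an illustrative edge list in place of the real test datasets:

```python
import cudf
import cugraph

# A tiny illustrative edge list; the tests use karate, polbooks, etc.
edges = cudf.DataFrame({"src": [0, 1, 2, 3], "dst": [1, 2, 3, 0]})
G = cugraph.Graph()
G.from_cudf_edgelist(edges, source="src", destination="dst")

# With the 0.1x local-speed factor, prevent_overlapping needs more
# iterations to converge, hence 5000 instead of the default 500.
pos = cugraph.force_atlas2(
    G,
    max_iter=5000,
    prevent_overlapping=True,
    barnes_hut_optimize=True,  # the tests now cover both True and False
)
```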

Test cases are calibrated to meet the following criteria:

* If prevent_overlapping is working as intended, the number of overlaps should be lower than the threshold >99% of the time
* If it's broken (simulated here by disabling the feature completely), it should fail >99% of the time

I ran each test 100 times (was hoping to do 1000 but we only have a single RTX 2060 for now 😅) with and without prevent_overlapping, and a clear separation appears in the number of overlaps between working vs. not working. Given the low variance in each case, I'm pretty sure we could run each test a million times and never have a false positive or negative.

I also took this opportunity to add tests with barnes_hut_optimize, because the computation of prevent_overlapping differs slightly depending on whether it's activated. The results are pretty much the same with or without BH, so I used the same threshold for both.

| prevent_overlapping enabled? | ✅ (BH on) | ✅ (BH off) | ❌ (BH on) | ❌ (BH off) | Chosen threshold |
| --- | --- | --- | --- | --- | --- |
| Karate | 40-56 | 0-21 | 143-147 | 144-147 | 100 |
| Polbooks | 116-142 | 40-87 | 274-312 | 273-311 | 200 |
| Dolphins | 52-58 | 6-42 | 182-202 | 182-202 | 110 |
| Netscience | 38-147* | 68-300 | 1105-1136 | 1106-1133 | 700 |
| Dining prefs | 19-37 | 0-15 | 144-157 | 144-160 | 80 |

\* For some reason, the Barnes-Hut approximation actually gives a slightly better result here. I didn't investigate, but I believe exact computation on a tight layout won't converge as fast due to the strong repulsion force.
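
For reference, the calibrated tests have roughly the following shape. This is a hypothetical sketch: `build_graph` is a stand-in for the real dataset fixtures, `count_overlaps` is the metric sketched earlier, and the radius is arbitrary; the actual test file may differ:

```python
import cugraph
import pytest

# Thresholds from the table above.
CASES = [
    ("karate", 100),
    ("polbooks", 200),
    ("dolphins", 110),
    ("netscience", 700),
    ("dining_prefs", 80),
]

@pytest.mark.parametrize("barnes_hut_optimize", [True, False])
@pytest.mark.parametrize("graph_name,max_overlaps", CASES)
def test_prevent_overlapping(graph_name, max_overlaps, barnes_hut_optimize):
    G = build_graph(graph_name)  # hypothetical dataset helper
    pos = cugraph.force_atlas2(
        G,
        max_iter=5000,
        prevent_overlapping=True,
        barnes_hut_optimize=barnes_hut_optimize,
    )
    xy = pos[["x", "y"]].to_numpy()  # assumes a cudf.DataFrame result
    assert count_overlaps(xy, radius=1.0) < max_overlaps  # radius is illustrative
```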

@MathisHammel
Contributor Author

While working on this, I also noticed that one important test was accidentally disabled by #5260: all tests regarding the callback were commented out (which is expected, since the argument got removed), but the part about trustworthiness is still relevant for verifying that FA2 is working.

I couldn't fix it easily because of other changes in vertex renumbering. I think the issue is beyond the scope of this PR, so I'll let you decide whether an issue should be created, or perhaps a simple FIXME comment will do for now.

@ChuckHastings
Collaborator

/ok to test 63e7d38

@MathisHammel
Contributor Author

Forgot to run the pre-commit hooks; I just pushed a new commit to fix linting.

@ChuckHastings
Collaborator

/ok to test f3545d7

@MathisHammel
Contributor Author

Looks like the AWS outage isn't quite over yet; seeing a lot of failing requests in the build: `pip._vendor.requests.exceptions.RetryError: HTTPSConnectionPool(host='pypi.anaconda.org', port=443): Max retries exceeded with url: /rapidsai-wheels-nightly/simple/rapids-logger/0.2.3/rapids_logger-0.2.3-py3-none-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (Caused by ResponseError('too many 503 error responses'))`

@rlratzel
Contributor

/merge

@rapids-bot rapids-bot bot merged commit 8e1b9fb into rapidsai:main Oct 21, 2025
125 of 132 checks passed
