Fix hangs in multithreaded fixpoint iteration #1010

MichaReiser · 2025-10-19T08:28:31Z

This PR fixes three bugs that could lead to hangs when fixpoint cycles spanned multiple threads:

Race between transfer_lock and block_on_heads: Salsa detects outer cycles by claiming a query's lock and seeing if doing so would result in a cycle. For this to work, it's crucial that a thread participating in a cycle never changes from "blocked" to running unless it's the thread currently driving the cycle forward. However, this could happen before this fix because the thread of an inner cycle kept running after completing transfer_lock (to the outer query) until it blocked on one of its cycle heads in provisional_retry (block_on_heads). This PR fixes this race by removing provisional_retry and instead blocking the thread of the inner cycle inside transfer_lock. This also results in a nice perf improvement because we no longer need to iterate over cycle heads.
Before this PR, TryClaimCycleHeadsIter first tested if the head is on the current thread's query stack, and if so, returned the iteration count from the query stack. However, the iteration count on the query stack isn't guaranteed to match the iteration count of the latest memo when the query's ownership was transferred to another thread (and this thread was blocked on the other thread). In that case, the other thread has most likely iterated the query multiple times and we have to refetch the latest memo to get the "newest" iteration count. The fix is easy, remove the local query stack "optimisation" from TryClaimCycleHeadsIter.
This was subtle, but thread_id_of_transferred_query skip behavior was slightly different from how transfer_lock rewrites the transferred locks.

Test Plan

Claude helped me write a new test for 1. I wasn't able to come up with test cases that trigger 2 or 3, but I verified that we no longer see the hangs in ty.

same query blocking on the outer head in `provisional_retry`

netlify · 2025-10-19T08:28:37Z

✅ Deploy Preview for salsa-rs canceled.

Name	Link
🔨 Latest commit	`48e53cd`
🔍 Latest deploy log	https://app.netlify.com/projects/salsa-rs/deploys/68f5d4d6fec6c400089c632f

codspeed-hq · 2025-10-19T08:30:19Z

CodSpeed Performance Report

Merging #1010 will degrade performances by 4.3%

_{Comparing MichaReiser:fixpoint-race (48e53cd) with master (9cfe41c)}

Summary

⚡ 1 improvement
❌ 1 (👁 1) regression
✅ 11 untouched

Benchmarks breakdown

	Benchmark	`BASE`	`HEAD`	Change
👁	`amortized[SupertypeInput]`	2.8 µs	2.9 µs	-4.3%
⚡	`converge_diverge_nested`	120.5 µs	111.6 µs	+8%

Fix race condition between releasing a transferred query's lock and the

97b4535

same query blocking on the outer head in `provisional_retry`

MichaReiser added the bug Something isn't working label Oct 19, 2025

MichaReiser requested review from carljm and ibraheemdev October 19, 2025 08:49

MichaReiser marked this pull request as ready for review October 19, 2025 08:50

MichaReiser force-pushed the fixpoint-race branch from 8c5d831 to e0c8545 Compare October 19, 2025 09:04

Fix infinite loop in `provisional_retry

48e53cd

MichaReiser force-pushed the fixpoint-race branch from e0c8545 to 48e53cd Compare October 20, 2025 06:21

ibraheemdev approved these changes Oct 22, 2025

View reviewed changes

MichaReiser added this pull request to the merge queue Oct 23, 2025

Merged via the queue into salsa-rs:master with commit 16d51d6 Oct 23, 2025
10 of 12 checks passed

MichaReiser deleted the fixpoint-race branch October 23, 2025 06:25

github-actions bot mentioned this pull request Oct 22, 2025

chore: release v0.25.0 #1002

Open

MichaReiser mentioned this pull request Oct 23, 2025

[ty] Fix rare multithreaded related hang astral-sh/ruff#21038

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix hangs in multithreaded fixpoint iteration #1010

Fix hangs in multithreaded fixpoint iteration #1010

Uh oh!

MichaReiser commented Oct 19, 2025 •

edited

Loading

Uh oh!

netlify bot commented Oct 19, 2025 •

edited

Loading

Uh oh!

codspeed-hq bot commented Oct 19, 2025 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Fix hangs in multithreaded fixpoint iteration #1010

Fix hangs in multithreaded fixpoint iteration #1010

Uh oh!

Conversation

MichaReiser commented Oct 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test Plan

Uh oh!

netlify bot commented Oct 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for salsa-rs canceled.

Uh oh!

codspeed-hq bot commented Oct 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

CodSpeed Performance Report

Merging #1010 will degrade performances by 4.3%

Summary

Benchmarks breakdown

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

MichaReiser commented Oct 19, 2025 •

edited

Loading

netlify bot commented Oct 19, 2025 •

edited

Loading

codspeed-hq bot commented Oct 19, 2025 •

edited

Loading