Skip to content

Conversation

@martinfrancois
Copy link

I fixed a concurrency bug in NamespacedHierarchicalStore.computeIfAbsent where the defaultCreator function was executed while holding the store's internal map lock. Under parallel execution, this could cause threads using the store to block each other and temporarily see a missing or incorrectly initialized state for values created via computeIfAbsent.

Concretely, the changes are:

  • I reworked NamespacedHierarchicalStore.computeIfAbsent so that:

    • It wraps defaultCreator in a MemoizingSupplier and installs that supplier via the map operation.
    • The supplier is only evaluated after the ConcurrentHashMap update has completed, so user code is no longer run while holding the internal map lock.
    • It uses the previously observed StoredValue (storedValue) and the current mapping in the store (oldStoredValue) to decide when it is safe to install a new value:
      • If there is no mapping in the current store (oldStoredValue == null), it installs a new value.
      • If the current mapping is precisely the one previously observed (oldStoredValue == storedValue, by reference equality), it overwrites that mapping (for example, when it is evaluated to null).
      • Otherwise, it keeps the existing mapping because another thread has already installed a different value.
    • If evaluating the new value fails, it removes the newly installed StoredValue from the map so that subsequent calls can retry.
    • This aligns computeIfAbsent with the behavior of the deprecated getOrComputeIfAbsent, preserving the intended "one initialization per key" semantics while avoiding execution of user code inside the map's critical section.
  • I introduced a CollidingKey helper type in the tests to deterministically force different keys into the same ConcurrentHashMap bucket.

  • I added a regression test computeIfAbsentDoesNotDeadlockWithCollidingKeys in NamespacedHierarchicalStoreTests:

    • It starts two threads that call computeIfAbsent for two different CollidingKey instances in the same namespace.
    • It uses latches so that the first thread's defaultCreator waits for the second, while the second only proceeds after the first has started.
    • It asserts that the first thread does not time out, ensuring that computeIfAbsent does not block subsequent computations on colliding keys.
  • I added an equivalent test getOrComputeIfAbsentDoesNotDeadlockWithCollidingKeys for the deprecated getOrComputeIfAbsent:

    • It mirrors the same scenario with CollidingKey and latches.
    • It demonstrates that getOrComputeIfAbsent does not exhibit this deadlock pattern, because it already evaluates defaultCreator outside the critical section.
    • It documents the behavioral difference that previously existed and why aligning computeIfAbsent with getOrComputeIfAbsent is the correct fix.
  • I added computeIfAbsentOverridesParentNullValue to verify parent/child store semantics purely in terms of computeIfAbsent, put, and get:

    • It stores a null value in the parent via parent.put(namespace, key, null).
    • It verifies that the child initially sees null via child.get(namespace, key).
    • It then calls child.computeIfAbsent(namespace, key, __ -> "value") and asserts that the call returns "value" and that subsequent child.get(namespace, key) also returns "value".
    • This ensures that a null value in the parent is treated as "logically absent" for computeIfAbsent and does not prevent the child from installing its own non-null value.
  • I verified the behavior of the new tests against both the old and new implementations:

    • With the original computeIfAbsent implementation, computeIfAbsentDoesNotDeadlockWithCollidingKeys fails, while getOrComputeIfAbsentDoesNotDeadlockWithCollidingKeys and the existing tests (including simulateRaceConditionInComputeIfAbsent) pass.
    • With the updated computeIfAbsent implementation, all tests pass.
    • This shows that the new tests specifically capture the concurrency issue in computeIfAbsent and that the fix restores the expected behavior without changing getOrComputeIfAbsent.
    • simulateRaceConditionInComputeIfAbsent did not catch this issue because it only exercises contention on a single key and relies on ConcurrentHashMap's per-key atomicity; it does not force different keys into the same bucket or run user code in a way that re-enters the store while the map lock is held, so the problematic interaction never occurs in that test
  • This change should also fix the flakiness observed in AssertJ's SoftAssertionsExtension_PER_CLASS_Concurrency_Test (SoftAssertionsExtension_PER_CLASS_Concurrency_Test flaky test assertj/assertj#1996). In that scenario, two tests run in parallel, and SoftAssertionsExtension uses the JUnit ExtensionContext store (backed by NamespacedHierarchicalStore) to obtain per-test AssertionErrorCollector instances via computeIfAbsent. With the old implementation, concurrent initialization under the map lock could block or race so that collectors were sometimes never registered, and the engine-level statistics occasionally reported started(0) / failed(0) instead of started(2) / failed(2). With the updated computeIfAbsent, the store computes each collector outside the lock and only once per key, so the collectors are always stored and the test statistics are stable even when the tests execute in parallel.

Fixes #5171


I hereby agree to the terms of the JUnit Contributor License Agreement.


Definition of Done

Copy link
Contributor

@Pankraz76 Pankraz76 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great fix. Precisely executed and documented without any flaw, thanks a lot.

+1

@testlens-app
Copy link

testlens-app bot commented Dec 8, 2025

✅ All tests passed ✅

🏷️ Commit: e7357f7
▶️ Tests: 62304 executed
⚪️ Checks: 15/15 completed


Learn more about TestLens at testlens.app.

@martinfrancois
Copy link
Author

martinfrancois commented Dec 8, 2025

You're welcome @Pankraz76, thanks as well for the praise and the review, I really appreciate it! :)

Copy link
Contributor

@Pankraz76 Pankraz76 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 feat. complete.

Well done, thanks again for dedication leading to incrementation.

Now its just about polish, giving optional potential dedication - striving for excellence.

But also this is danger land, might better to extract this into clean PR afterwards. Scout principle is nice, still tend ppl. to tilt on this, likely to be overwhelmed.

@SuppressWarnings("ReferenceEquality")
@API(status = MAINTAINED, since = "6.0")
public <K, V> Object computeIfAbsent(N namespace, K key, Function<? super K, ? extends V> defaultCreator) {
Preconditions.notNull(defaultCreator, "defaultCreator must not be null");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Preconditions.notNull(defaultCreator, "defaultCreator must not be null");
notNull(defaultCreator, MANDATORY_DEFAULT_CREATOR);

imho, useless coupling, context and noisy burden.

Copy link
Author

@martinfrancois martinfrancois Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to do this (Preconditions.notNull => notNull) too, but since it was like this in the code already I wasn't sure if it was a convention here, as I imagine it could potentially be confused with requireNonNull or it could be intentional to make it clear it's a precondition. Could a maintainer please give a second opinion here? I'd be glad to change it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally use Preconditions.notNull.

rejectIfClosed();
return computedValue;
});
return Preconditions.notNull(defaultCreator.apply(key), "defaultCreator must not return null");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return Preconditions.notNull(defaultCreator.apply(key), "defaultCreator must not return null");
return notNull(defaultCreator.apply(key), MANDATORY_DEFAULT_CREATOR_VALUE);

@API(status = MAINTAINED, since = "1.13.3")
public final class NamespacedHierarchicalStore<N> implements AutoCloseable {

private final AtomicInteger insertOrderSequence = new AtomicInteger();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private static final String MANDATORY_DEFAULT_CREATOR_VALUE = "defaultCreator must not return `null`";
private static final String MANDATORY_DEFAULT_CREATOR = "defaultCreator must not be `null`";
private final AtomicInteger insertOrderSequence = new AtomicInteger();

Copy link
Author

@martinfrancois martinfrancois Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As there aren't that many constants in the code for repeating strings like this I'm not sure this is on purpose - I'd agree with that change, can a maintainer please give a second opinion here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we don't extract constants for pre-condition messages. It creates a layer of indirection that makes the code harder to understand. And I can't give you chapter and verse, but Java will de duplicate constant expressions.

Also note that we don't put backticks around null.

Copy link
Contributor

@Pankraz76 Pankraz76 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2cts.

going fully functional separating the concerns (SoC/SRP).

`computeIfAbsent` previously invoked `defaultCreator` while holding the
store's internal map lock. Under parallel execution this could cause
threads using the store to block each other and temporarily see missing
or incorrectly initialized state for values created via
`computeIfAbsent`.

The implementation now wraps `defaultCreator` in a `MemoizingSupplier`
and installs that supplier via the map operation, evaluating it only
after the update has completed. This avoids running user code while
holding the lock and aligns `computeIfAbsent`'s behavior with the
deprecated `getOrComputeIfAbsent`, preserving the intended "one
initialization per key" semantics.

Issue: junit-team#5171
Signed-off-by: martinfrancois <[email protected]>
@martinfrancois
Copy link
Author

You’re welcome, and thanks again, @Pankraz76, for taking another careful look.
Your suggestions are great, but as you pointed out, they mostly target existing code that could be cleaned up independently. I’d like to keep this PR focused on the bugfix so we do not drag out the review with additional refactoring.
Once this is merged, I am happy to follow up with a separate cleanup PR. I do not want to delay AssertJ being able to update to JUnit 6 again any longer than necessary 🙂

Copy link
Contributor

@mpkorstanje mpkorstanje left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've given this a quick read through and left some comments to resolve open questions, but this is not a full review yet.

It seems that you've found and solved a different problem than described in #5171. And you claim the solutions overlap.

Unfortunately the original problem is quite tricky and the description for this pull request incredibly verbose. We'll need some time to go through the details. You can help us process this by writing a much more concise PR description.

@API(status = MAINTAINED, since = "1.13.3")
public final class NamespacedHierarchicalStore<N> implements AutoCloseable {

private final AtomicInteger insertOrderSequence = new AtomicInteger();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we don't extract constants for pre-condition messages. It creates a layer of indirection that makes the code harder to understand. And I can't give you chapter and verse, but Java will de duplicate constant expressions.

Also note that we don't put backticks around null.

@SuppressWarnings("ReferenceEquality")
@API(status = MAINTAINED, since = "6.0")
public <K, V> Object computeIfAbsent(N namespace, K key, Function<? super K, ? extends V> defaultCreator) {
Preconditions.notNull(defaultCreator, "defaultCreator must not be null");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally use Preconditions.notNull.

block each other and temporarily see a missing or incorrectly initialized state
for values created via `computeIfAbsent`. The method now evaluates
`defaultCreator` outside the critical section using a memoizing supplier,
aligning its behavior with the deprecated `getOrComputeIfAbsent`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you write this down more concisely? The release notes generally focus on a top-line understanding of what was fixed. You could express this as having solved the symptoms of #5209 rather than its root cause.

You could for clarity also add a second item that describes how computeIfAbsent no longer deadlocks.

}

@Test
void computeIfAbsentCanDeadlockWithCollidingKeys() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming of this test suggests that computeIfAbsent can currently deadlock. But I assume that after your fix this is no longer the case?

Copy link
Contributor

@mpkorstanje mpkorstanje left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've already found one reason this will not work as expected. See comment below.

simulateRaceConditionInComputeIfAbsent did not catch this issue because it only exercises contention on a single key and relies on ConcurrentHashMap's per-key atomicity; it does not force different keys into the same bucket or run user code in a way that re-enters the store while the map lock is held, so the problematic interaction never occurs in that test

With this in mind, I would have expected to see a test like simulateRaceConditionInComputeIfAbsent that forces keys into the same bucket.

return requireNonNull(newStoredValue.evaluate());
}
catch (Throwable t) {
storedValues.remove(compositeKey, newStoredValue);
Copy link
Contributor

@mpkorstanje mpkorstanje Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a period of time between storedValues.compute() and storedValues.remove() where a different thread via getStoredValue() can briefly access the newStoredValue and encounter its stored exception. As such the stores operations are not atomic.

And I think this invalidates any approach that tries to avoid execution of the defaultCreator outside the compute method.

var result = StoredValue.evaluateIfNotNull(storedValue);
if (result == null) {
StoredValue newStoredValue = this.storedValues.compute(compositeKey, (__, oldStoredValue) -> {
if (StoredValue.evaluateIfNotNull(oldStoredValue) == null) {
Copy link
Contributor

@mpkorstanje mpkorstanje Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In your analysis you said:

In NamespacedHierarchicalStore#computeIfAbsent, the implementation previously relied on ConcurrentMap.computeIfAbsent, which provides the "one logical initialization per key" behavior. After the change to storedValues.compute(…), every call to NamespacedHierarchicalStore.computeIfAbsent for the same key can rerun the initialization logic and replace the existing StoredValue.

That means that even though each compute call is atomic, two threads calling NamespacedHierarchicalStore.computeIfAbsent for the same key can:

  1. Have Thread A initialize the stored value and start tracking statistics.
  2. Then have Thread B rerun the initialization and replace that value, effectively resetting the statistics.

But looking at the existing implementation, the defaultCreator is not applied until after the oldStoredValue has been checked. So when defaultCreator is applied for a given key a value was either not set at all or that value was set and set not null. So on the face of it the defaultCreator should be applied at most once and point 2 shouldn't happen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Concurrency problem in NamespacedHierarchicalStore#computeIfAbsent

3 participants