-
Notifications
You must be signed in to change notification settings - Fork 653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deadlock in JenaSystem.init() ? #2787
Comments
Anecdotally I've definitely noticed similar issues with upgrading to 5.2.0, generally this is fixed by adding an explicit JenaSystem.init() call into each test class/entry point to control when initialisation happens |
@afs I'll see what I can do to make a reproducible example. I think it would have to be just a few threads spamming JenaSystem.init() until it jams... I'm already running JenaSystem.init() in beforeAll of each test case, but I think scalatest runs these in parallel. I'll see if I can run something before any test case is executed. |
Ok, got it. @afs This reliably gets stuck every time I run it, with a very similar thread dump to what I posted above: https://github.com/Ostrzyciel/jena-default-graphs/blob/issue-2787/jena-default-graphs/src/main/java/org/example/Main.java I'm not sure if this qualifies as a "bug", but it's surely pretty confusing to me. |
P.S.: the reproduction code does not get stuck with Jena 5.1.0. Only with Jena 5.2.0. |
See this issue: apache/jena#2787 Regardless of whether this behavior of Jena is a bug or not, this PR should fix it. It simply calls JenaSystem.init() in a static block before anything related to Jena is done. Amen.
See this issue: apache/jena#2787 Regardless of whether this behavior of Jena is a bug or not, this PR should fix it. It simply calls JenaSystem.init() in a static block before anything related to Jena is done. Amen.
We are encountering a chicken-and-egg problem with the The However, during the This circular dependency between Unfortunately, I don’t yet have a clue how to fix it. |
The code used before Jena 5.2 allowed only the first thread to start the initialization, but all subsequent threads did not wait because the With Jena 5.2, we can now be sure that everything has been properly initialized. |
Test can always problematic because they mayt jump into the middle of the system. |
Java synchronized blocks in Java are reentrant. So it's two parallel threads, not a same-thread deadly embrace. |
Thanks for the example - it can dealock here as well, and with less threads.
|
In an attempt to make the possible blockage and cause visible to the user, I may have inadvertently fixed it: private static volatile boolean initialized = false ;
private static Semaphore initializationSemaphore = new Semaphore(1);
private static ThreadLocal<Boolean> isInitializing = ThreadLocal.withInitial(() -> false);
public static void init() {
// Once jena is initialized, all calls are an immediate return.
if ( initialized )
return ;
// if the same thread is calling init(), while the initialization is in progress, return immediately
if(isInitializing.get())
return ;
isInitializing.set(true); // set the flag to true to indicate that the initialization is in progress
try {
// Only one thread can initialize Jena.
if(initializationSemaphore.tryAcquire(5, TimeUnit.SECONDS)) {
if ( initialized )
return ;
setup();
if ( DEBUG_INIT )
singleton.debug(DEBUG_INIT);
singleton.initialize();
singleton.debug(false);
// Last so overlapping initialization waits on the synchronized
initialized = true;
}
} catch (InterruptedException e) {
throw new RuntimeException("Timeout while waiting for semaphore to initialize JenaSystem. Please call JenaSystem.init() before working with multiple threads.", e);
} finally {
initializationSemaphore.release();
isInitializing.remove();
}
} |
@arne-bdt Throw up a PR for us to review then |
Hmpf... I've started a new branch, but it's not working the way I saw it before... :-( |
One thing to consider (as well) is to remove the hidden model in all vocabularies and change schemagen. A collection of constants don't need to be created off of a model. Even if want keep the hidden model it can be a minimal one, not This shouldn't require external vocabularies to make the same change. |
- printing an error to System.err, with a detailed message -> unable to fix the deadlock itself, that is all I came up with - added integration tests for JenaSystem.init(), also for the parallel case.
- printing an error to System.err, with a detailed message -> unable to fix the deadlock itself, that is all I came up with - added integration tests for JenaSystem.init(), also for the parallel case.
It is not only the ModelFactory. This is the complete list of classes, where
|
yes - specifically, it is the class initialization lock being held by ModelFactory and any locking in JenaSystem. With two threads, the deadlock is: Thread1 is the thread calling ModelFactory.createDefaultModel Thread1: lock order is take the ModelFactory class init lock, then take the JenaSystem sync lock. Prior to the 8654c07 change, thread2 would not take the ModelFactory class init lock inside synchonized but continued executing on partial initialization which works mostly .. except when it doesn't and it causes an NPE #2675. It's a problem when using Jena as a library because of the multiple entry points doing class initlaization. FWIW JenaSystem.init came in part because of class initialization circularity seeing infrequent uninitialized data even on one thread due to different class initialization orders. |
- Added a fail-fast mechanism: now throwing an `ExceptionInInitializerError` when a thread attempts to enter `JenaSystem.init()` while another thread is in the initialization process. - Refactored `JenaSystem.init()` using the "Initialization-on-demand holder idiom" for simpler and safer initialization. - Implemented integration tests for `JenaSystem.init()`, including parallel execution cases to ensure thread safety.
Due to the complexity of the initialization process and the presence of circular dependencies, resolving the deadlock issue is not straightforward. If we can't clean up the initialization process and eliminate these circular dependencies, we're essentially left with two problematic variants:
For the Deadlock variant, I implemented a fail-fast approach that throws an informative exception. (See update in PR #2792.) |
I think the solution is to go back to #2675 and look to fix the class-initializer problem. That probably needs investigating anyway. Then there is no occurence of possible alternative orders of taking locks which is the deadlock. The #2675 / 8654c07 is a non-concurrency ordering problem in a single thread - i.e. a cyclic class initialization. Requiring usage to only have single threaded initialization could be quite a problem in environments that have multi-threading and then use Jena as a library. It is not always so easy to have application code in the single-threaded startup phase. |
Basically every usage, that would trigger an exeption with PR #2792, will end up in a deadlock with Jena 5.2. What do we do with Jena 5.2 then? Do we withdraw the publication or do we issue a clear warning? |
Yes, but the usage is unusual (scalatest). We have workaround for users that need 5.2 and can put code in the single threaded stage and it is something that's been encouraged for quite awhile. If fixing the single-threaded #2675 another way can be done, we do not need to have users' change. If, in investigating #2675, we discover it can not be done, then we may have to require user change but that's a big step. The practical difficulty is that any change is one we can't test completely because there is too large a variety of uses. (And also that people don't test pre-releases.)
ASF, and open source generally, doesn't remove releases. We can put out a new version. I am sorry that PR 8654c07 broke some usages. |
Tentatively, GH-2675 seems to be fixed by just adding the RDFLinkHTTP without the JenaSystem.init change moving That is not full confirmation of a fix yet but it does change the behaviour of the thread undertaking the initialization because it no longer calls makes a nested call to Addendum: that leaves the possibility of a thread seeing "initialized = true" and skipping the synchronized block, proceeding without initialization having completed. None of the reports have been about this situation and it may be that it is hidden (initialized = true not having been written back even though volatile until the exit from synchronized which is a memory happens-before point) or impossible because of class initialization caused by |
- reinstated early return from JenaSystem.init() from before apache@8654c07 - simplified init using "initialization-on-demand holder idiom" - added integration tests for JenaSystem.init(), also for the parallel case.
I thought that was the important part, that needed fixing by moving the --> So I updated my branch by moving the
Sounds great. I also tried to reproduce such a situation with the code that reproduces the deadlock. I could not convince a static field from the vocabulary to return Unfortunately I have no idea on how to test if issue #2675 still works, the PR #2677 does not seem to contain a new dedicated test. |
It is difficult to test for because Jena is likely initlialized and any test will pass for that reason -- it isn't a test of class initialization order and we can not rely on the test. If this is run from public static void mainNPE2675() {
System.out.println("GH-2675: RDFLinkHTTP NPE");
try (RDFConnection conn = RDFConnectionFuseki.service("http://localhost:3030/ds").build()) {
}
System.out.println("DONE");
} |
Great: It runs perfectly. But when |
In my branch main...arne-bdt:jena:GH-2787-Deadlock_in_Jena_System_init, I observerd something strage: But instead, this change reliably takes me straight back to the reproducible deadlock situation. So I reverted that experment. |
Is the |
The When running However, I have not yet figured out how to integrate this test to run it with Maven in a separate JVM instance to ensure that the For more information on the pattern, you can refer to the documentation here and here. |
- Restored the early return from `JenaSystem.init()` as implemented prior to [commit 8654c07](apache@8654c07). - Simplified initialization using the "Initialization-on-demand holder idiom" for better efficiency and thread safety. - Added integration tests for `JenaSystem.init()`, including tests for parallel execution cases. - Implemented additional integration tests that start with a fresh JVM for each run, using JMH as the test runner. - Added `org.openjdk.jmh` as a dependency in `jena-integration-tests`.
Update: Using the JMH framework, it was easy to implement a test that always starts with a fresh JVM. On that basis, I could run many iterations and stress it even more with threads. |
- Restored the early return from `JenaSystem.init()` as implemented prior to [commit 8654c07](apache@8654c07). - Simplified initialization using the "Initialization-on-demand holder idiom" for better efficiency and thread safety. - Added integration tests for `JenaSystem.init()`, including tests for parallel execution cases. - Implemented additional integration tests that start with a fresh JVM for each run, using JMH as the test runner. - Added `org.openjdk.jmh` as a dependency in `jena-integration-tests`.
- Restored the early return from `JenaSystem.init()` as implemented prior to [commit 8654c07](apache@8654c07). - Simplified initialization using the "Initialization-on-demand holder idiom" for better efficiency and thread safety. - Added integration tests for `JenaSystem.init()`, including tests for parallel execution cases. - Implemented additional integration tests that start with a fresh JVM for each run, using JMH as the test runner. - Added `org.openjdk.jmh` as a dependency in `jena-integration-tests`.
- Restored the early return from `JenaSystem.init()` as implemented prior to [commit 8654c07](apache@8654c07). - Simplified initialization using the "Initialization-on-demand holder idiom" for better efficiency and thread safety. - Added integration tests for `JenaSystem.init()`, including tests for parallel execution cases. - Implemented additional integration tests that start with a fresh JVM for each run, using JMH as the test runner. - Added `org.openjdk.jmh` as a dependency in `jena-integration-tests`. - Added a regression test for apache#2675 as org.apache.jena.sys.TestJenaSystemWithFreshJVM.initRDFConnectionFuseki
(Because this is critical in Jena, forgive me for wanting to get into the details.) I don't quite understand "more robust" here. The second thread is allowed to continue once initialized is set true. LazyHolder is Efficiency - in our case, there isn't much different because the volatile is still there. |
@afs This pattern is a "cleaner" approach to static initialization, as it delegates synchronization responsibility to the JVM. It also guarantees the code executes only once. To replicate what the JVM does for us, we currently use synchronized(JenaSystem.class) {
if ( initialized )
return ;
...
} I understand that introducing this pattern may have caused some confusion, as it does not directly contribute to solving the problem. The actual fix was simply moving Since I prefer to clean up code whenever I need to modify it, I introduced the pattern. We are early in the Jena 5.3 cycle, and we have code to reproduce the threading issues along with solid tests. I ran numerous test iterations that demonstrated stable behavior. |
I have my experience with threading and I like challenges. So I jumped on this bug, which could also be relevant for some of my work projects. But now I am at a loss. Restoring the early return from JenaSystem.init() as implemented prior to commit 8654c07 does not solve the deadlock bug. It only reduces the time-window and thus the likelihood of an occurrence.
|
Class initlializtion spec: https://docs.oracle.com/javase/specs/jls/se21/html/jls-12.html#jls-12.4.2 The |
Following @arne-bdt's proposal, PR #2807 puts in the lazy holder pattern and moves the setting initialization to the start of the process. It passes the deadlock test case at he top of this issue. If we all agree, putting that in unlocks further testing of scalatest in case more issues turn up. The PR is an improvement - testing (in the build and in usage) can follow. |
@afs Thank you all for putting all the effort in solving this! If the code passes the deadlock case, then I'm pretty sure it's alright. I do not fully understand how it works, so I can't comment on the details of the solution. |
- applied asf solution for JenaSystem init - changed JHM-Settings --> .mode(Mode.SingleShotTime).verbosity(VerboseMode.SILENT) Originally for apacheGH-2787: Deadlock in `JenaSystem.init()` - Restored the early return from `JenaSystem.init()` as implemented prior to [commit 8654c07](apache@8654c07). - Simplified initialization using the "Initialization-on-demand holder idiom" for better efficiency and thread safety. - Added integration tests for `JenaSystem.init()`, including tests for parallel execution cases. - Implemented additional integration tests that start with a fresh JVM for each run, using JMH as the test runner. - Added `org.openjdk.jmh` as a dependency in `jena-integration-tests`. - Added a regression test for apache#2675 as org.apache.jena.sys.TestJenaSystemWithFreshJVM.initRDFConnectionFuseki
- applied asf solution for JenaSystem init - changed JHM-Settings --> .mode(Mode.SingleShotTime).verbosity(VerboseMode.SILENT) Originally for apacheGH-2787: Deadlock in `JenaSystem.init()` - Restored the early return from `JenaSystem.init()` as implemented prior to [commit 8654c07](apache@8654c07). - Simplified initialization using the "Initialization-on-demand holder idiom" for better efficiency and thread safety. - Added integration tests for `JenaSystem.init()`, including tests for parallel execution cases. - Implemented additional integration tests that start with a fresh JVM for each run, using JMH as the test runner. - Added `org.openjdk.jmh` as a dependency in `jena-integration-tests`. - Added a regression test for apache#2675 as org.apache.jena.sys.TestJenaSystemWithFreshJVM.initRDFConnectionFuseki
- applied asf solution for JenaSystem init - changed JHM-Settings --> .mode(Mode.SingleShotTime).verbosity(VerboseMode.SILENT) Originally for apacheGH-2787: Deadlock in `JenaSystem.init()` - Restored the early return from `JenaSystem.init()` as implemented prior to [commit 8654c07](apache@8654c07). - Simplified initialization using the "Initialization-on-demand holder idiom" for better efficiency and thread safety. - Added integration tests for `JenaSystem.init()`, including tests for parallel execution cases. - Implemented additional integration tests that start with a fresh JVM for each run, using JMH as the test runner. - Added `org.openjdk.jmh` as a dependency in `jena-integration-tests`. - Added a regression test for apache#2675 as org.apache.jena.sys.TestJenaSystemWithFreshJVM.initRDFConnectionFuseki
We seem to be done there. #2809 may happen but the root cause of this issue is fixed. Let's close, and re-open if there is new information. |
Version
5.2.0
What happened?
After upgrading to Jena 5.2.0, I've started noticing that some of my CI test runs were randomly just getting stuck. After a debugging session with sbt on my machine, I think this might be a deadlock in the initialization code. See the attached thread dump.
The first thread in the dump is within JenaSystem.init(), so it's doing something to initialize Jena. It's stuck on this line:
jena/jena-core/src/main/java/org/apache/jena/vocabulary/DC_11.java
Line 29 in 6443abd
To run this line, JVM has to run the static initializer of ModelFactory, which includes a call to JenaSystem.init():
jena/jena-core/src/main/java/org/apache/jena/rdf/model/ModelFactory.java
Lines 47 to 49 in 6443abd
Oops?
The other thread in the dump is just waiting on the first one.
I'm not sure how the code got in this state, but I'm using sbt and scalatest, which absolutely love making lots of parallel calls to code that maybe shouldn't be called in parallel. Maybe that is the issue, but I had no problems with Jena 5.1.0.
I think this issue may be related to this change: 8654c07
Relevant output and stacktrace
Are you interested in making a pull request?
This seems to be above my pay grade. No idea what's happening here.
The text was updated successfully, but these errors were encountered: