Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock in JenaSystem.init() ? #2787

Closed
Ostrzyciel opened this issue Oct 22, 2024 · 35 comments
Closed

Deadlock in JenaSystem.init() ? #2787

Ostrzyciel opened this issue Oct 22, 2024 · 35 comments
Labels

Comments

@Ostrzyciel
Copy link
Contributor

Ostrzyciel commented Oct 22, 2024

Version

5.2.0

What happened?

After upgrading to Jena 5.2.0, I've started noticing that some of my CI test runs were randomly just getting stuck. After a debugging session with sbt on my machine, I think this might be a deadlock in the initialization code. See the attached thread dump.

The first thread in the dump is within JenaSystem.init(), so it's doing something to initialize Jena. It's stuck on this line:

private static final Model m_model = ModelFactory.createDefaultModel();

To run this line, JVM has to run the static initializer of ModelFactory, which includes a call to JenaSystem.init():

public class ModelFactory extends ModelFactoryBase
{
static { JenaSystem.init(); }

Oops?

The other thread in the dump is just waiting on the first one.

I'm not sure how the code got in this state, but I'm using sbt and scalatest, which absolutely love making lots of parallel calls to code that maybe shouldn't be called in parallel. Maybe that is the issue, but I had no problems with Jena 5.1.0.

I think this issue may be related to this change: 8654c07

Relevant output and stacktrace

"pool-18-thread-17@31723" prio=5 tid=0x161 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
	 blocks pool-18-thread-5@31711
	 blocks pool-18-thread-8@31714
	 blocks pool-18-thread-9@31715
	  at org.apache.jena.vocabulary.DC_11.<clinit>(DC_11.java:29)
	  at org.apache.jena.shared.PrefixMapping.<clinit>(PrefixMapping.java:230)
	  at org.apache.jena.sparql.ARQConstants.<clinit>(ARQConstants.java:120)
	  at org.apache.jena.query.ARQ.init(ARQ.java:652)
	  - locked <0x7bff> (a java.lang.Object)
	  at org.apache.jena.sparql.system.InitARQ.start(InitARQ.java:29)
	  at org.apache.jena.base.module.Subsystem.lambda$initialize$1(Subsystem.java:117)
	  at org.apache.jena.base.module.Subsystem$$Lambda$12363/0x00007fe8d9c5a6f0.accept(Unknown Source:-1)
	  at java.util.ArrayList.forEach(ArrayList.java:1511)
	  at org.apache.jena.base.module.Subsystem.forEach(Subsystem.java:193)
	  - locked <0x7bf9> (a org.apache.jena.base.module.Subsystem)
	  - locked <0x72e> (a java.lang.Class)
	  at org.apache.jena.base.module.Subsystem.forEach(Subsystem.java:169)
	  at org.apache.jena.base.module.Subsystem.initialize(Subsystem.java:115)
	  at org.apache.jena.sys.JenaSystem.init(JenaSystem.java:90)
	  - locked <0x73c> (a java.lang.Class)
	  at org.apache.jena.riot.RDFDataMgr.<clinit>(RDFDataMgr.java:107)


"pool-18-thread-8@31714" prio=5 tid=0x158 nid=NA waiting for monitor entry
  java.lang.Thread.State: BLOCKED
	 waiting for pool-18-thread-17@31723 to release lock on <0x73c> (a java.lang.Class)
	  at org.apache.jena.sys.JenaSystem.init(JenaSystem.java:85)
	  at org.apache.jena.rdf.model.ModelFactory.<clinit>(ModelFactory.java:49)

Are you interested in making a pull request?

This seems to be above my pay grade. No idea what's happening here.

@Ostrzyciel Ostrzyciel added the bug label Oct 22, 2024
@afs
Copy link
Member

afs commented Oct 22, 2024

Do you have a reproducible - in java! - example?

The change at 8654c07 is for a reason - see GH-2675.
It is supposed to hold up concurrent initialization because that causes an NPE.

For scalatest, it would be better to explicitly call JenaSystem.init before things go parallel.

@rvesse
Copy link
Member

rvesse commented Oct 22, 2024

Anecdotally I've definitely noticed similar issues with upgrading to 5.2.0, generally this is fixed by adding an explicit JenaSystem.init() call into each test class/entry point to control when initialisation happens

@Ostrzyciel
Copy link
Contributor Author

@afs I'll see what I can do to make a reproducible example. I think it would have to be just a few threads spamming JenaSystem.init() until it jams...

I'm already running JenaSystem.init() in beforeAll of each test case, but I think scalatest runs these in parallel. I'll see if I can run something before any test case is executed.

@Ostrzyciel
Copy link
Contributor Author

Ok, got it.

@afs This reliably gets stuck every time I run it, with a very similar thread dump to what I posted above: https://github.com/Ostrzyciel/jena-default-graphs/blob/issue-2787/jena-default-graphs/src/main/java/org/example/Main.java

I'm not sure if this qualifies as a "bug", but it's surely pretty confusing to me.

@Ostrzyciel
Copy link
Contributor Author

P.S.: the reproduction code does not get stuck with Jena 5.1.0. Only with Jena 5.2.0.

Ostrzyciel added a commit to Jelly-RDF/jelly-jvm that referenced this issue Oct 22, 2024
See this issue: apache/jena#2787

Regardless of whether this behavior of Jena is a bug or not, this PR should fix it. It simply calls JenaSystem.init() in a static block before anything related to Jena is done. Amen.
Ostrzyciel added a commit to Jelly-RDF/jelly-jvm that referenced this issue Oct 22, 2024
See this issue: apache/jena#2787

Regardless of whether this behavior of Jena is a bug or not, this PR should fix it. It simply calls JenaSystem.init() in a static block before anything related to Jena is done. Amen.
@arne-bdt
Copy link
Contributor

Ok, got it.

@afs This reliably gets stuck every time I run it, with a very similar thread dump to what I posted above: https://github.com/Ostrzyciel/jena-default-graphs/blob/issue-2787/jena-default-graphs/src/main/java/org/example/Main.java

I'm not sure if this qualifies as a "bug", but it's surely pretty confusing to me.

We are encountering a chicken-and-egg problem with the ModelFactory and JenaSystem.init().

The ModelFactory has a static constructor call to JenaSystem.init();. This means that ModelFactory assumes the JenaSystem has already been initialized and requires JenaSystem.init() to be called first.
image

However, during the JenaSystem.init() process, some code is executed that calls ModelFactory.createDefaultModel(), which in turn assumes that ModelFactory is already available, even though JenaSystem.init() has not yet completed.
image

This circular dependency between JenaSystem initialization and the ModelFactory is causing the issue.

Unfortunately, I don’t yet have a clue how to fix it.

@arne-bdt
Copy link
Contributor

The code used before Jena 5.2 allowed only the first thread to start the initialization, but all subsequent threads did not wait because the JenaSystem pretended to have already been initialized, even though that might not have been the case yet:

image

With Jena 5.2, we can now be sure that everything has been properly initialized.

@afs
Copy link
Member

afs commented Oct 22, 2024

Anecdotally I've definitely noticed similar issues with upgrading to 5.2.0, generally this is fixed by adding an explicit JenaSystem.init() call into each test class/entry point to control when initialisation happens

Test can always problematic because they mayt jump into the middle of the system.

@afs
Copy link
Member

afs commented Oct 22, 2024

Java synchronized blocks in Java are reentrant.

So it's two parallel threads, not a same-thread deadly embrace.

@afs
Copy link
Member

afs commented Oct 22, 2024

Ok, got it.

@afs This reliably gets stuck every time I run it, with a very similar thread dump to what I posted above: https://github.com/Ostrzyciel/jena-default-graphs/blob/issue-2787/jena-default-graphs/src/main/java/org/example/Main.java

I'm not sure if this qualifies as a "bug", but it's surely pretty confusing to me.

Thanks for the example - it can dealock here as well, and with less threads.

P.S.: the reproduction code does not get stuck with Jena 5.1.0. Only with Jena 5.2.0
It really is unsafe in 5.1.0 because the second thread jumps over initialization - that's why the NPE in #2675 happened.

Jena.System.init needs to be cheap - otherwise it would be easier to make safe!

@arne-bdt
Copy link
Contributor

In an attempt to make the possible blockage and cause visible to the user, I may have inadvertently fixed it:

private static volatile boolean initialized = false ;
private static Semaphore initializationSemaphore = new Semaphore(1);
private static ThreadLocal<Boolean> isInitializing = ThreadLocal.withInitial(() -> false);

public static void init() {
    // Once jena is initialized, all calls are an immediate return.
    if ( initialized )
        return ;

    // if the same thread is calling init(), while the initialization is in progress, return immediately
    if(isInitializing.get())
        return ;

    isInitializing.set(true); // set the flag to true to indicate that the initialization is in progress

    try {
        // Only one thread can initialize Jena.
        if(initializationSemaphore.tryAcquire(5, TimeUnit.SECONDS)) {
            if ( initialized )
                return ;
            setup();
            if ( DEBUG_INIT )
                singleton.debug(DEBUG_INIT);
            singleton.initialize();
            singleton.debug(false);
            // Last so overlapping initialization waits on the synchronized
            initialized = true;
        }
    } catch (InterruptedException e) {
        throw new RuntimeException("Timeout while waiting for semaphore to initialize JenaSystem. Please call JenaSystem.init() before working with multiple threads.", e);
    } finally {
        initializationSemaphore.release();
        isInitializing.remove();
    }
}

@rvesse
Copy link
Member

rvesse commented Oct 23, 2024

@arne-bdt Throw up a PR for us to review then

@arne-bdt
Copy link
Contributor

@arne-bdt Throw up a PR for us to review then

Hmpf... I've started a new branch, but it's not working the way I saw it before... :-(
--> No solution in sight yet.

@afs
Copy link
Member

afs commented Oct 23, 2024

One thing to consider (as well) is to remove the hidden model in all vocabularies and change schemagen.
Replace m_model. calls with ResourceFactory. and maybe class initializer for JenaSystem.init.

A collection of constants don't need to be created off of a model. Even if want keep the hidden model it can be a minimal one, not ModelFactory.createDefaultModel.

This shouldn't require external vocabularies to make the same change.

arne-bdt pushed a commit to arne-bdt/jena that referenced this issue Oct 23, 2024
- printing an error to System.err, with a detailed message
  -> unable to fix the deadlock itself, that is all I came up with
- added integration tests for JenaSystem.init(), also for the parallel case.
arne-bdt pushed a commit to arne-bdt/jena that referenced this issue Oct 23, 2024
- printing an error to System.err, with a detailed message
  -> unable to fix the deadlock itself, that is all I came up with
- added integration tests for JenaSystem.init(), also for the parallel case.
@arne-bdt
Copy link
Contributor

arne-bdt commented Oct 23, 2024

One thing to consider (as well) is to remove the hidden model in all vocabularies and change schemagen. Replace m_model. calls with ResourceFactory. and maybe class initializer for JenaSystem.init.

A collection of constants don't need to be created off of a model. Even if want keep the hidden model it can be a minimal one, not ModelFactory.createDefaultModel.

This shouldn't require external vocabularies to make the same change.

It is not only the ModelFactory. This is the complete list of classes, where static { JenaSystem.init(); } must be removed, to make the code sample work:

  • org.apache.jena.graph.NodeFactory
  • org.apache.jena.rdf.model.ModelFactory
  • org.apache.jena.rdf.model.impl.ModelCom
  • org.apache.jena.shared.PrefixMapping.Factory

@afs
Copy link
Member

afs commented Oct 23, 2024

This circular dependency between JenaSystem initialization and the ModelFactory is causing the issue.

yes - specifically, it is the class initialization lock being held by ModelFactory and any locking in JenaSystem.

With two threads, the deadlock is:

Thread1 is the thread calling ModelFactory.createDefaultModel
Thread2 is the thread calling JenaSystem.init directly.

Thread1: lock order is take the ModelFactory class init lock, then take the JenaSystem sync lock.
Thread2: lock order is take the JenaSystem sync then ModelFactory class init lock.

Prior to the 8654c07 change, thread2 would not take the ModelFactory class init lock inside synchonized but continued executing on partial initialization which works mostly .. except when it doesn't and it causes an NPE #2675.

It's a problem when using Jena as a library because of the multiple entry points doing class initlaization.

FWIW JenaSystem.init came in part because of class initialization circularity seeing infrequent uninitialized data even on one thread due to different class initialization orders.

arne-bdt pushed a commit to arne-bdt/jena that referenced this issue Oct 24, 2024
- Added a fail-fast mechanism: now throwing an `ExceptionInInitializerError` when a thread attempts to enter `JenaSystem.init()` while another thread is in the initialization process.
- Refactored `JenaSystem.init()` using the "Initialization-on-demand holder idiom" for simpler and safer initialization.
- Implemented integration tests for `JenaSystem.init()`, including parallel execution cases to ensure thread safety.
@arne-bdt
Copy link
Contributor

Due to the complexity of the initialization process and the presence of circular dependencies, resolving the deadlock issue is not straightforward. If we can't clean up the initialization process and eliminate these circular dependencies, we're essentially left with two problematic variants:

  1. NullPointerException (NPE) (prior to the 8654c07 change): If the user forgets to call JenaSystem.init() from a single-threaded context before using the library in a multithreaded environment, the application may encounter a NullPointerException. This can occur because certain components of the library remain uninitialized, leading to unexpected null references when accessed by multiple threads.

  2. Deadlock (since Jena 5.2.0): Alternatively, if the library is used in a multithreaded context without proper initialization, a deadlock may occur due to threads waiting on class initialization locks and synchronized methods. This is particularly difficult to diagnose because the application may simply hang without any clear indication of the cause.

For the Deadlock variant, I implemented a fail-fast approach that throws an informative exception. (See update in PR #2792.)

@afs
Copy link
Member

afs commented Oct 24, 2024

I think the solution is to go back to #2675 and look to fix the class-initializer problem. That probably needs investigating anyway.

Then there is no occurence of possible alternative orders of taking locks which is the deadlock.

The #2675 / 8654c07 is a non-concurrency ordering problem in a single thread - i.e. a cyclic class initialization.

Requiring usage to only have single threaded initialization could be quite a problem in environments that have multi-threading and then use Jena as a library. It is not always so easy to have application code in the single-threaded startup phase.

@arne-bdt
Copy link
Contributor

arne-bdt commented Oct 24, 2024

I think requiring single threaded initialization will break existing usage. Until we agree that it is the only option, I don't think we should go that way.

Basically every usage, that would trigger an exeption with PR #2792, will end up in a deadlock with Jena 5.2.

What do we do with Jena 5.2 then? Do we withdraw the publication or do we issue a clear warning?

@afs
Copy link
Member

afs commented Oct 24, 2024

Yes, but the usage is unusual (scalatest).

We have workaround for users that need 5.2 and can put code in the single threaded stage and it is something that's been encouraged for quite awhile.

If fixing the single-threaded #2675 another way can be done, we do not need to have users' change.

If, in investigating #2675, we discover it can not be done, then we may have to require user change but that's a big step.

The practical difficulty is that any change is one we can't test completely because there is too large a variety of uses. (And also that people don't test pre-releases.)

What do we do with Jena 5.2 then? Do we withdraw the publication or do we issue a clear warning?

ASF, and open source generally, doesn't remove releases. We can put out a new version.

I am sorry that PR 8654c07 broke some usages.

@afs
Copy link
Member

afs commented Oct 24, 2024

Tentatively, GH-2675 seems to be fixed by just adding the RDFLinkHTTP without the JenaSystem.init change moving initlialized = true;

That is not full confirmation of a fix yet but it does change the behaviour of the thread undertaking the initialization because it no longer calls makes a nested call to singleton.initialize().

Addendum: that leaves the possibility of a thread seeing "initialized = true" and skipping the synchronized block, proceeding without initialization having completed.

None of the reports have been about this situation and it may be that it is hidden (initialized = true not having been written back even though volatile until the exit from synchronized which is a memory happens-before point) or impossible because of class initialization caused by singleton.initialize() will hold the class initialization lock. It does suggest a robust solution with 2 volatiles (or AtomicBoolean for better control) "before" and "after" the initialization work step.

arne-bdt pushed a commit to arne-bdt/jena that referenced this issue Oct 24, 2024
- reinstated early return from JenaSystem.init() from before apache@8654c07
- simplified init using "initialization-on-demand holder idiom"
- added integration tests for JenaSystem.init(), also for the parallel case.
@arne-bdt
Copy link
Contributor

arne-bdt commented Oct 24, 2024

... that leaves the possibility of a thread seeing "initialized = true" and skipping the synchronized block, proceeding without initialization having completed.

I thought that was the important part, that needed fixing by moving the initialized = true.

--> So I updated my branch by moving the initialized = true back to its old position and removed the exception:
main...arne-bdt:jena:GH-2787-Deadlock_in_Jena_System_init
The test code does not cause deadlocks any more.

None of the reports have been about this situation...

Sounds great. I also tried to reproduce such a situation with the code that reproduces the deadlock. I could not convince a static field from the vocabulary to return null .

Unfortunately I have no idea on how to test if issue #2675 still works, the PR #2677 does not seem to contain a new dedicated test.

@afs
Copy link
Member

afs commented Oct 24, 2024

It is difficult to test for because Jena is likely initlialized and any test will pass for that reason -- it isn't a test of class initialization order and we can not rely on the test.

If this is run from public static void main, and there is no JenaSystem.init call in RDFLinkHTTPBuilder, it should go wrong.

   public static void mainNPE2675() {
        System.out.println("GH-2675: RDFLinkHTTP NPE");
        try (RDFConnection conn = RDFConnectionFuseki.service("http://localhost:3030/ds").build()) {

        }
        System.out.println("DONE");
    }

@arne-bdt
Copy link
Contributor

It is difficult to test for because Jena is likely initlialized and any test will pass for that reason -- it isn't a test of class initialization order and we can not rely on the test.

If this is run from public static void main, and there is no JenaSystem.init call in RDFLinkHTTPBuilder, it should go wrong.

   public static void mainNPE2675() {
        System.out.println("GH-2675: RDFLinkHTTP NPE");
        try (RDFConnection conn = RDFConnectionFuseki.service("http://localhost:3030/ds").build()) {

        }
        System.out.println("DONE");
    }

Great: It runs perfectly. But when static { JenaSystem.init(); } is removed in RDFLinkHTTPBuilder, it fails with:
java.lang.ExceptionInInitializerError ... Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.jena.rdf.model.Resource.asNode()" because "org.apache.jena.vocabulary.RDF.Alt" is null

@arne-bdt
Copy link
Contributor

In my branch main...arne-bdt:jena:GH-2787-Deadlock_in_Jena_System_init, I observerd something strage:
I thought, I could make the solution more robust by using static AtomicBoolean initialized = new AtomicBoolean(false); instead of static boolean initialized = false;

But instead, this change reliably takes me straight back to the reproducible deadlock situation.

So I reverted that experment.

@afs
Copy link
Member

afs commented Oct 24, 2024

Is the LazyHolder any different to the 5.1.0 approach in SystemJena.init? A second thread coming is not waiting for initlialization to complete and can touch classes in a different order. (we don't have a reproducible example.)

@arne-bdt
Copy link
Contributor

arne-bdt commented Oct 24, 2024

Is the LazyHolder approach any different from the 5.1.0 approach in SystemJena.init? A second thread coming in is not waiting for initialization to complete and may access classes in a different order. (We don't have a reproducible example.)

The LazyHolder replaces only the synchronized(JenaSystem.class) mechanism. Unlike the synchronized block, the JVM ensures that the LazyHolder is only accessed once, so there’s no need for double-checking with if (initialized) return;. This makes it both efficient and more robust.

When running org.apache.jena.sys.TestJenaSystem#testFirstInitParallel directly from the IDE (without prior initialization of JenaSystem), it’s quite straightforward to observe the exact behavior, and the test consistently produces reliable results. So far, I haven’t encountered any non-reproducible behavior (e.g., due to timing issues).
Which kind of reproducible example are you missing?

However, I have not yet figured out how to integrate this test to run it with Maven in a separate JVM instance to ensure that the JenaSystem has not been initialised.

For more information on the pattern, you can refer to the documentation here and here.

arne-bdt pushed a commit to arne-bdt/jena that referenced this issue Oct 25, 2024
- Restored the early return from `JenaSystem.init()` as implemented prior to [commit 8654c07](apache@8654c07).
- Simplified initialization using the "Initialization-on-demand holder idiom" for better efficiency and thread safety.
- Added integration tests for `JenaSystem.init()`, including tests for parallel execution cases.
- Implemented additional integration tests that start with a fresh JVM for each run, using JMH as the test runner.
  - Added `org.openjdk.jmh` as a dependency in `jena-integration-tests`.
@arne-bdt
Copy link
Contributor

However, I have not yet figured out how to integrate this test to run it with Maven in a separate JVM instance to ensure that the JenaSystem has not been initialised.

Update: Using the JMH framework, it was easy to implement a test that always starts with a fresh JVM.

On that basis, I could run many iterations and stress it even more with threads.
--> The solution in main...arne-bdt:jena:GH-2787-Deadlock_in_Jena_System_init seems to be stable.

arne-bdt pushed a commit to arne-bdt/jena that referenced this issue Oct 25, 2024
- Restored the early return from `JenaSystem.init()` as implemented prior to [commit 8654c07](apache@8654c07).
- Simplified initialization using the "Initialization-on-demand holder idiom" for better efficiency and thread safety.
- Added integration tests for `JenaSystem.init()`, including tests for parallel execution cases.
- Implemented additional integration tests that start with a fresh JVM for each run, using JMH as the test runner.
  - Added `org.openjdk.jmh` as a dependency in `jena-integration-tests`.
arne-bdt pushed a commit to arne-bdt/jena that referenced this issue Oct 25, 2024
- Restored the early return from `JenaSystem.init()` as implemented prior to [commit 8654c07](apache@8654c07).
- Simplified initialization using the "Initialization-on-demand holder idiom" for better efficiency and thread safety.
- Added integration tests for `JenaSystem.init()`, including tests for parallel execution cases.
- Implemented additional integration tests that start with a fresh JVM for each run, using JMH as the test runner.
  - Added `org.openjdk.jmh` as a dependency in `jena-integration-tests`.
arne-bdt pushed a commit to arne-bdt/jena that referenced this issue Oct 25, 2024
- Restored the early return from `JenaSystem.init()` as implemented prior to [commit 8654c07](apache@8654c07).
- Simplified initialization using the "Initialization-on-demand holder idiom" for better efficiency and thread safety.
- Added integration tests for `JenaSystem.init()`, including tests for parallel execution cases.
- Implemented additional integration tests that start with a fresh JVM for each run, using JMH as the test runner.
  - Added `org.openjdk.jmh` as a dependency in `jena-integration-tests`.
- Added a regression test for apache#2675 as org.apache.jena.sys.TestJenaSystemWithFreshJVM.initRDFConnectionFuseki
@afs
Copy link
Member

afs commented Oct 27, 2024

The LazyHolder replaces only the synchronized(JenaSystem.class) mechanism. Unlike the synchronized block, the JVM ensures that the LazyHolder is only accessed once, so there’s no need for double-checking with if (initialized) return;. This makes it both efficient and more robust.

(Because this is critical in Jena, forgive me for wanting to get into the details.)

I don't quite understand "more robust" here. The second thread is allowed to continue once initialized is set true. LazyHolder is
(I don't have a solution for that and it may be a non-issue because the second thread bumps into an initializing class.)

Efficiency - in our case, there isn't much different because the volatile is still there.

@arne-bdt
Copy link
Contributor

arne-bdt commented Oct 27, 2024

(Because this is critical in Jena, forgive me for wanting to get into the details.)

I don't quite understand "more robust" here. The second thread is allowed to continue once initialized is set true. LazyHolder is (I don't have a solution for that and it may be a non-issue because the second thread bumps into an initializing class.)

@afs This pattern is a "cleaner" approach to static initialization, as it delegates synchronization responsibility to the JVM. It also guarantees the code executes only once. To replicate what the JVM does for us, we currently use synchronized and a secondary check within the synchronized block for any thread that reaches that point and is waiting:

synchronized(JenaSystem.class) {
            if ( initialized )
                return ;
            ...
}

I understand that introducing this pattern may have caused some confusion, as it does not directly contribute to solving the problem. The actual fix was simply moving initialized = true; back to its original position.

Since I prefer to clean up code whenever I need to modify it, I introduced the pattern. We are early in the Jena 5.3 cycle, and we have code to reproduce the threading issues along with solid tests. I ran numerous test iterations that demonstrated stable behavior.
Given this, I valued the benefits of the initialization pattern over the potential risk of a different runtime behavior.

@arne-bdt
Copy link
Contributor

arne-bdt commented Oct 27, 2024

I have my experience with threading and I like challenges. So I jumped on this bug, which could also be relevant for some of my work projects. But now I am at a loss.

Restoring the early return from JenaSystem.init() as implemented prior to commit 8654c07 does not solve the deadlock bug.

It only reduces the time-window and thus the likelihood of an occurrence.
The developed tests clearly show that the problem persists.
So here we are again with two options:

  1. Resolve the circular depencencies. Unfortunately I could not wrap my head around that yet and would be delighted if someone with more knowledge about cross-dependencies and the history of static Jena initialisers would take a look at it.
  2. Force JenaSystem.init() to be called from a single-treaded context. (but that option has already been discarded as last resort)

@afs
Copy link
Member

afs commented Oct 28, 2024

Class initlializtion spec: https://docs.oracle.com/javase/specs/jls/se21/html/jls-12.html#jls-12.4.2

The LazyHolder has an advantage here because it is part of the overall class initialization process, the usual way applications using Jena as a library will cause JenaSystem.init() to be called.

@afs
Copy link
Member

afs commented Oct 29, 2024

Following @arne-bdt's proposal, PR #2807 puts in the lazy holder pattern and moves the setting initialization to the start of the process. It passes the deadlock test case at he top of this issue.

If we all agree, putting that in unlocks further testing of scalatest in case more issues turn up.

The PR is an improvement - testing (in the build and in usage) can follow.

@Ostrzyciel
Copy link
Contributor Author

@afs Thank you all for putting all the effort in solving this! If the code passes the deadlock case, then I'm pretty sure it's alright. I do not fully understand how it works, so I can't comment on the details of the solution.

arne-bdt pushed a commit to arne-bdt/jena that referenced this issue Oct 29, 2024
- applied asf solution for JenaSystem init
- changed JHM-Settings
 --> .mode(Mode.SingleShotTime).verbosity(VerboseMode.SILENT)

Originally for apacheGH-2787: Deadlock in `JenaSystem.init()`
- Restored the early return from `JenaSystem.init()` as implemented prior to [commit 8654c07](apache@8654c07).
- Simplified initialization using the "Initialization-on-demand holder idiom" for better efficiency and thread safety.
- Added integration tests for `JenaSystem.init()`, including tests for parallel execution cases.
- Implemented additional integration tests that start with a fresh JVM for each run, using JMH as the test runner.
  - Added `org.openjdk.jmh` as a dependency in `jena-integration-tests`.
- Added a regression test for apache#2675 as org.apache.jena.sys.TestJenaSystemWithFreshJVM.initRDFConnectionFuseki
arne-bdt pushed a commit to arne-bdt/jena that referenced this issue Oct 29, 2024
- applied asf solution for JenaSystem init
- changed JHM-Settings
 --> .mode(Mode.SingleShotTime).verbosity(VerboseMode.SILENT)

Originally for apacheGH-2787: Deadlock in `JenaSystem.init()`
- Restored the early return from `JenaSystem.init()` as implemented prior to [commit 8654c07](apache@8654c07).
- Simplified initialization using the "Initialization-on-demand holder idiom" for better efficiency and thread safety.
- Added integration tests for `JenaSystem.init()`, including tests for parallel execution cases.
- Implemented additional integration tests that start with a fresh JVM for each run, using JMH as the test runner.
  - Added `org.openjdk.jmh` as a dependency in `jena-integration-tests`.
- Added a regression test for apache#2675 as org.apache.jena.sys.TestJenaSystemWithFreshJVM.initRDFConnectionFuseki
arne-bdt pushed a commit to arne-bdt/jena that referenced this issue Oct 29, 2024
- applied asf solution for JenaSystem init
- changed JHM-Settings
 --> .mode(Mode.SingleShotTime).verbosity(VerboseMode.SILENT)

Originally for apacheGH-2787: Deadlock in `JenaSystem.init()`
- Restored the early return from `JenaSystem.init()` as implemented prior to [commit 8654c07](apache@8654c07).
- Simplified initialization using the "Initialization-on-demand holder idiom" for better efficiency and thread safety.
- Added integration tests for `JenaSystem.init()`, including tests for parallel execution cases.
- Implemented additional integration tests that start with a fresh JVM for each run, using JMH as the test runner.
  - Added `org.openjdk.jmh` as a dependency in `jena-integration-tests`.
- Added a regression test for apache#2675 as org.apache.jena.sys.TestJenaSystemWithFreshJVM.initRDFConnectionFuseki
@afs
Copy link
Member

afs commented Nov 28, 2024

We seem to be done there.

#2809 may happen but the root cause of this issue is fixed.

Let's close, and re-open if there is new information.

@afs afs closed this as completed Nov 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants