-
Notifications
You must be signed in to change notification settings - Fork 175
GH-5447 LMDB Store query performance improvements #5456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Main branch benchmark results
Current branch
|
b8c5069
to
be13d4b
Compare
be13d4b
to
63ff805
Compare
} | ||
} | ||
|
||
private static final class ReadTxn { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hmottestad Could you explain the performance impact of this change? Is this something that can also by using the TxnManager class:
https://github.com/eclipse-rdf4j/rdf4j/blob/main/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/TxnManager.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any LmdbValue that needs to be resolved would previously start a new read transaction, resolve the value, then abort the read transaction. This was really costly and really affected GROUP BY, DISTINCT, and ORDER BY, as well as when a query returns a lot of results to the user.
I played around with having some kind of long-running read transaction, but got a lot of issues with stale reads. But even if we can't do that, we can still utilize the reset/renew approach, which is a lot faster than starting a new transaction every time.
Since it's hard to clean up these read transactions, I found it best to use ThreadLocal combined with Java 9 Cleaner. I've also just now added some cleanup in the commit/rollback phase, since we are guaranteed that a thread that is in one of those two phases can't also be using the read transaction at the same time, and it also helps to proactively clean up the read transactions (abort them) just in case they are holding any locks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be cool if we could unify the handling of transactions between ValueStore and TripleStore.
Would be great if you have the time to review this branch @kenwenzel! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hmottestad Maybe you also like to take a look at this proposed change:
https://github.com/kenwenzel/rdf4j/blob/757cc5a2d2c75671814eb975ece9e1e00059f06f/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/Varint.java
The diff is here:
https://github.com/eclipse-rdf4j/rdf4j/pull/5443/files#diff-ca47132d36c6a59185624371fa142a2cc669503bb421b8d4ec0cd39bb8d25e65
There are fewer changes in the file and for shorter varints the byte order of the buffer needs not to be changed becasue there is one method for big endian and one for little endian.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a look and the ByteBuffer is able to use native memory operations with a specific endianness. For example:
private ByteBuffer putInt(long a, int x) {
try {
int y = (x);
SCOPED_MEMORY_ACCESS.putIntUnaligned(session(), null, a, y, bigEndian);
} finally {
Reference.reachabilityFence(this);
}
return this;
}
So bit fiddling is slower than changing the endianness flag and then delegation to the correct CPU instruction. ARM at least has support for both big and little.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this also hold for LWJGL byte buffers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
9f740b7
to
b1ca2e9
Compare
I don't have access to a computer the next days. But the changes look good and the performance gains speak for themselves. The only thing could be aligning the read txns between value store and triple store. Maybe we could unify the code somehow? |
48a44b4
to
f35ebeb
Compare
f35ebeb
to
8a3a131
Compare
I'll try to merge this very soon. I had to fix an issue with MINUS and EXISTS. And I want to rerun the benchmarks to double check it didn't make anything terribly slow. |
@hmottestad If you have the time then maybe you could also check one of the write benchmarks. |
GitHub issue resolved: #5447
Briefly describe the changes proposed in this PR:
PR Author Checklist (see the contributor guidelines for more details):
mvn process-resources
to format from the command line)