Contributor

@hmottestad hmottestad commented Sep 27, 2025

GitHub issue resolved: #5447

Briefly describe the changes proposed in this PR:


PR Author Checklist (see the contributor guidelines for more details):

  • my pull request is self-contained
  • I've added tests for the changes I made
  • I've applied code formatting (you can use mvn process-resources to format from the command line)
  • I've squashed my commits where necessary
  • every commit message starts with the issue number (GH-xxxx) followed by a meaningful description of the change

@hmottestad
Contributor Author

hmottestad commented Oct 1, 2025

Main branch benchmark results

Benchmark                                                     Mode  Cnt     Score    Error  Units
QueryBenchmark.complexQuery                                   avgt    5     6.814 ±  0.501  ms/op
QueryBenchmark.different_datasets_with_similar_distributions  avgt    5     4.011 ±  0.324  ms/op
QueryBenchmark.groupByQuery                                   avgt    5     1.402 ±  0.048  ms/op
QueryBenchmark.long_chain                                     avgt    5  1150.038 ± 74.033  ms/op
QueryBenchmark.lots_of_optional                               avgt    5   424.333 ± 30.847  ms/op
QueryBenchmark.minus                                          avgt    5   945.155 ± 24.901  ms/op
QueryBenchmark.multiple_sub_select                            avgt    5    97.322 ±  6.308  ms/op
QueryBenchmark.nested_optionals                               avgt    5   254.166 ± 10.428  ms/op
QueryBenchmark.optional_lhs_filter                            avgt    5    72.703 ±  8.745  ms/op
QueryBenchmark.optional_rhs_filter                            avgt    5   106.735 ±  5.033  ms/op
QueryBenchmark.ordered_union_limit                            avgt    5   271.306 ± 59.919  ms/op
QueryBenchmark.pathExpressionQuery1                           avgt    5    47.773 ±  2.007  ms/op
QueryBenchmark.pathExpressionQuery2                           avgt    5    11.707 ±  0.925  ms/op
QueryBenchmark.query_distinct_predicates                      avgt    5    62.905 ±  2.968  ms/op
QueryBenchmark.simple_filter_not                              avgt    5    11.180 ±  0.549  ms/op
QueryBenchmark.sub_select                                     avgt    5   117.819 ± 21.703  ms/op
QueryBenchmarkFoaf.groupByCount                               avgt    5  1269.171 ± 48.931  ms/op
QueryBenchmarkFoaf.groupByCountSorted                         avgt    5  1117.040 ± 87.518  ms/op
QueryBenchmarkFoaf.personsAndFriends                          avgt    5   454.327 ± 33.633  ms/op



Benchmark                           Mode  Cnt   Score   Error  Units
RecordIteratorBenchmark.iterateAll  avgt    5  29.401 ± 0.478  ms/op - 1 thread
RecordIteratorBenchmark.iterateAll  avgt    5  44.083 ± 1.716  ms/op - 8 threads


Benchmark                                                       Mode  Cnt      Score      Error  Units
TransactionsPerSecondBenchmark.largerTransaction               thrpt    5     76.259 ±   13.348  ops/s
TransactionsPerSecondBenchmark.largerTransactionLevelNone      thrpt    5     84.887 ±    7.593  ops/s
TransactionsPerSecondBenchmark.mediumTransactionsLevelNone     thrpt    5  24967.891 ± 6360.637  ops/s
TransactionsPerSecondBenchmark.transactions                    thrpt    5  37628.622 ± 6447.745  ops/s
TransactionsPerSecondBenchmark.transactionsLevelNone           thrpt    5  40614.791 ± 1983.493  ops/s
TransactionsPerSecondBenchmark.veryLargerTransactionLevelNone  thrpt    5      0.582 ±    0.044  ops/s
TransactionsPerSecondBenchmarkFoaf.transaction100kx            thrpt    5      5.167 ±    0.291  ops/s
TransactionsPerSecondBenchmarkFoaf.transaction100kxLevelNone   thrpt    5      6.779 ±    0.622  ops/s
TransactionsPerSecondBenchmarkFoaf.transaction10kx             thrpt    5     47.470 ±    5.743  ops/s
TransactionsPerSecondBenchmarkFoaf.transaction10kxLevelNone    thrpt    5     55.428 ±    5.997  ops/s
TransactionsPerSecondBenchmarkFoaf.transaction10x              thrpt    5  13022.788 ± 2882.596  ops/s
TransactionsPerSecondBenchmarkFoaf.transaction10xLevelNone     thrpt    5  14720.898 ± 3490.659  ops/s
TransactionsPerSecondBenchmarkFoaf.transaction1x               thrpt    5  29974.640 ± 6896.057  ops/s
TransactionsPerSecondBenchmarkFoaf.transaction1xLevelNone      thrpt    5  32384.436 ± 6190.305  ops/s

Current branch

Benchmark                                                     Mode  Cnt    Score    Error  Units
QueryBenchmark.complexQuery                                   avgt    5    3.880 ±  0.118  ms/op
QueryBenchmark.different_datasets_with_similar_distributions  avgt    5    2.221 ±  0.024  ms/op
QueryBenchmark.groupByQuery                                   avgt    5    0.932 ±  0.003  ms/op
QueryBenchmark.long_chain                                     avgt    5  679.258 ±  8.916  ms/op
QueryBenchmark.lots_of_optional                               avgt    5  224.328 ±  3.230  ms/op
QueryBenchmark.minus                                          avgt    5    9.276 ±  0.069  ms/op
QueryBenchmark.multiple_sub_select                            avgt    5   52.888 ±  1.017  ms/op
QueryBenchmark.nested_optionals                               avgt    5  155.438 ±  0.460  ms/op
QueryBenchmark.optional_lhs_filter                            avgt    5   36.274 ±  1.099  ms/op
QueryBenchmark.optional_rhs_filter                            avgt    5   53.820 ±  1.159  ms/op
QueryBenchmark.ordered_union_limit                            avgt    5   73.762 ±  1.176  ms/op
QueryBenchmark.pathExpressionQuery1                           avgt    5   20.691 ±  0.179  ms/op
QueryBenchmark.pathExpressionQuery2                           avgt    5    3.945 ±  0.033  ms/op
QueryBenchmark.query_distinct_predicates                      avgt    5   43.868 ±  0.447  ms/op
QueryBenchmark.simple_filter_not                              avgt    5    6.004 ±  0.031  ms/op
QueryBenchmark.sub_select                                     avgt    5   69.010 ±  0.984  ms/op
QueryBenchmarkFoaf.groupByCount                               avgt    5  746.923 ± 12.183  ms/op
QueryBenchmarkFoaf.groupByCountSorted                         avgt    5  648.512 ± 21.408  ms/op
QueryBenchmarkFoaf.personsAndFriends                          avgt    5  199.420 ±  7.442  ms/op



Benchmark                           Mode  Cnt   Score   Error  Units
RecordIteratorBenchmark.iterateAll  avgt    5  19.033 ± 0.352  ms/op - 1 thread
RecordIteratorBenchmark.iterateAll  avgt    5  20.829 ± 0.288  ms/op - 8 threads


Benchmark                                                       Mode  Cnt      Score       Error  Units
TransactionsPerSecondBenchmark.largerTransaction               thrpt    5     72.034 ±    16.136  ops/s
TransactionsPerSecondBenchmark.largerTransactionLevelNone      thrpt    5     93.334 ±     1.510  ops/s
TransactionsPerSecondBenchmark.mediumTransactionsLevelNone     thrpt    5  26266.973 ±  4663.253  ops/s
TransactionsPerSecondBenchmark.transactions                    thrpt    5  37958.103 ± 10533.605  ops/s
TransactionsPerSecondBenchmark.transactionsLevelNone           thrpt    5  37719.912 ±  3110.764  ops/s
TransactionsPerSecondBenchmark.veryLargerTransactionLevelNone  thrpt    5      0.609 ±     0.007  ops/s
TransactionsPerSecondBenchmarkFoaf.transaction100kx            thrpt    5      5.541 ±     0.137  ops/s
TransactionsPerSecondBenchmarkFoaf.transaction100kxLevelNone   thrpt    5      7.568 ±     0.272  ops/s
TransactionsPerSecondBenchmarkFoaf.transaction10kx             thrpt    5     51.184 ±     1.454  ops/s
TransactionsPerSecondBenchmarkFoaf.transaction10kxLevelNone    thrpt    5     59.807 ±     0.800  ops/s
TransactionsPerSecondBenchmarkFoaf.transaction10x              thrpt    5  15559.039 ±   385.342  ops/s
TransactionsPerSecondBenchmarkFoaf.transaction10xLevelNone     thrpt    5  16004.717 ±  1121.272  ops/s
TransactionsPerSecondBenchmarkFoaf.transaction1x               thrpt    5  32018.049 ±  1467.008  ops/s
TransactionsPerSecondBenchmarkFoaf.transaction1xLevelNone      thrpt    5  33619.495 ±  3284.726  ops/s
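To compare the two sets of tables, the scores can be turned into ratios. The sketch below is only illustrative arithmetic over a few rows copied from the results above; the class and method names are made up for this note. For `avgt` rows (ms/op) lower is better, so the speedup is main divided by branch; for `thrpt` rows (ops/s) higher is better, so the gain is branch divided by main. The error margins are ignored here, which matters for the noisier throughput rows.

```java
final class SpeedupSketch {

    /** Speedup for latency-style scores (ms/op): how many times faster the branch is. */
    static double latencySpeedup(double mainMsPerOp, double branchMsPerOp) {
        return mainMsPerOp / branchMsPerOp;
    }

    /** Gain for throughput-style scores (ops/s): branch throughput relative to main. */
    static double throughputGain(double mainOpsPerSec, double branchOpsPerSec) {
        return branchOpsPerSec / mainOpsPerSec;
    }

    public static void main(String[] args) {
        // Scores copied from the tables above (main branch first, current branch second).
        System.out.printf("minus:                 %.2fx%n", latencySpeedup(945.155, 9.276));
        System.out.printf("complexQuery:          %.2fx%n", latencySpeedup(6.814, 3.880));
        System.out.printf("iterateAll, 8 threads: %.2fx%n", latencySpeedup(44.083, 20.829));
        System.out.printf("transaction10x:        %.2fx%n", throughputGain(13022.788, 15559.039));
    }
}
```

On these numbers the `minus` query improves by roughly two orders of magnitude, while most other query benchmarks land somewhere around 1.5x to 3x.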

@hmottestad hmottestad changed the title from "Gh 5447 alternate" to "GH-5447 LMDB Store query performance improvements" Oct 2, 2025
@hmottestad hmottestad marked this pull request as ready for review October 2, 2025 09:07
@hmottestad hmottestad changed the base branch from main to develop October 2, 2025 09:08
}
}

private static final class ReadTxn {
Contributor

@hmottestad Could you explain the performance impact of this change? Is this something that could also be done by using the TxnManager class:
https://github.com/eclipse-rdf4j/rdf4j/blob/main/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/TxnManager.java

Contributor Author

@hmottestad hmottestad Oct 2, 2025

Any LmdbValue that needs to be resolved would previously start a new read transaction, resolve the value, then abort the read transaction. This was really costly and really affected GROUP BY, DISTINCT, and ORDER BY, as well as when a query returns a lot of results to the user.

I played around with having some kind of long-running read transaction, but got a lot of issues with stale reads. But even if we can't do that, we can still utilize the reset/renew approach, which is a lot faster than starting a new transaction every time.

Since it's hard to clean up these read transactions, I found it best to use ThreadLocal combined with Java 9 Cleaner. I've also just now added some cleanup in the commit/rollback phase, since we are guaranteed that a thread that is in one of those two phases can't also be using the read transaction at the same time, and it also helps to proactively clean up the read transactions (abort them) just in case they are holding any locks.
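A minimal sketch of the pattern described above, not the code from this PR. `FakeTxn` is a hypothetical stand-in for a native LMDB read transaction (its `reset`, `renew` and `abort` methods only mimic what `mdb_txn_reset`, `mdb_txn_renew` and `mdb_txn_abort` do), and the class and method names are invented for illustration. It shows one read transaction per thread that is renewed before each read and reset afterwards, a `Cleaner` action as a safety net, and an explicit release that a commit or rollback path could call.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class PooledReadTxnSketch {

    /** Hypothetical stand-in for a native read transaction; counts begins and aborts. */
    static final class FakeTxn {
        static final AtomicInteger BEGUN = new AtomicInteger();
        static final AtomicInteger ABORTED = new AtomicInteger();
        boolean active = true;
        private boolean aborted;

        FakeTxn() { BEGUN.incrementAndGet(); }          // stands in for beginning a txn
        void reset() { active = false; }                // release the snapshot, keep the handle
        void renew() { active = true; }                 // reuse the handle for a fresh snapshot
        synchronized void abort() {                     // free the handle, at most once
            if (!aborted) { aborted = true; active = false; ABORTED.incrementAndGet(); }
        }
    }

    private static final Cleaner CLEANER = Cleaner.create();

    /** Per-thread holder; the Cleaner aborts the txn once the holder becomes unreachable. */
    static final class Holder {
        final FakeTxn txn = new FakeTxn();
        final Cleaner.Cleanable cleanable;

        Holder() {
            txn.reset();                                // park the txn until the first read
            FakeTxn t = txn;                            // the action must not capture 'this'
            cleanable = CLEANER.register(this, t::abort);
        }
    }

    private static final ThreadLocal<Holder> LOCAL = new ThreadLocal<>();

    /** Runs a read inside the thread's reusable txn: renew, work, reset. */
    static <R> R read(Function<FakeTxn, R> work) {
        Holder h = LOCAL.get();
        if (h == null) { h = new Holder(); LOCAL.set(h); }
        h.txn.renew();
        try {
            return work.apply(h.txn);
        } finally {
            h.txn.reset();
        }
    }

    /** Proactive cleanup, e.g. from a commit or rollback path on the same thread. */
    static void release() {
        Holder h = LOCAL.get();
        if (h != null) { h.cleanable.clean(); LOCAL.remove(); }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            read(txn -> txn.active);
        }
        release();
        System.out.println("begun=" + FakeTxn.BEGUN.get() + " aborted=" + FakeTxn.ABORTED.get());
    }
}
```

`Cleanable.clean()` runs the registered action at most once, so an explicit release followed by a later garbage-collection cleanup does not abort the same transaction twice.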

Contributor

It would be cool if we could unify the handling of transactions between ValueStore and TripleStore.

@hmottestad
Contributor Author

Would be great if you have the time to review this branch @kenwenzel!

Contributor

@kenwenzel kenwenzel Oct 2, 2025

@hmottestad Maybe you would also like to take a look at this proposed change:
https://github.com/kenwenzel/rdf4j/blob/757cc5a2d2c75671814eb975ece9e1e00059f06f/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/Varint.java

The diff is here:
https://github.com/eclipse-rdf4j/rdf4j/pull/5443/files#diff-ca47132d36c6a59185624371fa142a2cc669503bb421b8d4ec0cd39bb8d25e65

There are fewer changes in the file, and for shorter varints the byte order of the buffer does not need to be changed because there is one method for big endian and one for little endian.

What do you think?

Contributor Author

I took a look and the ByteBuffer is able to use native memory operations with a specific endianness. For example:

    private ByteBuffer putInt(long a, int x) {
        try {
            int y = (x);
            SCOPED_MEMORY_ACCESS.putIntUnaligned(session(), null, a, y, bigEndian);
        } finally {
            Reference.reachabilityFence(this);
        }
        return this;
    }

So bit fiddling is slower than changing the endianness flag and then delegating to the correct CPU instruction. ARM, at least, supports both big and little endian.
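The two approaches being compared can be sketched as follows. This is an illustrative example with made-up class and method names, not the `Varint` code from either branch. Both methods write the same four big-endian bytes into a little-endian direct buffer: one flips the buffer's byte order around a single `putInt`, the other assembles the bytes with shifts. The sketch only shows that the results are equivalent; which one is faster on a given JVM and CPU would have to be measured, for example with JMH.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class EndianPutSketch {

    /** Writes a big-endian int by temporarily switching the buffer's byte order. */
    static void putIntBigEndianViaOrder(ByteBuffer bb, int value) {
        ByteOrder previous = bb.order();
        bb.order(ByteOrder.BIG_ENDIAN);
        bb.putInt(value);               // one wide write in the requested order
        bb.order(previous);             // restore whatever order the caller had set
    }

    /** Writes the same big-endian int one byte at a time using shifts. */
    static void putIntBigEndianViaShifts(ByteBuffer bb, int value) {
        bb.put((byte) (value >>> 24));
        bb.put((byte) (value >>> 16));
        bb.put((byte) (value >>> 8));
        bb.put((byte) value);
    }

    public static void main(String[] args) {
        ByteBuffer a = ByteBuffer.allocateDirect(4).order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer b = ByteBuffer.allocateDirect(4).order(ByteOrder.LITTLE_ENDIAN);
        putIntBigEndianViaOrder(a, 0x01020304);
        putIntBigEndianViaShifts(b, 0x01020304);
        a.flip();
        b.flip();
        System.out.println("same bytes: " + a.equals(b));
    }
}
```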

Contributor

Does this also hold for LWJGL byte buffers?

Contributor Author

yes

@kenwenzel
Contributor

> Would be great if you have the time to review this branch @kenwenzel!

I don't have access to a computer for the next few days. But the changes look good and the performance gains speak for themselves. The only remaining point would be aligning the read transactions between the value store and the triple store. Maybe we could unify the code somehow?
But feel free to merge now and I'll try to unify this later.

@hmottestad
Contributor Author

I'll try to merge this very soon. I had to fix an issue with MINUS and EXISTS. And I want to rerun the benchmarks to double check it didn't make anything terribly slow.

@kenwenzel
Contributor

kenwenzel commented Oct 3, 2025

@hmottestad If you have the time then maybe you could also check one of the write benchmarks.

@hmottestad hmottestad merged commit fbd79ad into develop Oct 3, 2025
11 checks passed
@hmottestad hmottestad deleted the GH-5447-alternate branch October 3, 2025 18:12

Successfully merging this pull request may close these issues.

Improve performance of queries in the LMDB Store
2 participants