Skip to content

Conversation

kenwenzel
Copy link
Contributor

@kenwenzel kenwenzel commented Apr 17, 2024

Implement an extensible ID scheme for the LmdbStore that also allows to embed values into IDs.

GitHub issue resolved: #4950

Briefly describe the changes proposed in this PR:

Use a binary encoding for IDs that allows to represent the value type, if the value itself is embedded within the id and the embedded value itself.


PR Author Checklist (see the contributor guidelines for more details):

  • my pull request is self-contained
  • I've added tests for the changes I made
  • I've applied code formatting (you can use mvn process-resources to format from the command line)
  • I've squashed my commits where necessary
  • every commit message starts with the issue number (GH-xxxx) followed by a meaningful description of the change

@kenwenzel kenwenzel force-pushed the GH-4950-lmdb-extensible-ids branch from 474eadf to 6fd1d40 Compare April 18, 2024 06:11
@hmottestad
Copy link
Contributor

The failing integration tests: It's a test that is marked as disabled, but has somehow started to run again. Not sure how that happened.

@kenwenzel kenwenzel force-pushed the GH-4950-lmdb-extensible-ids branch from 6fd1d40 to a53aae5 Compare April 22, 2024 08:16
@kenwenzel
Copy link
Contributor Author

@hmottestad It would be good to have this in 5.0.0 as it is a breaking change but I am not sure if I can finish the whole logic.

@hmottestad
Copy link
Contributor

Sorry. I've already published the M3 milestone build and we now have a feature freeze until the final release. The LMDB Store is experimental, so you should be able to make breaking changes for 5.1.0. Might be better too, since then users can upgrade to 5.0.0 and get all the fixes and enhancements without any major breaking changes.

@hmottestad
Copy link
Contributor

PS: The develop branch is now 5.1.0 so you can merge into the develop branch without affecting the 5.0.0 release.

@nguyenm100
Copy link
Contributor

rdf 1.2 is largely done and jena has an implementation. apache/jena#2805

might be a good idea to brush the dust off of this pr. how much is left to do @kenwenzel ?

@kenwenzel
Copy link
Contributor Author

@nguyenm100 Currently, only the ID scheme in the value store is implemented.
What is missing:

  • embedding of (numeric) values into IDs
  • handling of triple terms
  • indexes for triple terms as part of the triple store

@hmottestad Do we already have an issue for RDF 1.2 and/or can we just adapt the current RDF-star support?

@nguyenm100
Copy link
Contributor

Hey @kenwenzel is it possible to just do (1) embedding of values into IDs and merge this pr and then (2) and (3) with rdf 1.2 implementation? or is there some dependency from 1.2 for this to be usable?

@kenwenzel
Copy link
Contributor Author

@nguyenm100 Yes, that's perfectly possible.

@nguyenm100
Copy link
Contributor

Thanks @kenwenzel are you able to finish (1) and merge this pr? If so, and at that point does it make sense to pull the experimental designation from lmdbstore? (or does that force a major version upgrade?)

@kenwenzel
Copy link
Contributor Author

@nguyenm100 (1) is a bit of work - I think at least 2 or 3 days, but there is no direct dependency to RDF 1.2.

@kenwenzel kenwenzel force-pushed the GH-4950-lmdb-extensible-ids branch from a53aae5 to e615879 Compare June 4, 2025 08:18
@kenwenzel kenwenzel force-pushed the GH-4950-lmdb-extensible-ids branch 5 times, most recently from 156de3d to 61a53a4 Compare June 17, 2025 13:47
@kenwenzel kenwenzel force-pushed the GH-4950-lmdb-extensible-ids branch from 61a53a4 to 37cb363 Compare June 19, 2025 06:55
Implement an extensible ID scheme for the LmdbStore that
also allows to embed values into IDs.
@kenwenzel kenwenzel force-pushed the GH-4950-lmdb-extensible-ids branch 2 times, most recently from 8f8f017 to 2fdd6c1 Compare June 20, 2025 13:03
@kenwenzel kenwenzel force-pushed the GH-4950-lmdb-extensible-ids branch 3 times, most recently from e489df7 to a00aed8 Compare June 23, 2025 07:50
@kenwenzel kenwenzel force-pushed the GH-4950-lmdb-extensible-ids branch 3 times, most recently from 46b8fa7 to 3ec78e6 Compare June 23, 2025 13:00
@kenwenzel
Copy link
Contributor Author

kenwenzel commented Jun 24, 2025

@odysa @nguyenm100 @hmottestad Do we need to make this backwards compatible with the current ID scheme or do you think a breaking change is OK?

@nguyenm100
Copy link
Contributor

Hi Ken, what's the loe for backward compat? we have one client keeping a rolling weekly snapshot that I would need to chat with.. alternatively, we can introduce it an lmdb2 sail (is that what tdb did?)

@kenwenzel
Copy link
Contributor Author

@nguyenm100 It should be possible by extracting some constants and considering them in ValueStore and TripleStore. It might have a small performance impact.

@nguyenm100
Copy link
Contributor

is the perf penalty imposed if it's interacting with lmdb v2 or only present if dealing with lmdb v1?

@kenwenzel
Copy link
Contributor Author

is the perf penalty imposed if it's interacting with lmdb v2 or only present if dealing with lmdb v1?

Probably both as we have to check which version is currently used for each operation with the value store. Currently, all type markers are static integer constants but that needs to be changed to support both versions in parallel.

@kenwenzel kenwenzel marked this pull request as ready for review June 24, 2025 14:26
@nguyenm100
Copy link
Contributor

Hey Ken, all good on our side. changes don't need to be backward compat for us. tx

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants