An approach to 1-dim arrays using id-only properties #1389

matko · 2022-08-22T11:18:42Z

matko
Aug 22, 2022
Maintainer

In a discussion between @GavinMendelGleason and myself, the following idea arose.

Our current arrays work by allocating 'array cells': nodes of type sys:Array which carry an index (or several, in the case of multidimensional arrays) and a value. This level of indirection lets us keep the number of properties small, which is advantageous in querying. It also allows for sparse arrays quite naturally. However, that same indirection has a cost at query time, as well as a storage cost for the extra node that has to be reserved. Especially for very large arrays of very small objects, having to allocate a node string for each element in the array can get very costly.

Another possible approach, specifically for single-dimensional arrays, is to define a specific range in the property id space which is used only for array indexing. Such a shared id space could work much like the shared node-value id space we already have. For each layer, we'd have to store how many index property ids are reserved. The property id range for the next layer then starts at the sum of the cardinality of the property dict and the size of this reserved id range, plus one.

For example, suppose that in a base layer, our largest array contains 10 elements. In addition, that base layer uses 25 normal properties. In that case, property ids 1-25 would refer to those normal properties, and 26-35 would refer to indexes 0-9.
If, in a child layer, we use 5 additional properties and make an array of 15 elements, property ids 36-40 would point to the 5 additional properties. Since indexes 0-9 are already covered by the base layer, only 5 more index ids need to be reserved: 41-45 would refer to indexes 10-14.
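The allocation in this example can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how the store is implemented: the `Layer` class and all of its field and method names are hypothetical, and it assumes each layer records just two counts (its new properties and its newly reserved index ids).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Layer:
    """Hypothetical per-layer bookkeeping (names are illustrative only)."""
    new_properties: int          # entries in this layer's property dictionary
    new_index_ids: int           # index property ids reserved by this layer
    parent: Optional["Layer"] = None

    def first_id(self) -> int:
        # Property ids are 1-based; a child starts right after its parent's ids.
        return 1 if self.parent is None else self.parent.end_id()

    def end_id(self) -> int:
        # One past the last id covered by this layer.
        return self.first_id() + self.new_properties + self.new_index_ids

    def property_ids(self) -> range:
        start = self.first_id()
        return range(start, start + self.new_properties)

    def index_ids(self) -> range:
        start = self.first_id() + self.new_properties
        return range(start, start + self.new_index_ids)

    def array_indexes(self) -> range:
        # Array indexes addressed by the ids this layer reserves.
        below = 0 if self.parent is None else self.parent.array_indexes().stop
        return range(below, below + self.new_index_ids)


# The example from the post: 25 properties and 10 index ids in the base layer,
# then 5 more properties and 5 more index ids in the child layer.
base = Layer(new_properties=25, new_index_ids=10)
child = Layer(new_properties=5, new_index_ids=5, parent=base)

print(base.property_ids(), base.index_ids(), base.array_indexes())
print(child.property_ids(), child.index_ids(), child.array_indexes())
```

The ranges printed are half-open, so they correspond to ids 1-25 and 26-35 for the base layer, and 36-40 and 41-45 for the child layer.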

Array index ids do not themselves need to be indexed. For the purpose of the wavelet tree generation, they simply should not exist. So in the above example, we'd generate a wavelet tree for 30 elements, not 45. When looking up a triple through the wavelet tree, we'd need to transform its property id into what its id would be without those indexes. For example, property 39 would become 29, since the 10 index ids reserved below it (26-35) are skipped.
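A minimal sketch of that id transformation, assuming the per-layer counts are available as a list of `(property count, reserved index id count)` pairs ordered from the base layer upward. The function name and this representation are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def dense_property_id(
    layers: Sequence[Tuple[int, int]], property_id: int
) -> Optional[int]:
    """Map a property id to the id it would have if index ids did not exist.

    `layers` holds (property count, reserved index id count) per layer,
    base layer first. Returns None when `property_id` is an array index id,
    since those are not part of the wavelet tree in this scheme.
    """
    if property_id < 1:
        raise ValueError("property ids are 1-based")

    start = 1      # first id of the current layer
    skipped = 0    # index ids reserved by the layers below
    for prop_count, index_count in layers:
        prop_end = start + prop_count        # one past the layer's properties
        if property_id < prop_end:
            return property_id - skipped     # normal property: close the gaps
        index_end = prop_end + index_count   # one past the layer's index ids
        if property_id < index_end:
            return None                      # falls in a reserved index range
        skipped += index_count
        start = index_end

    raise ValueError("property id is beyond the last layer")


# Base layer: 25 properties + 10 index ids; child layer: 5 properties + 5 index ids.
layers = [(25, 10), (5, 5)]
print(dense_property_id(layers, 39))
print(dense_property_id(layers, 30))
```

With the layers from the example, id 39 maps to 29, while id 30 is an index id and has no position in the wavelet tree.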

If we ever need to dump to RDF, we can auto-translate these ids to the RDF scheme rdf:_<index>, which seems to be the standard here. On triple insert, we can parse properties of this format to figure out whether we need to reserve more index property ids.
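The translation in both directions could look roughly like the sketch below. The helper names are hypothetical. The `offset` parameter is also an assumption: the post writes rdf:_<index> literally, but RDF container membership properties are conventionally numbered from rdf:_1, so a real implementation might want `offset=1` for 0-based arrays.

```python
import re
from typing import Optional

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Matches the full IRI form of rdf:_<n>, with <n> a run of ASCII digits.
_MEMBER = re.compile(re.escape(RDF_NS) + r"_([0-9]+)")


def index_to_rdf(index: int, offset: int = 0) -> str:
    """Render an array index as an rdf:_<n> property IRI."""
    return f"{RDF_NS}_{index + offset}"


def rdf_to_index(iri: str, offset: int = 0) -> Optional[int]:
    """Return the array index encoded in an rdf:_<n> IRI, else None."""
    match = _MEMBER.fullmatch(iri)
    if match is None:
        return None
    index = int(match.group(1)) - offset
    return index if index >= 0 else None


def extra_index_ids_needed(iri: str, reserved: int, offset: int = 0) -> int:
    """How many more index ids to reserve so that `iri` becomes addressable.

    `reserved` is the number of index ids already reserved across all layers.
    """
    index = rdf_to_index(iri, offset)
    if index is None:
        return 0  # not an index property, nothing to reserve
    return max(0, index + 1 - reserved)


print(index_to_rdf(3))
print(extra_index_ids_needed(RDF_NS + "_14", reserved=10))
```

With 10 index ids already reserved, inserting a triple whose property is rdf:_14 requires 5 more, which matches the child layer in the earlier example.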

Replies: 0 comments
