In a discussion between @GavinMendelGleason and myself, the following idea arose.
Our current arrays work by allocating 'array cells': nodes of type `sys:Array` which carry an index (or several, in the case of multidimensional arrays) and a value. This level of indirection lets us keep the number of properties small, which is advantageous in querying. It also allows for sparse arrays quite naturally. However, it incurs the cost of that indirection in querying, as well as a storage cost for reserving an extra node. Especially for very large arrays of very small objects, having to allocate a node string for each element in the array can get very costly.
Another possible approach, specifically for single-dimensional arrays, is to define a specific range in the property id space which is used only for array indexing. Such a shared id space could work much like the shared node-value id space we already have. For each layer, we'd have to store how many index property ids are reserved. The property id range for the next layer then starts at the cardinality of the property dictionary plus the size of this reserved id range, plus one.
For example, suppose that in a base layer, our largest array contains 10 elements. In addition, that base layer uses 25 normal properties. In that case, property ids 1-25 would refer to those normal properties, and 26-35 would refer to indexes 0-9.
Now suppose that in a child layer we use 5 additional properties and make an array of 15 elements. Property ids 36-40 would then point to the 5 additional properties, and 41-45 would refer to indexes 10-14, since indexes 0-9 are already reserved in the base layer.
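The id layout in this example can be sketched as follows. This is only an illustration of the arithmetic, not existing store code: `LayerCounts` and `layout` are hypothetical names, and it assumes each layer records how many new properties it adds and how many new index ids it reserves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerCounts:
    """Per-layer bookkeeping (hypothetical, for illustration only)."""
    new_properties: int    # entries in this layer's property dictionary
    reserved_indexes: int  # array index ids newly reserved by this layer


def layout(layers):
    """Compute inclusive property id ranges for a stack of layers, base first.

    A range (a, a - 1) means the layer contributes nothing of that kind.
    """
    next_id = 1      # property ids are 1-based, as in the example
    next_index = 0   # array indexes are 0-based
    result = []
    for layer in layers:
        prop_range = (next_id, next_id + layer.new_properties - 1)
        next_id += layer.new_properties
        index_range = (next_id, next_id + layer.reserved_indexes - 1)
        result.append({
            "properties": prop_range,
            "index_ids": index_range,
            "first_index": next_index,  # array index of the first reserved id
        })
        next_id += layer.reserved_indexes
        next_index += layer.reserved_indexes
    return result


# Base layer: 25 properties, largest array has 10 elements.
# Child layer: 5 more properties, largest array now has 15 elements,
# so only 5 further index ids need to be reserved.
example = layout([LayerCounts(25, 10), LayerCounts(5, 5)])
```

Here `example[1]` describes the child layer: its properties occupy ids 36-40 and its index ids 41-45 stand for array indexes 10-14.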
Array index properties do not need to be indexed themselves. For the purpose of wavelet tree generation, they simply should not exist. So in the above example, we'd generate a wavelet tree over 30 property ids, not 45. When looking up a triple through the wavelet tree, we'd need to transform its property id into what its id would be without those index ids. For example, property 39 would become 29, because the 10 index ids reserved in the base layer (26-35) are skipped.
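The id transformation for wavelet tree lookups could look roughly like this. It is a sketch under the assumption that the per-layer counts are available as `(new_properties, reserved_indexes)` pairs, base layer first; `to_wavelet_id` is a hypothetical helper, not an existing function.

```python
def to_wavelet_id(property_id, layers):
    """Map a property id to its id with all index ranges removed.

    `layers` is a list of (new_properties, reserved_indexes) tuples,
    base layer first. Returns None if `property_id` is an array index id,
    since those are not part of the wavelet tree at all.
    """
    start = 1     # first id of the current layer (ids are 1-based)
    skipped = 0   # index ids reserved by all earlier layers
    for new_properties, reserved_indexes in layers:
        prop_end = start + new_properties        # exclusive
        index_end = prop_end + reserved_indexes  # exclusive
        if start <= property_id < prop_end:
            return property_id - skipped
        if prop_end <= property_id < index_end:
            return None
        skipped += reserved_indexes
        start = index_end
    raise ValueError(f"property id {property_id} is out of range")


layers = [(25, 10), (5, 5)]  # the base and child layer from the example
dense = to_wavelet_id(39, layers)  # 29
```

With many layers, a real implementation would probably precompute the cumulative offsets and binary-search them rather than walk the layer stack on every lookup.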
If we ever need to dump to RDF, we can auto-translate these ids to the RDF scheme `rdf:_<index>`, which seems to be the standard here. On triple insert, we can parse properties of this format to figure out if we need to reserve more index property ids.
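A rough sketch of that translation and of the check on insert is below. One detail to decide: RDF container membership properties are numbered from 1 (`rdf:_1`, `rdf:_2`, ...), while the indexes above are 0-based, so this sketch assumes an offset of one. All function names are hypothetical.

```python
import re

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Matches rdf:_N in prefixed or full IRI form, with N >= 1.
_MEMBER = re.compile(r"^(?:rdf:|" + re.escape(RDF_NS) + r")_([1-9][0-9]*)$")


def index_to_rdf(index):
    """Render a 0-based array index as a 1-based rdf:_N property (assumed offset)."""
    return f"rdf:_{index + 1}"


def rdf_to_index(prop):
    """Return the 0-based array index for an rdf:_N property, else None."""
    match = _MEMBER.match(prop)
    return int(match.group(1)) - 1 if match else None


def extra_indexes_needed(prop, reserved):
    """How many more index ids must be reserved to insert a triple using `prop`.

    `reserved` is the number of index ids reserved so far across all layers.
    """
    index = rdf_to_index(prop)
    if index is None or index < reserved:
        return 0
    return index + 1 - reserved
```

For instance, with 10 index ids already reserved, inserting a triple with property `rdf:_15` (index 14) would require reserving 5 more, matching the child layer in the example.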