It would be very useful for BQL to have a FILTER keyword that lets us filter out part of a query's results at a level closer to the storage (closer to the driver), improving performance.
Some of the functionality this FILTER keyword could support:
Filtering only the triples with immutable predicates for our query result.
The syntax for that could be something like:
FILTER isImmutable(?p)
It would come inside or after the WHERE clause and specify that the predicate bound to ?p in our query should be immutable.
Another function isTemporal could work similarly.
This could serve as a solution for what was asked in Issue #115.
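To make the idea concrete, here is a minimal Go sketch of what isImmutable and isTemporal would check. The `Predicate` and `Triple` types, `filterTriples` and `sampleTriples` are hypothetical simplifications for illustration, not the actual BadWolf types. The only assumption about BQL is that a temporal predicate carries a time anchor and an immutable one does not.

```go
package main

import (
	"fmt"
	"time"
)

// Predicate is a hypothetical stand-in for a BQL predicate: it is temporal
// when it carries a time anchor and immutable otherwise.
type Predicate struct {
	ID     string
	Anchor *time.Time // nil means the predicate is immutable
}

// Triple is a hypothetical, simplified triple.
type Triple struct {
	Subject   string
	Predicate Predicate
	Object    string
}

func isImmutable(p Predicate) bool { return p.Anchor == nil }
func isTemporal(p Predicate) bool  { return p.Anchor != nil }

// filterTriples keeps only the triples whose predicate satisfies keep.
func filterTriples(ts []Triple, keep func(Predicate) bool) []Triple {
	var out []Triple
	for _, t := range ts {
		if keep(t.Predicate) {
			out = append(out, t)
		}
	}
	return out
}

// sampleTriples returns one immutable and one temporal triple.
func sampleTriples() []Triple {
	at := time.Date(2020, time.January, 2, 0, 0, 0, 0, time.UTC)
	return []Triple{
		{Subject: "/u<joe>", Predicate: Predicate{ID: "parent_of"}, Object: "/u<mary>"},
		{Subject: "/u<joe>", Predicate: Predicate{ID: "bought", Anchor: &at}, Object: "/c<mini>"},
	}
}

func main() {
	for _, t := range filterTriples(sampleTriples(), isImmutable) {
		fmt.Println(t.Subject, t.Predicate.ID, t.Object)
	}
}
```

In the proposal this check would run at the driver level, so the discarded triples would never reach the query planner.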
Filtering only the triples with the latest time anchor for our query result.
It is pretty common to be interested only in the latest triple of a time series. Instead of getting all the triples, sorting them by time anchor in decreasing order and limiting the result to 1, as one may be doing today, we could just use FILTER to do that in a much less expensive way, using a syntax like:
FILTER latest(?p)
This is a pretty common use case, already highlighted in Issues #86 and #85.
It also opens the possibility for supporting the opposite: filtering only the earliest time anchor, as illustrated below.
FILTER earliest(?p)
One could also opt for a syntax that filters directly on the time binding, as in:
FILTER latest(?date)
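A rough Go sketch of why this can be cheaper: picking the latest (or earliest) anchor needs only one pass over the series, with no sort. `AnchoredTriple`, `pick` and `sampleSeries` are hypothetical names for illustration, and the sketch assumes the input is already a single time series. A real driver would also have to group triples into series first.

```go
package main

import (
	"fmt"
	"time"
)

// AnchoredTriple is a hypothetical triple whose predicate has a time anchor.
type AnchoredTriple struct {
	Subject, Predicate, Object string
	Anchor                     time.Time
}

// pick scans the series once and returns the triple preferred by better.
// The boolean is false when the series is empty.
func pick(ts []AnchoredTriple, better func(a, b time.Time) bool) (AnchoredTriple, bool) {
	if len(ts) == 0 {
		return AnchoredTriple{}, false
	}
	best := ts[0]
	for _, t := range ts[1:] {
		if better(t.Anchor, best.Anchor) {
			best = t
		}
	}
	return best, true
}

// latest returns the triple with the most recent time anchor.
func latest(ts []AnchoredTriple) (AnchoredTriple, bool) {
	return pick(ts, func(a, b time.Time) bool { return a.After(b) })
}

// earliest returns the triple with the oldest time anchor.
func earliest(ts []AnchoredTriple) (AnchoredTriple, bool) {
	return pick(ts, func(a, b time.Time) bool { return a.Before(b) })
}

// sampleSeries returns an unsorted time series for one subject and predicate.
func sampleSeries() []AnchoredTriple {
	day := func(d int) time.Time { return time.Date(2020, time.January, d, 0, 0, 0, 0, time.UTC) }
	return []AnchoredTriple{
		{"/u<joe>", "height_cm", "170", day(2)},
		{"/u<joe>", "height_cm", "172", day(9)},
		{"/u<joe>", "height_cm", "168", day(1)},
	}
}

func main() {
	if t, ok := latest(sampleSeries()); ok {
		fmt.Println("latest:", t.Object, t.Anchor.Format("2006-01-02"))
	}
}
```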
Allow using regular expressions for matching.
We could use regex for filtering too. The syntax could be something like:
FILTER match(?obj, "ab+"^^type:text)
This also resonates with what was asked in Issue #122.
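As an illustration, a match function could be sketched in Go with the standard `regexp` package. `matchObjects` is a hypothetical helper working on plain strings rather than real BQL literals. Note that `MatchString` is unanchored, so whether the pattern should match the whole value or any substring is a design decision this proposal leaves open.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchObjects keeps the object values that match the pattern anywhere in
// the string. It returns an error when the pattern does not compile.
func matchObjects(objs []string, pattern string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, o := range objs {
		if re.MatchString(o) {
			out = append(out, o)
		}
	}
	return out, nil
}

func main() {
	got, err := matchObjects([]string{"abb", "a", "cab", "xyz"}, "ab+")
	fmt.Println(got, err)
}
```

Compiling the pattern once per query, instead of once per triple, keeps the filter cheap at the driver level.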
Filtering to satisfy comparisons (evaluated as boolean conditions).
For example:
FILTER greaterThan(?obj, "37"^^type:int64)
The functions lowerThan and equal would work analogously.
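The three comparison functions could share one code path, as in this minimal Go sketch. The `comparisons` table and `filterInt64` are hypothetical, and only int64 objects are handled here. Other literal types would need their own comparison.

```go
package main

import "fmt"

// comparisons maps each proposed FILTER function name to its condition.
var comparisons = map[string]func(obj, value int64) bool{
	"greaterThan": func(obj, value int64) bool { return obj > value },
	"lowerThan":   func(obj, value int64) bool { return obj < value },
	"equal":       func(obj, value int64) bool { return obj == value },
}

// filterInt64 keeps the objects for which the named comparison against
// value holds. It returns an error for an unknown function name.
func filterInt64(objs []int64, fn string, value int64) ([]int64, error) {
	keep, ok := comparisons[fn]
	if !ok {
		return nil, fmt.Errorf("unknown FILTER function %q", fn)
	}
	var out []int64
	for _, o := range objs {
		if keep(o, value) {
			out = append(out, o)
		}
	}
	return out, nil
}

func main() {
	got, err := filterInt64([]int64{12, 37, 40, 99}, "greaterThan", 37)
	fmt.Println(got, err)
}
```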
Other filtering functions could include something like:
FILTER isToday(?date)
This would compare a binding with a value computed at runtime (the current day in this example).
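A small Go sketch of such a runtime comparison follows. `isToday` and the `at` helper are hypothetical, and the current time is passed in as a parameter (rather than read inside the function) so the check stays deterministic. Which time zone defines "today" is an assumption here: the sketch uses the zone of `now`.

```go
package main

import (
	"fmt"
	"time"
)

// isToday reports whether t falls on the same calendar day as now,
// evaluated in now's time zone.
func isToday(t, now time.Time) bool {
	ty, tm, td := t.In(now.Location()).Date()
	ny, nm, nd := now.Date()
	return ty == ny && tm == nm && td == nd
}

// at builds a UTC timestamp for the given day and hour.
func at(year, month, day, hour int) time.Time {
	return time.Date(year, time.Month(month), day, hour, 0, 0, 0, time.UTC)
}

func main() {
	now := time.Now()
	fmt.Println(isToday(now.Add(-48*time.Hour), now))
}
```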
The above are just some examples. The FILTER keyword could make room for a number of other features in the future, as we discover new ones that would be handy and implement them as filtering functions (just like isImmutable and latest above).
The idea is for the FILTER functions to all have a signature like the one below:
FILTER myFunction(?binding, <value>)
The <value> argument above is optional, since some functions do not need it (isImmutable, for example).
This way, adding a new function will require no changes inside the parser or inside lookupOptions (which is defined in storage.go and communicates with the driver). All the FILTER functions should be mapped to three variables there: operation, field (for the binding or its position in the clause) and value.
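The mapping to operation, field and value could be sketched in Go as below. Every name here (`FilterOperation`, `FilterOptions`, `supportedFilters`, `newFilterOptions`) is a hypothetical illustration of the proposal, not the existing lookupOptions code in storage.go.

```go
package main

import "fmt"

// FilterOperation identifies which filtering function the driver should apply.
type FilterOperation int

const (
	IsImmutable FilterOperation = iota
	IsTemporal
	Latest
)

// FilterOptions is the hypothetical triple of variables handed to the driver.
type FilterOptions struct {
	Operation FilterOperation
	Field     string // position of the binding in the clause, e.g. "predicate"
	Value     string // optional, empty for functions such as isImmutable
}

// supportedFilters maps FILTER function names to operations. Under this
// sketch, supporting a new function means adding one entry here.
var supportedFilters = map[string]FilterOperation{
	"isImmutable": IsImmutable,
	"isTemporal":  IsTemporal,
	"latest":      Latest,
}

// newFilterOptions translates a parsed FILTER call into FilterOptions.
func newFilterOptions(fn, field, value string) (FilterOptions, error) {
	op, ok := supportedFilters[fn]
	if !ok {
		return FilterOptions{}, fmt.Errorf("unsupported FILTER function %q", fn)
	}
	return FilterOptions{Operation: op, Field: field, Value: value}, nil
}

func main() {
	opts, err := newFilterOptions("latest", "predicate", "")
	fmt.Printf("%+v %v\n", opts, err)
}
```

Because the parser only ever sees a function name, a binding and an optional value, it would not need to know what each function does.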
N.B.: Note how this FILTER keyword differs from HAVING: FILTER would work closer to the storage/driver level to improve query performance while filtering the results, whereas HAVING works on aggregated data at a higher level, farther from the driver (as when using functions such as sum and count to write your HAVING conditions).
A FILTER could also be written to return only the latest element of a time series while also restricting the time interval to be before a given date. Another approach for this would be building a function for it.
For other general ideas, one could get inspiration from SPARQL's FILTER keyword.