Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FILTER keyword for latest anchor queries #146

Closed
wants to merge 34 commits into from

Conversation

rogerlucena
Copy link
Contributor

@rogerlucena rogerlucena commented Sep 21, 2020

This PR comes as a part of #129, setting up a FILTER keyword in BadWolf. The first FILTER function chosen to be implemented is the latest one, solving what was requested by #86.

It would be very useful to have in BQL a FILTER keyword that could allow us to filter out part of the results of a query in a level closer to the storage (closer to the driver), improving performance. This is exactly what this PR comes to introduce, with the full implementation of how this keyword could work on the driver/storage side as well (exemplified with the changes added to the volatile open-source driver in memory.go below).

Then, now the user can specify, inside of WHERE, which bindings they want to apply a FILTER to, proceeding with a more fine-grained lookup on storage, avoiding unnecessary retrieval of data and optimizing query performance.

To illustrate, queries such as the one below are now possible:

SELECT ?s, ?p, ?o
FROM ?test
WHERE {
    ?s ?p ?o .
    FILTER latest(?p)
};

That would return all the temporal triples of the ?test graph that have the latest timestamp of the time series they are part of (a recorrent use case in BadWolf), skipping immutable triples found along the way. This FILTER function also works for objects ?o in the case of reification/blank nodes: in this case the returned triples would be the ones on which the object is necessarily a temporal predicate with latest timestamp among the predicates with that same predicate ID (in the "object" position of the clause), analogously to what happened with ?p.

This FILTER for latest anchor above also works for alias bindings obtained with the AS keyword for predicates and objects too.

At the moment only one FILTER is supported for each graph clause inside of WHERE, and only one FILTER is supported for each given binding as well.

Regarding their position inside WHERE, FILTER clauses must come after all the graph pattern clauses, just by the end of the WHERE (closer to its closing bracket). Regarding trailing dots, a FILTER clause is understood just like any other graph clause inside of WHERE: the trailing dot is mandatory at the end of each clause (FILTER or graph ones indistinguishably), with the exception of the last one (for which the dot is optional).

In the future, to add a new FILTER function, the steps to follow are:

  1. Add a new enum item in the list of supported filter Operations in filter.go;
  2. Add a new entry in the SupportedOperations map in filter.go to map the lowercase string of the FILTER function being added to its correspondent filter.Operation element;
  3. Update the String method of Operation in filter.go;
  4. Add a new switch case inside compatibleBindingsInClauseForFilterOperation in planner.go to specify for which fields and bindings of a clause the newly added filter.Operation can be applied to;
  5. Implement the appropriate behavior on the driver side.

@rogerlucena rogerlucena force-pushed the filter-latest-anchor branch 2 times, most recently from d09f413 to 2206197 Compare September 21, 2020 22:40
bql/planner/planner_test.go Show resolved Hide resolved
bql/semantic/hooks.go Outdated Show resolved Hide resolved
Comment on lines 344 to 360
defer func() {
lo.FilterOptions = (*storage.FilteringOptions)(nil)
}()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is cleaning lo.FilterOptions, right?
could you add a comment?
the comment could state what is being done and why it is necessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have just added the comments, but this is only being necessary here because we are not deprecating the bool LatestAnchor from lookupOptions yet.

This way, just to keep making the tests related to this bool passing in memory_test.go (only place where it is used), I decided the best approach for now was to keep support for it in the volatile OSS driver, but using a FILTER latest(?p) behind the drapes as the behavior is the same.

About the cleaning of lo.FilterOptions, it should not be necessary as I already do this cleaning on the planner level, with resetFilterOptions there. But since for LatestAnchor I am artificially creating a new lo.FilterOptions in the driver level, it is safer to also guarantee that it will be cleaned to nil in the driver level too, in the case someone think about adding a test in memory_test.go that concatenates a LatestAnchor with a normal call to the driver or a FILTER for something else there (without calling the planner function resetFilterOptions then). This does not happen at the moment (so if I do not add this defer there no tests would break) and I doubt someone will do it in the future, but I thought it was safer to add it.

Given that, it brings to light the discussion on when do we plan to deprecate this LatestAnchor bool from lookupOptions. Can we do this in a follow up PR or should we wait a bit more? This could allow us to simplify multiple parts of the code in memory.go, deleting all the boilerplate there related to LatestAnchor, and also delete a number of tests in memory_test.go that at the moment are just retesting FILTER.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have talked with Thiago about and it seems that it will be necessary some refactoring in the internal side to deprecate this LatestAnchor bool from lookupOptions. Then, for now, I will be keeping the support for it in the OSS side as well.

storage/memory/memory.go Outdated Show resolved Hide resolved
storage/memory/memory.go Show resolved Hide resolved
bql/semantic/semantic_test.go Outdated Show resolved Hide resolved
bql/semantic/semantic_test.go Outdated Show resolved Hide resolved
bql/semantic/convert.go Outdated Show resolved Hide resolved
@rogerlucena rogerlucena force-pushed the filter-latest-anchor branch 3 times, most recently from 699b5f3 to 374eac0 Compare September 24, 2020 22:55
@rogerlucena rogerlucena marked this pull request as ready for review September 24, 2020 22:57
@rogerlucena rogerlucena force-pushed the filter-latest-anchor branch 2 times, most recently from c169600 to 30a5072 Compare September 25, 2020 21:43
@rogerlucena rogerlucena requested a review from rbkloss September 28, 2020 17:10
bql/planner/planner_test.go Show resolved Hide resolved
bql/planner/planner_test.go Show resolved Hide resolved
bql/lexer/lexer.go Show resolved Hide resolved
bql/semantic/semantic.go Outdated Show resolved Hide resolved
bql/semantic/semantic.go Show resolved Hide resolved
storage/memory/memory.go Outdated Show resolved Hide resolved
bql/semantic/semantic_test.go Show resolved Hide resolved
bql/semantic/semantic_test.go Show resolved Hide resolved
…ter test FILTER)

Two new triples were added so we could have, for both predicate and object positions, examples on which two triples would have the same latest anchor (expecting 2 rows for "FILTER latest" in this case, in the place of only 1 as it was in the usual case tested so far).
For that, the triples added were the one with predicate ""bought"@[2016-04-01T00:00:00-08:00]" and the other with object ""turned"@[2016-04-01T00:00:00-08:00]".

The third triple was added so that both "/u<peter>" and "/u<paul>" would have in common two temporal predicates - this way we can test if only one "FILTER latest(?p)" is working as expected for multiple graph clauses inside of WHERE (if they share that same binding "?p").
For this, the triple added was the one with predicate ""bought"@[2016-01-01T00:00:00-08:00]".
@rogerlucena rogerlucena force-pushed the filter-latest-anchor branch from c447e3e to d6698a8 Compare October 2, 2020 01:08
bql/grammar/grammar.go Outdated Show resolved Hide resolved
bql/grammar/grammar_test.go Show resolved Hide resolved
bql/planner/filter/filter.go Show resolved Hide resolved
bql/planner/filter/filter.go Outdated Show resolved Hide resolved
bql/semantic/hooks.go Outdated Show resolved Hide resolved
@rogerlucena rogerlucena force-pushed the filter-latest-anchor branch from d6698a8 to 3102dcb Compare October 2, 2020 19:44
@rogerlucena rogerlucena force-pushed the filter-latest-anchor branch from 3102dcb to e241717 Compare October 5, 2020 15:32
…ake FILTER for latest return multiple triples if they share the same predicate and same latest anchor
…n be applied to, improving error checking, and also move verification for supported filter functions to the planner level
…dition of multiple filter functions in the future)
@rogerlucena rogerlucena force-pushed the filter-latest-anchor branch from e241717 to 3c5c9ff Compare October 5, 2020 18:06
@rogerlucena
Copy link
Contributor Author

This PR was broken into two smaller PRs: #149 (grammar/lexer/hooks) and #150 (planner/memory).

@rogerlucena rogerlucena closed this Oct 5, 2020
@rogerlucena rogerlucena deleted the filter-latest-anchor branch October 6, 2020 18:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants