Allow specifying only last temporal triple on queries #86

Open
xllora opened this issue Dec 6, 2018 · 6 comments

Comments

@xllora
Member

xllora commented Dec 6, 2018

There is a common use case where temporal triples are used to maintain time series. However, common queries only need the last triple in the time series, not all the anchored versions. The goal of this issue is to track the modifications required to support that.

@xllora
Member Author

xllora commented Dec 6, 2018

The initial work focuses only on the modifications required in the storage layer to make this possible.

xllora added a commit that referenced this issue Dec 6, 2018
This change starts the work on issue #86. The initial work focuses on the
modifications required in the storage layer, via the introduction of a new
flag `LatestAnchor` in `storage.LookupOptions`. This flag signals the
drivers to only use the latest incarnation of a temporal triple.

This change also contains the modifications to the volatile in-memory
default driver to implement the semantics added by `LatestAnchor`.
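
For illustration, here is a minimal, self-contained Go sketch of the semantics `LatestAnchor` is meant to signal; the types and helper below are simplified stand-ins for this sketch, not badwolf's actual storage API:

package main

import (
	"fmt"
	"time"
)

// temporalTriple is a simplified stand-in for a temporal triple; badwolf's
// real types live in the triple and predicate packages.
type temporalTriple struct {
	subject   string
	predicate string    // predicate ID, e.g. "foo"
	object    string
	anchor    time.Time // time anchor of this incarnation
}

// latestOnly mimics what a driver could do when the LatestAnchor flag of the
// lookup options is set: for every (subject, predicate ID) pair, keep only
// the incarnation with the most recent time anchor.
func latestOnly(triples []temporalTriple) []temporalTriple {
	latest := map[string]temporalTriple{}
	for _, t := range triples {
		key := t.subject + "|" + t.predicate
		if cur, ok := latest[key]; !ok || t.anchor.After(cur.anchor) {
			latest[key] = t
		}
	}
	out := make([]temporalTriple, 0, len(latest))
	for _, t := range latest {
		out = append(out, t)
	}
	return out
}

func main() {
	series := []temporalTriple{
		{"/some_type<some_id>", "foo", "v1", time.Date(2018, 1, 1, 0, 0, 0, 0, time.UTC)},
		{"/some_type<some_id>", "foo", "v2", time.Date(2018, 6, 1, 0, 0, 0, 0, time.UTC)},
		{"/some_type<some_id>", "foo", "v3", time.Date(2018, 12, 1, 0, 0, 0, 0, time.UTC)},
	}
	fmt.Println(latestOnly(series)) // only the "v3" incarnation remains
}

With the flag unset, a driver would return every anchored version of the triple; with it set, only the most recent anchor per subject/predicate pair survives.
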
xllora added a commit that referenced this issue Dec 6, 2018
Address a typo introduced during the addition of the `LatestAnchor` definition
for issue #86.
@tucnak

tucnak commented Dec 26, 2019

Hello! Is there a timeline on any of these issues? BQL looks great, and feels natural if you are coming from an SQL/SPARQL background, but the killer feature, anchoring, seems just far enough out of reach to throw the whole thing off.

One case I would love to make for something like BQL is tracking user actions, so that easy analytics over the whole user base's experience becomes possible; but then you would expect more flexibility in how requests are made. For certain queries you would need to specify a timeframe, and maybe some limits (sampling?), and ideally you would want to group different anchor points by their predicate in the output, so it is easier to deal with in the application.

I may be missing something, but from what I've read, seen in the code, and tried online, it doesn't seem possible to do any of this at all. Unfortunately, it's not clear how to do this in Cayley either. If only the long-awaited crossover episode happened with at least some of this functionality in place, it would instantly make efficient temporal modeling over arbitrary backends not only possible, but actually quite easy. (Postgres! I believe it's quite good at NoSQL as well, hstore?)

P.S. It's a great pity, grammatically, that BQL doesn't support a trailing dot (.) for queries with multiple clauses.

@xllora
Member Author

xllora commented Jan 8, 2020

Hi @tucnak! Thanks for sharing your thoughts.

The changes to get the latest value are in place for temporal predicates. The part we have not decided on yet is how to expose it in BQL. The naive way would be something along the lines of

SELECT ?p, ?o
FROM  ?some_graph
WHERE {
    /some_type<some_id> "foo"@[LATEST]  as ?p ?o
}

This would allow introducing other semantics (e.g. FIRST) or primitives that require extra info, like your point about sampling (e.g. SAMPLE(10)). These would also require some constraints on what you can pass in time-bounded queries.

SELECT ?p, ?o
FROM  ?some_graph
WHERE {
    /some_type<some_id> "foo"@[,]  AS ?p ?o
}

This would be equivalent to

SELECT ?p, ?o
FROM  ?some_graph
WHERE {
    /some_type<some_id> "foo"@[FIRST,LATEST]  AS ?p ?o
}

This could also allow some more advanced, more obscure querying, like

SELECT ?p1, ?o1, ?p2, ?o2
FROM  ?some_graph
WHERE {
    /some_type<some_id> "foo"@[SAMPLE(10)]  AS ?p1 AT ?t  ?o1 .
    /some_type<some_id> "bar"@[?t]  AS ?p2 AT ?t  ?o1 
}

This would allow properly retrieving the entries related to the returned samples.

This requires a bit of a change, and we want to collect enough use cases before we make a proposal. Do you have any examples you could share? That would help us iron out the naive proposal sketched above.

P.S. The trailing dot. Mmh. SPARQL (https://www.w3.org/TR/rdf-sparql-query/#GraphPattern) supports it. This should be trivial to support. Would you mind filing an issue for it so we can get to it?

@tucnak

tucnak commented Jan 19, 2020

Thank you for considering my suggestions.

Thinking about the trailing dot, I believe it should either be enforced or there should be no dot at all, for the sole purpose of staying consistent. I will file a corresponding issue straight away and would appreciate it if you could give me some directions for the implementation: I would love to use it as an entry point to start contributing! (In this comment, I will try to avoid dots, just for the sake of it, really.)

Regarding the code samples you provided, let me share my opinion on them. The @[] punctuation should remain primary, but what seems challenging is how the desired semantics (such as historically the first item, the last item, samples, and other possible functions) should relate to it syntactically. The most confusing part for me was the AS semantic, which only seems to have merit if you want to bind predicates too. Instead, we could use something like IN and OUT semantics as part of the SELECT statement itself, to avoid clutter elsewhere.

To me, it seems totally natural to assume the latest written predicate by default, so I would re-write your first example simply as

# Predicate can be mentioned via IN for objects, and OUT for subjects.
SELECT IN(?o), ?o
FROM ?some_graph
WHERE {
        /some_type<some_id> "foo"@[] ?o
# or simply
        /some_type<some_id> "foo" ?o
}

# Note: writing IN(?s) should raise an error, because ?s is later bound as a subject.
SELECT ?s, OUT(?s)
FROM ?some_graph
WHERE {
        ?s "foo"@[] /some_type<some_id>
# or simply
        ?s "foo" /some_type<some_id>
}

Now, to retrieve all existing time anchors, you should be able to simply use "foo"@[,]. For example, if you wanted to sample only some of the events that happened after a certain time, it would make perfect sense to write something like

SELECT IN(?o), ?o
FROM ?some_graph
WITH ?excerpt AS SAMPLE(10) SINCE 2006-01-02T15:04:05
WHERE {
        /some_type<some_id> "foo"@[?excerpt] ?o
}

I couldn't figure out how you ended up with two sets of time anchors (?o1 and ?o2, apparently). Correct me if I'm wrong, but the right logic for the obscure query could look more like this:

SELECT ?p, ?p2, ?t, ?t2
FROM  ?some_graph
WITH ?ten_random AS SAMPLE(10)
WHERE {
        /some_type<some_id> "foo"@[?ten_random] ?p AT ?t
        /some_type<some_id> "bar"@[?t] ?p2 AT ?t2
}

I departed a lot from the existing syntax, but I believe an approach like this could significantly reduce clutter in at least some subset of complex requests, and make them much easier to reason about.

@rogerlucena
Contributor

I believe this issue may be closed given what was implemented for the FILTER keyword and the latest FILTER function in #149 and #150.
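
For reference, a hedged sketch of what such a query might look like, reusing the example from this thread; the exact FILTER grammar is my assumption here and should be checked against the BQL documentation:

SELECT ?p, ?o
FROM ?some_graph
WHERE {
    /some_type<some_id> "foo"@[,] AS ?p ?o
}
FILTER latest(?p)

The intent is that, of all the anchored versions matched by the graph clause, only the one with the most recent time anchor is kept.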

@tucnak

tucnak commented Nov 20, 2020

Thank you, great to see this resolved!
