Allow specifying only last temporal triple on queries #86

Open
xllora opened this issue Dec 6, 2018 · 6 comments

Comments

@xllora
Member

xllora commented Dec 6, 2018

There is a common use case where temporal triples are used to maintain time series. However, common queries only need the last triple in the time series, not all the anchored versions. The goal of this issue is to track the modifications required to support that.

@xllora
Member Author

xllora commented Dec 6, 2018

The initial work focuses only on the modifications required in the storage layer to make this possible.

xllora added a commit that referenced this issue Dec 6, 2018
This change starts the work on issue #86. The initial work focuses on the
modifications required in the storage layer, via the introduction of a new
flag `LatestAnchor` in `storage.LookupOptions`. This flag signals the
drivers to only use the latest incarnation of a temporal triple.

This change also contains the modifications to the volatile in-memory
default driver to implement the semantics added by `LatestAnchor`.
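
For illustration, here is a minimal, self-contained Go sketch of the semantics `LatestAnchor` is meant to signal; the types and helper below are simplified stand-ins for this sketch, not badwolf's actual storage API:

package main

import (
	"fmt"
	"time"
)

// temporalTriple is a simplified stand-in for a temporal triple; badwolf's
// real types live in the triple and predicate packages.
type temporalTriple struct {
	subject   string
	predicate string    // predicate ID, e.g. "foo"
	object    string
	anchor    time.Time // time anchor of this incarnation
}

// latestOnly mimics what a driver could do when the LatestAnchor flag of the
// lookup options is set: for every (subject, predicate ID) pair, keep only
// the incarnation with the most recent time anchor.
func latestOnly(triples []temporalTriple) []temporalTriple {
	latest := map[string]temporalTriple{}
	for _, t := range triples {
		key := t.subject + "|" + t.predicate
		if cur, ok := latest[key]; !ok || t.anchor.After(cur.anchor) {
			latest[key] = t
		}
	}
	out := make([]temporalTriple, 0, len(latest))
	for _, t := range latest {
		out = append(out, t)
	}
	return out
}

func main() {
	series := []temporalTriple{
		{"/some_type<some_id>", "foo", "v1", time.Date(2018, 1, 1, 0, 0, 0, 0, time.UTC)},
		{"/some_type<some_id>", "foo", "v2", time.Date(2018, 6, 1, 0, 0, 0, 0, time.UTC)},
		{"/some_type<some_id>", "foo", "v3", time.Date(2018, 12, 1, 0, 0, 0, 0, time.UTC)},
	}
	fmt.Println(latestOnly(series)) // only the "v3" incarnation remains
}

With the flag unset, a driver would return every anchored version of the triple; with it set, only the most recent anchor per subject/predicate pair survives.
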
xllora added a commit that referenced this issue Dec 6, 2018
Address a typo introduced during the addition of the `LatestAnchor` definition
for issue #86.
@tucnak

tucnak commented Dec 26, 2019

Hello! Is there a timeline on any of these issues? BQL looks great, and feels natural if you are coming from an SQL/SPARQL background, but the killer feature, anchoring, seems just far enough out of reach to throw the whole thing off.

One case I would love to make for something like BQL is tracking user actions, so that easy analytics over the whole user base's experience becomes possible; but then you would expect more flexibility in how requests are made. For certain queries you would need to specify a timeframe, and maybe some limits (sampling?), and ideally you would want to group different anchor points by their predicate in the output, so it is easier to deal with in the application.

I may be missing something, but from what I've read, seen in the code, and tried online, it doesn't seem possible to do any of this at all. Unfortunately, it's not clear how to do this in Cayley either. If only the long-awaited crossover episode happened with at least some of this functionality in place, it would instantly make efficient temporal modeling over arbitrary backends not only possible, but actually quite easy. (Postgres! I believe it's quite good at NoSQL as well, hstore?)

P.S. It's a great pity, grammatically, that BQL doesn't support a trailing dot (.) for queries with multiple clauses.

@xllora
Member Author

xllora commented Jan 8, 2020

Hi @tucnak! Thanks for sharing your thoughts.

The changes to get the latest value are in place for temporal predicates. The part we have not decided on yet is how to expose it in BQL. The naive way would be something along the lines of

SELECT ?p, ?o
FROM  ?some_graph
WHERE {
    /some_type<some_id> "foo"@[LATEST]  as ?p ?o
}

This would allow introducing other semantics (e.g. FIRST) or primitives that require extra info, like your point about sampling (e.g. SAMPLE(10)). These would also require some constraints on what you can pass in time-bounded queries.

SELECT ?p, ?o
FROM  ?some_graph
WHERE {
    /some_type<some_id> "foo"@[,]  AS ?p ?o
}

This would be equivalent to

SELECT ?p, ?o
FROM  ?some_graph
WHERE {
    /some_type<some_id> "foo"@[FIRST,LATEST]  AS ?p ?o
}

This could also allow some more advanced, more obscure querying, like

SELECT ?p1, ?o1, ?p2, ?o2
FROM  ?some_graph
WHERE {
    /some_type<some_id> "foo"@[SAMPLE(10)]  AS ?p1 AT ?t  ?o1 .
    /some_type<some_id> "bar"@[?t]  AS ?p2 AT ?t  ?o1 
}

This would allow properly retrieving the entries related to the returned samples.

This requires a bit of a change, and we want to collect enough use cases before we make a proposal. Do you have any examples you could share? That would help us iron out the naive proposal sketched above.

P.S. The trailing dot. Mmh. SPARQL (https://www.w3.org/TR/rdf-sparql-query/#GraphPattern) supports it. This should be trivial to support. Would you mind filing an issue for it so we can get to it?

@tucnak

tucnak commented Jan 19, 2020

Thank you for considering my suggestions.

Thinking about the trailing dot, I believe it should either be enforced or there should be no dot at all, for the sole purpose of staying consistent. I will file a corresponding issue straight away and would appreciate it if you could give me some directions for the implementation: I would love to use it as an entry point to start contributing! (In this comment, I will try to avoid dots, just for the sake of it, really.)

Regarding the code samples you provided, let me share my opinion on them. The @[] punctuation should remain primary, but what seems challenging is how the desired semantics (such as historically the first item, the last item, samples, and other possible functions) should relate to it syntactically. The most confusing part for me was the AS semantic, which only seems to have merit if you want to bind predicates too. Instead, we could use something like IN and OUT semantics as part of the SELECT statement itself, to avoid clutter elsewhere.

To me, it seems totally natural to assume the latest written predicate by default, so I would re-write your first example simply as

# Predicate can be mentioned via IN for objects, and OUT for subjects.
SELECT IN(?o), ?o
FROM ?some_graph
WHERE {
        /some_type<some_id> "foo"@[] ?o
# or simply
        /some_type<some_id> "foo" ?o
}

# Note: writing IN(?s) should raise an error, because ?s is later bound as a subject.
SELECT ?s, OUT(?s)
FROM ?some_graph
WHERE {
        ?s "foo"@[] /some_type<some_id>
# or simply
        ?s "foo" /some_type<some_id>
}

Now, to retrieve all existing time anchors, you should be able to simply use "foo"@[,]. For example, if you wanted to sample only some of the events that happened after a certain time, it would make perfect sense to write something like

SELECT IN(?o), ?o
FROM ?some_graph
WITH ?excerpt AS SAMPLE(10) SINCE 2006-01-02T15:04:05
WHERE {
        /some_type<some_id> "foo"@[?excerpt] ?o
}

I couldn't figure out how you ended up with two sets of time anchors (?o1 and ?o2, apparently). Correct me if I'm wrong, but the right logic for the obscure query could look more like this:

SELECT ?p, ?p2, ?t, ?t2
FROM  ?some_graph
WITH ?ten_random AS SAMPLE(10)
WHERE {
        /some_type<some_id> "foo"@[?ten_random] ?p AT ?t
        /some_type<some_id> "bar"@[?t] ?p2 AT ?t2
}

I departed a lot from the existing syntax, but I believe an approach like this could significantly reduce clutter in at least some subset of complex requests, and make them much easier to reason about.

@rogerlucena
Contributor

I believe this issue may be closed given what was implemented for the FILTER keyword and the latest FILTER function in #149 and #150.
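
For reference, a hedged sketch of what such a query might look like, reusing the example from this thread; the exact FILTER grammar is my assumption here and should be checked against the BQL documentation:

SELECT ?p, ?o
FROM ?some_graph
WHERE {
    /some_type<some_id> "foo"@[,] AS ?p ?o
}
FILTER latest(?p)

The intent is that, of all the anchored versions matched by the graph clause, only the one with the most recent time anchor is kept.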

@tucnak

tucnak commented Nov 20, 2020

Thank you, great to see this resolved!
