Incomplete content_type subsection. #42
The PR discussion brought up the issue that MIME type parameters really should be defined by the same authority that defines the MIME type, so the idea of adding content={dataproduct_type} (or some other IVOA vocabulary value) to application/fits (e.g.) now seems unacceptable. The next best alternative would be to add a new (optional in 1.1) field, say Detail: do we define a default vocabulary and allow terms from that to be used "unqualified" (eg So, do we allow bare vocabulary terms from any (IVOA) vocabulary -- image (or #image) and galaxy (or #galaxy) -- or fully qualified vocabulary terms (identifiers)? |
I can volunteer to write this and create a PR, but I'd like to wait for PR #50 because that introduces optional fields and this would definitely create a merge conflict if done in parallel. |
I always cringe when "about the same thing" is done differently in two
There's http://www.ivoa.net/rdf/product-type that ought to become
This is a bit tricky -- for internal (datalink) consistency, I'd say we Given that's what datalink does elsewhere, I'd say we can't really do it If we started from scratch, I'd not do it this way again and instead say This is because I'm now convinced that hierarchy-aware matching |
Hi all,
Well, I tend to agree with Pat there. I think we have to be cautious about adding new columns all the time in the future. So having two fields to qualify the content of the link independently of its relation to "#this" (content_type and content_qualifier) should be enough. We will still have plenty of use cases where we will not use a dataproduct_type to qualify the target because it's simply inappropriate. But if the target is a voevent, the content can be a classification tag of that voevent, or if the semantics is "metadata" the content_qualifier could tell us: "provenance" record, obscore record, ssa record, proprietary, etc...
So http://www.ivoa.net/rdf/product-type has to be the default namespace for this field (stated so in the spec or advertised "à la" xsd namespace at the beginning of the VOTable). So anything which is not a dataproduct_type from the IVOA vocab has to contain an explicit namespace.
|
For a while I also volunteered to write this one:

\subsubsection{content_qualifier}
The content_qualifier column is optional. If it is present, it tells the client the nature of the thing or service they will receive or access if they use the link, in other words the target. If the target is a dataproduct, the field SHOULD contain one of the terms defined in the IVOA dataproduct_type vocabulary, considered as the default vocabulary. For other natures of the target the field MAY contain a term defined in another IVOA or proprietary vocabulary, referred to by its URI.
\subsection{Successful Requests} |
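To make the proposed rule concrete, here is a minimal sketch of how a client might interpret a content_qualifier value under this draft wording: a bare term is read against the default product-type vocabulary, and anything that already looks like a URI is passed through unchanged. The function name `resolve_qualifier` and the "contains `://`" test are assumptions made for illustration; nothing in the draft text prescribes them.

```python
# Sketch only: the default vocabulary URI is the one named in this thread;
# the function name and the URI detection rule are illustrative assumptions.
DEFAULT_VOCABULARY = "http://www.ivoa.net/rdf/product-type"


def resolve_qualifier(value):
    """Return a full term URI for a content_qualifier value, or None if empty."""
    value = (value or "").strip()
    if not value:
        # The column is optional, so an empty cell carries no information.
        return None
    if "://" in value:
        # Already fully qualified: a term from some other vocabulary.
        return value
    # Bare term ("image" or "#image"): read it in the default vocabulary.
    return DEFAULT_VOCABULARY + "#" + value.lstrip("#")


if __name__ == "__main__":
    for raw in ["image", "#image", "http://example.org/voc#tutorial", ""]:
        print(repr(raw), "->", resolve_qualifier(raw))
```

With this reading, `image` and `#image` resolve to the same identifier, while a term from any other vocabulary has to be written out in full.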
I eventually created the PR for the small subsection because it is not in conflict with the table where optional FIELDS will be listed. See discussion on this PR#50 |
Coming back to this now that the other PRs are merged. Before we discuss the name of the column in the links table, the more fundamental question is whether there is a single vocabulary which defines the values or are there several, some of which have not been created yet?
I'm not sure what you mean by "do it this way again". (I thought) I understand that there are two orthogonal vocabulary concepts:
Are you saying you don't like allowing terms from multiple vocabularies in this new "content_qualifier" column? (using #1) If so, the "fooling around" is caused by allowing unqualified bare terms from a default vocabulary. Are you saying you don't like extensions of a single-mandated vocabulary? (using #2). If so, the "fooling around" (like in semantics column) is because we allowed the unqualified bare terms #this which seemed kind of cute at the time. Maybe we don't have a well specified way for people to declare and use extensions but that's really important so people can put prototype terms into use... I just don't see how the product-type vocabulary can satisfy all the use cases and I don't see adding a column for each new vocabulary, so: My position right now:
We (CADC) use fully qualified terms in semantics that are not in the core vocab; I consider them prototype in nature and just haven't got around to the VEP stage. |
On Wed, May 12, 2021 at 06:14:16PM -0700, Patrick Dowler wrote:
Coming back to this now that the other PRs are merged. Before we
discuss the name of the column in the links table, the more
fundamental question is whether there is a single vocabulary which
defines the values or are there several, some of which have not
been created yet?
The most important thing is to get the use case clear. I'm assuming
it's "route links through SAMP". If we're thinking of anything else,
this would be the moment to say so (and properly define it: "A client
wants to do X").
Are you saying you don't like allowing terms from multiple
vocabularies in this new "content_qualifier" column? (using #1) If
I don't like pretending you can post your private vocabulary
somewhere and terms in it will work as well as if they were defined
in the IVOA vocabulary. While RDF would make that possible (because
you can define relationships between vocabularies), all kinds of
technicalities make that completely unrealistic in practice (I'll
elaborate if you want).
That is: We have the choice between allowing resources from all over
the place ("full URIs", which then in effect are opaque strings) and
concept trees with proper metadata (labels, description,
preliminary/deprecated flags).
In that choice, the trees and metadata to me are overwhelmingly
more important in almost all relevant use cases for consensus
vocabularies. And hence I propose as a good default policy: When you
have a field with values from a controlled vocabulary, say "terms are
from http://www.ivoa.net/rdf/thisvoc". You can always say "prefix by
an x- for something experimental that won't resolve" or so; in
practice, people will just fall back to something ugly for unknown
terms anyway, and you won't be in the hierarchy -- that's reasonably
sensible behaviour.
Are you saying you don't like extensions of a single-mandated
vocabulary? (using #2). If so, the "fooling around" (like in
They're built to be extended; VEPs are supposed to be cheap exactly
to make people extend soon and extend early.
#this which seemed kind of cute at the time. Maybe we don't have a
well specified way for people to declare and use extensions but
that's really important so people can put prototype terms into
use...
...and of course you can always just stick in terms illegally for
*really* early prototyping and bear with the consequences laid out
above.
I just don't see how the product-type vocabulary can satisfy all
the use cases and I don't see adding a column for each new
What additional use cases are these? Me, I think columns usually
should be per-use case (when these use cases are sufficiently
different, of course). Having columns be useful sometimes for A and
sometimes for B in my experience ends up making them not useful for
either A and B -- and that's independent of the question of
vocabularies.
Having said that, having some wild "twitter-like tags" obviously is
a valid use case in microblogs. Perhaps they may work for datalinks
as well (I'd need some serious convincing here, though). In that
case, though, I think one would find that these things don't need any
sort of vocabulary and work, twitter-like, by spontaneous agreement
of certain subcommunities.
My position right now:
* semantics continues to mandate the single vocabulary, therefore
unqualified terms are allowed
* content_qualifier (not in love with the name) allows fully
qualified terms from any vocabulary; no default; I could possibly
get behind restricting to "any ivoa vocabulary", depending on your
position on extensions
But how would that solve the (IMHO valid) SAMP routing use case?
Note that hierarchy plays a major role there, as a single client
might handle an entire branch of product types.
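A small sketch of what hierarchy-aware routing could look like on the client side, assuming the vocabulary is available as a simple "term to wider term" mapping. The tree fragment below (including `light-curve` sitting under `timeseries`) and the client names are assumptions for illustration, not a copy of the actual product-type vocabulary, and no real SAMP call is made.

```python
# Hypothetical fragment of a product-type tree: term -> wider term
# (None for top-level terms). Illustrative only.
WIDER = {
    "image": None,
    "cube": None,
    "spectrum": None,
    "timeseries": None,
    "light-curve": "timeseries",
}


def matches(term, handled):
    """True if term equals handled or is narrower than it in the tree."""
    while term is not None:
        if term == handled:
            return True
        # Walk one step up; unknown terms have no wider term.
        term = WIDER.get(term)
    return False


def route(term, clients):
    """Return the names of clients whose handled branch contains term."""
    return [name for name, handled in clients.items() if matches(term, handled)]


if __name__ == "__main__":
    clients = {"ts-tool": "timeseries", "img-viewer": "image"}
    print(route("light-curve", clients))
    print(route("x-experimental", clients))
```

A client that declares it handles `timeseries` picks up `light-curve` links without knowing that term in advance, while an out-of-vocabulary term simply matches nobody, which is the fallback behaviour described above.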
|
OK, I get the objection to the wild west of arbitrary full URLs to something on the internet; I don't think it would magically work either and they are just opaque identifiers to s/w (a human could go get the definition of a term).

I re-read what I think is the original post on this (issue #44) and in there I noted a couple of rather simple things that maybe are enough to get by for some time. First, there is a (proposed) "tabular" or "table" value in the product-type vocabulary; assuming such a VEP was accepted this would nominally be the way to link to "records" (query results). If you saw a links response with:

id semantics product_type content_type ...

You could infer that the second link was to a fits file with a table in it, but does #derivation tell you what's in the table? What is a row in that table? Is it clear that it is an extracted source? If not, how could we make that clear? The answer could be a narrower term than #derivation that said something about what kind of derivation: same data but processed to be "better" vs information extracted vs astronomical sources extracted ...

So I guess if both datalink/core and product-type vocabularies grow sufficiently, aren't too rigid and don't become a huge mess then we'd be OK with a product_type column restricted to values from that vocabulary. The combinations from two vocabularies will make this quite flexible... I suspect 3 such things would be too much.

Francois - do you think this will work for the use cases from Ada and others you mentioned?

Aside: At CADC we have a handful of astronomers and data-scientists that use our services a lot; they are pseudo-representative of the community (pseudo because they know too much now). I am keenly aware of how much they hate it when things change, and if you give them something simple they get used to, you can never go back and generalize it in a way that makes it more complex.
As a result, I am extremely leery of simple-looking things that look like short cuts unless I have sketched out the general solution and I know the shortcut is not going to bite me later. So like Markus, I don't think I grok the general problem here (lack of use cases) and that makes me a little worried that we'll regret something. OTOH, if we just think about it as "used to be able to say one thing about a link" and "now you can say two things about a link" then that helps. |
Well, there are two levels of answers: ---> could we find a more generic term than product_type for describing the nature of the #link? (I understand that content_qualifier is ruled out.)
|
As long as the product-type vocabulary, which says "what something is" expands to include terms beyond what ObsCore uses (different kinds of science data) it could be a general purpose way to augment the content_type.

The level 3 and 4 examples above are both using terms from the datalink/core vocabulary; it could be that we have created some confusion with the content of that vocabulary... Is there a use case where you would want to specify one of those level 3 and one of those level 4 terms? If so, is it feasible to split the datalink/core vocab into two actually distinct vocabularies (I'm skeptical)? What about simply allowing multiple terms to be used to describe a link that has a complicated multi-faceted relationship to #this? I do in fact have a use case that suggests this and I don't want to get that mixed up with use of product-type, but in general being able to put multiple terms might be an alternative.

On the aside: the "simple thing" I am potentially nervous about is being strict about product_type column being just for terms from the product-type vocab, and then future evolution of that vocab is also strict and not being able to use it for other use cases. The other obvious thing I could see doing is linking to an instance of a data model and for that I'd expect to say content_type=application/x-votable+xml product_type="instance(s) of ObsCore" or something like that. So do we eventually add a base term "model" and narrower terms like "ObsCore" and "Source" and "Cube" to the product-type vocabulary? We could go that way and I'd feel a lot better about adding a strict product_type column to links now if I heard "heh, that sounds cool - we could do some VEPs for that in the near future". |
On Thu, May 20, 2021 at 08:23:28AM -0700, Bonnarel wrote:
2 ) - when #link is not a dataproduct product_type is useless.
It is not a problem per se. We can leave it empty. But maybe we
want to say more about what it is in that case. Imagine #link is
"Documentation". Is that a tutorial? A refereed article? A
simple HTML page? A GitHub repository? Where do we put this
information if the new field is reserved for the dataproduct_type
vocabulary?
I'd say the right way to go about answering this question is to
figure out: What client is supposed to consume this information, and
what is it going to do with it? Following established use in
linguistics, I'd call this the pragmatics of the field (cf.
https://en.wikipedia.org/wiki/Pragmatics).
Once we've understood that, we'll have a much better chance of
figuring out if (and how) a product_type column works for the use
case or if this needs to be addressed in some other way, and what
kind of semantics should be put in place.
Incidentally, the "other way" could also include "allow terms from a
second vocabulary". Since both of them would be controlled, we can
guarantee that there are no collisions between the terms from the two
vocabularies, and clients could, by inspecting what vocabulary a term
comes from, even figure out if something is a product-type, a (say)
documentation-type or just the odd out-of-vocabulary thing that you
always have to reckon with.
I'm not saying that's a good idea here -- as I said, we first have to
figure out exactly what the pragmatics of whatever you're after here
is. But it *might* be a good idea.
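As a rough sketch of the "inspect which vocabulary a term comes from" idea: if both vocabularies are controlled and their term sets are guaranteed not to collide, a plain lookup is enough. The `documentation-type` vocabulary and all term lists below are hypothetical, made up for this example only.

```python
# Illustrative term sets. "documentation-type" is a hypothetical second
# vocabulary; the terms listed are examples, not authoritative content.
VOCABULARIES = {
    "product-type": {"image", "cube", "spectrum", "timeseries"},
    "documentation-type": {"tutorial", "article", "repository"},
}


def check_no_collisions(vocabularies):
    """Raise ValueError if any term appears in more than one vocabulary."""
    seen = {}
    for name, terms in vocabularies.items():
        for term in terms:
            if term in seen:
                raise ValueError(f"{term!r} is in both {seen[term]} and {name}")
            seen[term] = name


def classify(term):
    """Return the name of the vocabulary defining term, or None if unknown."""
    term = term.lstrip("#")
    for name, terms in VOCABULARIES.items():
        if term in terms:
            return name
    # The odd out-of-vocabulary thing a client always has to reckon with.
    return None


if __name__ == "__main__":
    check_no_collisions(VOCABULARIES)
    for t in ["#image", "tutorial", "something-else"]:
        print(t, "->", classify(t))
```

The collision check is the part that only a controlled, centrally maintained set of vocabularies can guarantee; with arbitrary external vocabularies the lookup would be ambiguous.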
|
On 20/05/2021 at 23:13, Patrick Dowler wrote:
As long as the product-type vocabulary, which says "what something is"
expands to include terms beyond what ObsCore uses (different kinds of
science data) it could be a general purpose way to augment the
content_type. product-type vocabulary
Which means that ObsCore will only use a reduced part of the new dataproduct_type vocabulary.
The level 3 and 4 examples above are both using terms from the
datalink/core vocabulary; it could be that we have created some
confusion with the content of that vocabulary... is there a use case
where you would want to specify one of those level 3 and one of those
level 4 terms? If so, is it feasible to split the datalink/core vocab
into two actually distinct vocabularies (I'm skeptical)? what about
simply allowing multiple terms to be used to describe a link that has
a complicated multi-faceted relationship to #this? I do in fact have a
use case that suggests this and I don't want to get that mixed up with
use of product-type, but in general being able to put multiple terms
might be an alternative.
For level 3 and 4 I may differ from Ada, so we would have to poke her to
know what she actually meant. To me both 3 and 4 were actually
qualifying the relationship between #this and #link. It's actually
splitting the current semantics field in two parts. And I was afraid
that if we add a 4th field to tackle content_type, product_type,
relationship and information we will end up with most of them empty in
many use cases. I imagined that we should wait for a recommended
datamodel annotation mechanism to try to solve this by adding an
annotation on top of the current table.
the use case I see is the one discussed in VEP006. Imagine we have a new
relationship term "ancestor" a term of wider extent than progenitor (see
VEP006 discussion) able to encompass #dark, #flat fields, used in
calibration process to obtain #this etc as well as #progenitors. Then we
could have both ancestor and dark ? While calibration and dark could be
for dark file which can be used for calibrating the current #this?
But what are the other consequences of having two terms in the semantic
field ?
On the aside: the "simple thing" I am potentially nervous about is
being strict about product_type column being just for terms from the
product-type vocab, and then future evolution of that vocab is also
strict and not being able to use it for other use cases. The other
obvious thing I could see doing is linking to an instance of a data
model and for that I'd expect to say
content_type=application/x-votable+xml product_type="instance(s) of
ObsCore" or something like that. So do we eventually add a base term
"model" and narrower terms like "ObsCore" and "Source" and "Cube" to
the product-type vocabulary? We could go that way and I'd feel a lot
better about adding a strict product_type column to links now if I
heard "heh, that sounds cool - we could do some VEPs for that in the
near future".
in principle I agree with your concern. But ObsCore and Source will not
relate to the same semantics value. ObsCore should be metadata (as
provenance), while Source could be "derived" or "target" if it existed.
…
|
On 21/05/2021 at 09:17, msdemlei wrote:
On Thu, May 20, 2021 at 08:23:28AM -0700, Bonnarel wrote:
> 2 ) - when #link is not a dataproduct product_type is useless.
> It is not a problem per se. We can leave it empty. But maybe we
> want to say more about what it is in that case. Imagine #link is
> "Documentation". Is that a tutorial? A refereed article? A
> simple HTML page? A GitHub repository? Where do we put this
> information if the new field is reserved for the dataproduct_type
> vocabulary?
I'd say the right way to go about answering this question is to
figure out: What client is supposed to consume this information, and
what is it going to do with it? Following established use in
linguistics, I'd call this the pragmatics of the field (cf.
https://en.wikipedia.org/wiki/Pragmatics).
Something very basic : give a more accurate characterization of what
kind of documentation is going to be retrieved.
At the DataLink table display level this can only be for selection of,
let's say, "references".
When you retrieve it, it could also be used to announce the nature of
the output on a retrieval page.
Once we've understood that, we'll have a much better chance of
figuring out if (and how) a product_type column works for the use
case or if this needs to be addressed in some other way, and what
kind of semantics should be put in place.
Incidentally, the "other way" could also include "allow terms from a
second vocabulary". Since both of them would be controlled, we can
guarantee that there are no collisions between the terms from the two
vocabularies, and clients could, by inspecting what vocabulary a term
comes from, even figure out if something is a product-type, a (say)
documentation-type or just the odd out-of-vocabulary thing that you
always have to reckon with.
I'm not saying that's a good idea here -- as I said, we first have to
figure out exactly what the pragmatics of whatever you're after here
is. But it *might* be a good idea.
So, my proposal was: by default dataproduct_type, and if needed a full
vocabulary term with URI.
Pat's proposal is: extend the scope of the dataproduct_type
vocabulary.
Your proposal: let the client recognize the terms coming from
different IVOA vocabularies. Hopefully there is no possible confusion?
|
On Sun, May 23, 2021 at 02:05:50PM -0700, Bonnarel wrote:
On 21/05/2021 at 09:17, msdemlei wrote:
> I'd say the right way to go about answering this question is to
> figure out: What client is supposed to consume this information, and
> what is it going to do with it? Following established use in
> linguistics, I'd call this the pragmatics of the field (cf.
> https://en.wikipedia.org/wiki/Pragmatics).
Something very basic : give a more accurate characterization of what
kind of documentation is going to be retrieved.
Sure -- but wouldn't a client just send the URL to a web browser
regardless of that more accurate characterisation? If so, I don't
see a need for machine-readable information, and whatever is in
description is enough (because the recipient is a human).
If the intended pragmatics are different (i.e., it's not about
"figure out where to send this link"), we may need something else,
but as I said we'd first need to figure out that other pragmatics.
|
On Sun, May 23, 2021 at 01:41:37PM -0700, Bonnarel wrote:
the use case I see is the one discussed in VEP006. Imagine we have a new
relationship term "ancestor" a term of wider extent than progenitor (see
VEP006 discussion) able to encompass #dark, #flat fields, used in
calibration process to obtain #this etc as well as #progenitors. Then we
could have both ancestor and dark ? While calibration and dark could be
for dark which can be used for calibration ?
Well, this problem immediately goes away when we correctly construct
the vocabulary, which is what the discussion on VEP-006 is all about:
When the vocabulary is a tree, either dark ⊂ ancestor, in which case
there's no need to give it (all dark-s are ancestor-s, and the
machine knows it), or dark ∩ ancestor = ∅, in which case it cannot be
both.
It *is* a bit of an effort to construct vocabularies that way (as
evinced by the VEP-006 discussion), but the payoff is that machines
can figure out these things, and that's a huge payoff given that
proper annotation is difficult for humans.
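The "either nested or disjoint" property of a tree-shaped vocabulary can be checked mechanically. The sketch below assumes the vocabulary is given as a "term to wider term" mapping; the two example trees are hypothetical placements of `dark` and `ancestor`, not the outcome of the VEP-006 discussion.

```python
def wider_chain(term, wider):
    """Return term followed by all of its wider terms, nearest first."""
    chain = []
    while term is not None:
        chain.append(term)
        term = wider.get(term)
    return chain


def relation(a, b, wider):
    """Classify a relative to b in a tree vocabulary.

    Returns "same", "narrower" (a is below b), "wider" (a is above b)
    or "disjoint" (neither contains the other).
    """
    if a == b:
        return "same"
    if b in wider_chain(a, wider):
        return "narrower"
    if a in wider_chain(b, wider):
        return "wider"
    return "disjoint"


if __name__ == "__main__":
    # Hypothetical case 1: dark is placed below ancestor.
    nested = {"ancestor": None, "progenitor": "ancestor", "dark": "ancestor"}
    # Hypothetical case 2: dark lives in a separate calibration branch.
    separate = {"ancestor": None, "calibration": None, "dark": "calibration"}
    print(relation("dark", "ancestor", nested))
    print(relation("dark", "ancestor", separate))
```

In the first tree, stating `ancestor` next to `dark` adds nothing because the machine can derive it; in the second, a single link cannot be both, so a pair of terms is never needed in either case.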
|
Well, I think it could be interesting to allow some combination of terms in semantics. I see some use cases, as I explained on the semantics mailing list for the VEP006 discussion. But if it is to be useful for clients, should we not restrict the allowed combinations to some predefined list?
|
optional content_qualifier field added in PR 57 to resolve this issue. |
great !!!
On 14/10/2021 at 18:13, Patrick Dowler wrote:
…
optional content_qualifier field added in PR 57 to resolve this issue.
|
Recent semantic discussion addressed the use case of adding the possibility to link sibling or alternate science datasets to the main item. Eventually it was decided that the right place to specify the dataproduct_type of the datasets is a standardized media type parameter in the content_type FIELD. This has to be explained in the section. See PR #43