alternative links #117
Proposal from DAL running meeting 19 on 2024-12-11: Introduce a new optional column in the links table with an opaque value that is common to a set of alternatives. new column: example:
In this example, the primary data for id1 is two files and one file is available from two locations: the first two links are alternatives (same |
On Wed, Dec 11, 2024 at 09:14:27AM -0800, Patrick Dowler wrote:
Proposal from DAL running meeting 19 on 2024-12-11:
Introduce a new optional column in the links table with an opaque
value that is common to a set of alternatives.
I suspect, too, that this is the least ugly general treatment of the
problem. Still, it *is* ugly. One obvious problem is that with the
multi-ID capability of datalink (you probably expected me to moan
about this again), you would either have to make sure that the
strings somehow are made unique (so that there's no abc with id-1
*and* id-2 at the same time), which will make for long and probably
not very readable alt_keys.
Or, and I think that's what I'd suggest, make alt_key *plus* the id
the group key. I think that's reasonable because no conceivable use
case would require us to form cross-id groups. We could then even
make the alt_key a small integer, because I suspect having it a
string will tempt people to put things in there that really ought to
be in semantics or local_semantics or content_qualifier.
Finally, from a consumer side the simplest solution would be to have
a booleanesque column pick_one that's true if semantics is to be
interpreted as alternatives and false or NULL otherwise. This would,
for all I can see, work for the cases that have been proposed so far.
It would not work if we had multiple equivalence groups per semantics
class. Do we expect that? If we do, would it still be smarter to
define the grouping columns as (id, semantics, alt_key)?
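
To make the (id, alt_key) grouping idea concrete, here is a minimal sketch of how a client might collect alternatives. The rows, the URLs, and the helper name `group_alternatives` are made up for illustration; only the `alt_key` column comes from the proposal, and treating a NULL `alt_key` as "no alternatives" is an assumption.

```python
from collections import defaultdict

# Hypothetical links rows; only the columns relevant to grouping are shown.
rows = [
    {"ID": "id-1", "access_url": "https://site-a.example/f1", "alt_key": 1},
    {"ID": "id-1", "access_url": "https://site-b.example/f1", "alt_key": 1},
    {"ID": "id-1", "access_url": "https://site-a.example/f2", "alt_key": None},
    {"ID": "id-2", "access_url": "https://site-a.example/g1", "alt_key": 1},
    {"ID": "id-2", "access_url": "https://site-b.example/g1", "alt_key": 1},
]


def group_alternatives(rows):
    """Collect access URLs per (ID, alt_key); rows without alt_key are skipped."""
    groups = defaultdict(list)
    for row in rows:
        if row["alt_key"] is None:  # assumption: NULL means "not part of a group"
            continue
        groups[(row["ID"], row["alt_key"])].append(row["access_url"])
    return dict(groups)


groups = group_alternatives(rows)
for key, urls in groups.items():
    print(key, urls)
```

Because the ID is part of the key, the small integer 1 can be reused for id-1 and id-2 without the two groups colliding.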
|
I agree that specifying uniqueness as {ID,alt_key} is a plus, since that would allow alt_key values to be pretty small and easily generated. The alternative is generating sufficiently long random keys or using UUIDs, which are large for no good reason. I agree it is pretty much nonsense to have alternatives across different IDs. I'm not against making it a smallint... I almost wrote in the draft proposal that we require a very short string (like arraysize="4" ish), so I was thinking their uniqueness was within a small set of links for the same ID. I don't think the boolean would work for us because we have multiple {ID,semantics} X multiple locations for each, so I think having While it's probably true that uniqueness is within the same semantics as well, but I don't see how that buys us anything and maybe it blocks a specialised use... like two links with wider and narrower semantics that are alternatives??
where progenitor-X is a narrow term under progenitor. I don't have a specific example. |
On Thu, Dec 12, 2024 at 08:42:54PM +0000, Patrick Dowler wrote:
While it's probably true that uniqueness is within the same
semantics as well, but I don't see how that buys us anything and
maybe it blocks a specialised use... like two links with wider and
narrower semantics that are alternatives??
```
ID semantics alt_key ... description
id1 #progenitor alt1 ... "the default progenitor"
id1 #progenitor-X alt1 ... "a more specific progenitor"
```
where progenitor-X is a narrow term under progenitor. I don't have
a specific example.
Hm... I think that if we consider this kind of thing, we should be
explicit about it. You see, the first thing I do when presenting a
datalink result (e.g., https://github.com/msdemlei/datalink-xslt.git)
is group by semantics. I hence would not even know how to express such an
equivalence across the different concepts, and it would certainly
take conscious effort to make whatever we want work. Do we know what
we want? In a display like what https://dc.g-vo.org/shomydl/q/f/form
produces, how would this be shown, if at all?
But that's perhaps a detail that we can work out later. And should
certainly not keep us from drafting some standards language. Pat?
Me? Someone else?
|
I think what we want is to declare which links are alternatives to other links and that should be orthogonal to any other concern. So in principle just grouping by In order to ensure uniqueness of those groups in a large links response (e.g. many IDs -> many more links) we could make it easier to generate So my position is that for spec simplicity I would fall on the side of "all links with the same value of alt_key are alternatives and the client should choose one rather than use all of them". This accomplishes the goal and places slightly more burden on the implementation to take care when assigning alt_key values, but for the use cases we have I do not think that is hard to do. I will likely prototype this in early January (for the multi-location use case) and I don't think we need to go beyond discussing here until that happens. As for a UI to handle this... it depends on how you want to present the "choices" and allow the user to "choose". I think we just concentrate on conveying the correct information, which is the relationship between 2 or more links in the result. |
On Fri, Dec 13, 2024 at 11:59:19AM -0800, Patrick Dowler wrote:
As for a UI to handle this... it depends on how you want to present
the "choices" and allow the user to "choose". I think we just
concentrate on conveying the correct information, which is the
relationship between 2 or more links in the result.
While I agree that there is not much more to discuss before
implementation, let me disagree here: If you want to add useful
features to protocols, thinking about how the users will consume them
in my experience is the most useful guideline by a wide margin. If
people designed from the user back, it tended to be a good design.
If people designed from what seemed convenient to data publishers, it
tended to not work out well, not even for the next data publisher.
And that again brings me to a fairly firm impression that we either
say alt_key is per semantics, or semantics is per alt_key (ugh), or
that the two intertwine in... ugh.. ways.
I grant you that orthogonality is a nice concept, but in practice clients
have to give a consistent picture, and there semantics and alt_key
simply are not orthogonal. After all, the question that started this
is "what does it mean if there are multiple rows with the same
semantics?"
|
I agree with @msdemlei. The user experience must really be the most important, especially with DataLink, which is not very easy/intuitive for users (and implementers). The problem with random strings is that they carry no useful meaning for users. But what actually bothers me in this proposed solution is that we have no way to say what the alternatives proposed to the user are; we have random keys with no meaning (and so I agree with Markus that integers would then be easier to generate and easier for a human being to identify). How does the user know which alternative he/she wants? To answer this question |
The problem is that "alternative links" is completely orthogonal to every other aspect of the links response. It could be a choice between two file formats for the same data, or two locations for the same file. Either way, the client just has to know that there is a choice to be made. In the case of two locations, all the client will see is that the two I prefer Admittedly, this is more complex than it looked at the outset, and there are subtle aspects.... I did have another solution that I have designed and considered (but not implemented) that I did not bring up in the meeting. It is essentially the 3rd class of solutions - service descriptor - which is more general but less optimal in terms of number of requests required... maybe we are attempting premature optimization and should consider it. I will post that idea separately. |
So, if Use case 1: links that return different format Use case 2: links to different storage locations && Use case 3: links with different transport protocols Now, I'm not saying this is the simplest to implement but it is more robustly and clearly specified than trying to wedge alternatives into the links response. |
On Mon, Dec 16, 2024 at 10:11:58AM -0800, Patrick Dowler wrote:
So, if `alt_key` is too complicated the other option is **service
No, I don't think alt_key is too complicated. I think we just need
to acknowledge that it means "rows that have the same (id, semantics,
alt_key) triple are alternatives". Basically, it's just a
completely normal GROUP BY. And I highly prefer it to going through
service descriptors because it's more visible and works nicely even
when the client does not explicitly support alt_key.
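
As a rough illustration of what the choice of GROUP BY columns changes, the sketch below groups the two rows from the #progenitor example table above, once by (ID, alt_key) and once by (ID, semantics, alt_key). The helper name `group_by` is hypothetical and the rows are simply the ones from that example.

```python
from collections import defaultdict

# The two rows from the example table above; column names follow that table.
rows = [
    {"ID": "id1", "semantics": "#progenitor", "alt_key": "alt1",
     "description": "the default progenitor"},
    {"ID": "id1", "semantics": "#progenitor-X", "alt_key": "alt1",
     "description": "a more specific progenitor"},
]


def group_by(rows, columns):
    """Group row descriptions by the tuple of values in `columns` (like SQL GROUP BY)."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in columns)].append(row["description"])
    return dict(groups)


by_pair = group_by(rows, ("ID", "alt_key"))
by_triple = group_by(rows, ("ID", "semantics", "alt_key"))

print(len(by_pair))    # one group holding both rows
print(len(by_triple))  # two groups of one row each
```

With the pair as key the two rows are alternatives of each other; with the triple they are not, which is exactly the cross-semantics case under discussion.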
|
What I'd like with the I also agree with @msdemlei: service descriptors are relatively hidden, and having one more step to get more ways to get the data seems too discouraging in terms of UX. I agree too that |
OK, I was just showing what the alternative looks like. Yes, service descriptors are more complex from a usage/UX point of view. No argument there :-) My disagreement is that I think it should just be "rows with the same ID" and not include semantics, because alternatives in general could have different semantics. I am specifically thinking of wider or narrower terms but there could be other scenarios where the service wants the client to choose. I think the only disagreement is including or not including semantics in specifying the alt grouping. I don't see why we would restrict this unnecessarily when it is well defined either way. So, can we not just start with "rows that have the same ID,alt_key are alternatives"? That makes it easy enough to implement and does not restrict usage... prototypes will tell if we need more. |
On Tue, Dec 17, 2024 at 09:23:03AM -0800, Patrick Dowler wrote:
including or not including semantics in specifying the alt
grouping. I don't see why we would restrict this unnecessarily when
it is well defined either way.
If "Well defined" includes "clients will know what to do", I'd mildly
dispute the "well defined" part :-)
But:
So, can we not just start with "rows that have the same ID,alt_key
are alternatives"? That makes it easy enough to implement and does
not restrict usage... prototypes will tell if we need more.
Well, let's start with that. But if no credible use case surfaces
for "alt_key across semantics" until we're through, let's seriously
reconsider that decision; as I said, I wouldn't know how to even show
such a thing in my Datalink XSLT/js, and I doubt anyone else trying
to do anything sensible with semantics will.
|
current status/decision: We will go ahead with prototyping an optional We will revisit the question of uniqueness and whether or not it should also include semantics (so alternatives have to have the same ID and semantics) later once we have some experience and are more informed by use cases. @pdowler will create a PR to update doc status to WD-DataLink-1.2 and document the current alt_key idea (probably a separate PR once we're back in WD). |
In the links response, the provider may want to convey multiple URLs to the same content. There is currently no mechanism to tell clients that these links are alternatives and that they should choose one of them.
Use case 1: links that return different format
Use case 2: links to different storage locations
Use case 3: links with different transport protocols
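
For use case 3, a client-side sketch of "choose one" could look like the following. Everything here is illustrative: the function name `choose_links`, the example URLs, the scheme preference order, and the (ID, alt_key) grouping are assumptions, not part of any agreed specification.

```python
from urllib.parse import urlsplit


def choose_links(rows, preferred_schemes=("https", "http")):
    """Return every row without an alt_key plus one row per (ID, alt_key) group."""
    chosen = {}  # (ID, alt_key) -> (rank, row); lower rank is better
    result = []
    for row in rows:
        if row.get("alt_key") is None:
            result.append(row)  # not part of any group: always kept
            continue
        scheme = urlsplit(row["access_url"]).scheme
        if scheme in preferred_schemes:
            rank = preferred_schemes.index(scheme)
        else:
            rank = len(preferred_schemes)  # unknown schemes sort last
        key = (row["ID"], row["alt_key"])
        if key not in chosen or rank < chosen[key][0]:
            chosen[key] = (rank, row)
    result.extend(row for _, row in chosen.values())
    return result


# Hypothetical rows: one file reachable over two protocols, plus a preview.
rows = [
    {"ID": "id1", "access_url": "ftp://archive.example/f1.fits", "alt_key": 1},
    {"ID": "id1", "access_url": "https://archive.example/f1.fits", "alt_key": 1},
    {"ID": "id1", "access_url": "https://archive.example/f1.png", "alt_key": None},
]
picked = [row["access_url"] for row in choose_links(rows)]
print(picked)
```

A client that knows nothing about alt_key would simply show all three rows, which is the graceful fallback mentioned in the discussion.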