alternative links #117

Open
pdowler opened this issue Dec 11, 2024 · 14 comments
@pdowler
Collaborator

pdowler commented Dec 11, 2024

In the links response, the provider may want to convey multiple URLs to the same content. There is currently no mechanism to tell clients that these links are alternatives and that they should choose one of them.

Use case 1: links that return different formats

Use case 2: links to different storage locations

Use case 3: links with different transport protocols

@pdowler pdowler added the TBD label Dec 11, 2024
@pdowler
Collaborator Author

pdowler commented Dec 11, 2024

Proposal from DAL running meeting 19 on 2024-12-11:

Introduce a new optional column in the links table with an opaque value that is common to a set of alternatives.

new column: name="alt_key" datatype="char" arraysize="..."

example:

ID   semantics  access_url                        alt_key
---------------------------------------------------------
id1  #this      https://zone1.example.net/file1   abc
id1  #this      https://zone2.example.net/file1   abc
id1  #this      https://zone1.example.net/file2

In this example, the primary data for id1 consists of two files, one of which is available from two locations: the first two links are alternatives (same alt_key) and the third link is independent.
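
As an illustration only, a client could recover the groups in the example above along these lines. This is a minimal sketch, not part of the proposal: the tuple layout, the function name `group_alternatives`, and the convention that an empty alt_key means "no alternatives declared" are all assumptions.

```python
from collections import defaultdict

# Rows from the example above: (ID, semantics, access_url, alt_key).
# Assumption: an empty alt_key means the link has no declared alternatives.
links = [
    ("id1", "#this", "https://zone1.example.net/file1", "abc"),
    ("id1", "#this", "https://zone2.example.net/file1", "abc"),
    ("id1", "#this", "https://zone1.example.net/file2", ""),
]


def group_alternatives(rows):
    """Return a list of groups; rows in the same group are alternatives."""
    keyed = defaultdict(list)
    independent = []
    for row in rows:
        alt_key = row[3]
        if alt_key:
            keyed[alt_key].append(row)
        else:
            # An independent link forms a group of one.
            independent.append([row])
    return list(keyed.values()) + independent


groups = group_alternatives(links)
for group in groups:
    print([row[2] for row in group])
```

The client would then use exactly one link from each group.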

@msdemlei
Collaborator

msdemlei commented Dec 12, 2024 via email

@pdowler
Collaborator Author

pdowler commented Dec 12, 2024

I agree that specifying uniqueness as {ID,alt_key}, which would allow alt_key values to be pretty small and easily generated, is a plus. The alternative is generating sufficiently long random keys or using UUIDs, which are large for no good reason. I agree it is pretty much nonsense to have alternatives across different IDs.

I'm not against making it a smallint... I almost wrote in the draft proposal that we require a very short string (like arraysize="4" ish), so I was thinking their uniqueness was within a small set of links for the same ID.


I don't think the boolean would work for us because we have multiple {ID,semantics} X multiple locations for each, so I think having alt_key orthogonal is the best general approach.


While it's probably true that uniqueness is within the same semantics as well, I don't see how that buys us anything, and maybe it blocks a specialised use... like two links with wider and narrower semantics that are alternatives??

ID   semantics      alt_key  ...  description
id1  #progenitor    alt1     ...  "the default progenitor"
id1  #progenitor-X  alt1     ...  "a more specific progenitor"

where progenitor-X is a narrow term under progenitor. I don't have a specific example.

@msdemlei
Collaborator

msdemlei commented Dec 13, 2024 via email

@pdowler
Collaborator Author

pdowler commented Dec 13, 2024

I think what we want is to declare which links are alternatives to other links and that should be orthogonal to any other concern. So in principle just grouping by alt_key is sufficient to make a small set of links where the client chooses one.

In order to ensure uniqueness of those groups in a large links response (e.g. many IDs -> many more links) we could make it easier to generate alt_key values by saying they only have to be unique within a specific ID value. That makes the spec a little more complicated and maybe makes the implementation easier, but honestly if alt_key is a string I would just choose to generate random string codes of length 8-ish and be confident it would be OK. While we could certainly say that is the expectation, I don't think it really makes it easier to work with, so I would avoid the complexity and unforeseen consequences.

So my position is that for spec simplicity I would fall on the side of "all links with the same value of alt_key are alternatives and the client should choose one rather than use all of them". This accomplishes the goal and places slightly more burden on the implementation to take care when assigning alt_key values, but for the use cases we have I do not think that is hard to do.
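
For illustration, generating such a code could look like the sketch below. The alphabet, the function name `new_alt_key`, and the default length of 8 are assumptions made for this example; nothing in the proposal prescribes them.

```python
import secrets
import string

# Assumed alphabet: lowercase letters and digits (36 symbols).
ALPHABET = string.ascii_lowercase + string.digits


def new_alt_key(length=8):
    """Return a random opaque key with no order or rank implied."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


key = new_alt_key()
print(key)
```

With 36 symbols and length 8 there are 36^8 (about 2.8 × 10^12) possible keys, so accidental collisions within a single links response are unlikely, though not impossible.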

I will likely prototype this in early January (for the multi-location use case) and I don't think we need to go beyond discussing here until that happens.

As for a UI to handle this... it depends on how you want to present the "choices" and allow the user to "choose". I think we should just concentrate on conveying the correct information, which is the relationship between 2 or more links in the result.

@msdemlei
Collaborator

msdemlei commented Dec 16, 2024 via email

@gmantele
Contributor

I agree with @msdemlei. The user experience must really be the most important consideration, especially with DataLink, which is not very easy/intuitive for users (and implementers).

The problem with random strings is that they carry no useful meaning for users. But what actually bothers me in this proposed solution is that we have no way to say what the alternatives offered to the user are; we have random keys with no meaning (and so I agree with Markus that integers would then be easier to generate and easier for a human being to identify). How does the user know which alternative he/she wants? semantics and local_semantics should be enough to answer this question. So, as François B. suggested, maybe we already have our solution here: local_semantics could be used to mark links with the same semantics as alternatives to each other. Then there would be no "randomly generated and meaningless" grouping keys. Is that not enough? Or have I missed something?

@pdowler
Collaborator Author

pdowler commented Dec 16, 2024

The problem is that local_semantics is a mechanism that already means something else: links with different ID values already share the same local_semantics value, and that means something specific to a user interface that is helping a user pick links. So no, it is not a solution.

"alternative links" is completely orthogonal to every other aspect of the links response. It could be a choice between two file formats for the same data, or two locations for the same file. Either way, the client just has to know that there is a choice to be made. In the case of two locations, all the client will see is that the two access_url values are different: they won't have any good reason to prefer one over the other, but at least they will know to not download both.

I prefer alt_key to have no meaning because if it does, even something implied like using integer instead of random string, people will make assumptions (order, rank, whatever).

Admittedly, this is more complex than it looked at the outset, and there are subtle aspects.... I did have another solution that I have designed and considered (but not implemented) that I did not bring up in the meeting. It is essentially the 3rd class of solutions - service descriptor - which is more general but less optimal in terms of number of requests required... maybe we are attempting premature optimization and should consider it. I will post that idea separately.

@pdowler
Collaborator Author

pdowler commented Dec 16, 2024

So, if alt_key is too complicated the other option is service descriptors. The pro: it's a more general solution that can take advantage of other existing tech. The cons: clients that grok the semantics of the service descriptor need to make additional requests and in most cases different kinds of alternatives imply different kinds of services.

Use case 1: links that return different formats
For this use case, the natural way to proceed would be a service that honours the HTTP Accept header and can return the content in different formats (or maybe use the DALI RESPONSEFORMAT param?). The service could provide access to existing files or perform content transformation on-the-fly.
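
As a rough sketch of the client side of such content negotiation, the snippet below builds (but does not send) a request that asks for a specific format via the Accept header. The URL and the media type `application/fits` are placeholders chosen for this example, and `build_request` is a hypothetical helper name.

```python
from urllib.request import Request

# Hypothetical URL for illustration; no request is actually sent here.
url = "https://example.net/data/file1"


def build_request(url, media_type):
    """Build a GET request asking the server for one specific media type."""
    return Request(url, headers={"Accept": media_type})


req = build_request(url, "application/fits")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

A single access_url could then serve every format, at the cost of the client needing to know in advance which formats the service offers.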

Use case 2: links to different storage locations && Use case 3: links with different transport protocols
For this use case, the transfer negotiation API in VOSpace does exactly this. The client specifies an identifier for the target data/file and a set of transfer protocols it knows how to use, and the server returns a set of URLs for all the locations and protocols available. This is a proven mechanism, maybe a little dated but easily updated to support request/response in something other than XML. More specifically, it could more naturally allow clients to say "I know how to use S3" in the request and the server to say "here is the S3 object identifier" in the response (details TBD).

Now, I'm not saying this is the simplest to implement but it is more robustly and clearly specified than trying to wedge alternatives into the links response.

@msdemlei
Collaborator

msdemlei commented Dec 17, 2024 via email

@gmantele
Contributor

What I like about the local_semantics solution is that it is a custom vocabulary, so it does not mean adding a column. However, I completely agree that this field already has a meaning, which makes it hard to use for something like data location or transport protocol. So, OK, local_semantics is not a good solution.

I also agree with @msdemlei: service descriptors are relatively hidden, and requiring one more step to find more ways to get the data seems too discouraging in terms of UX.

I agree too that alt_key seems like a better solution, but... it is ugly. It looks like an ugly trick to do something that DataLink is not currently able to do but should. In particular, I don't like the fact that it holds content that may seem to mean something to humans but is actually meaningless to the machine except for grouping. But I don't have a better alternative to propose yet, except integer values (although they may imply, as you said, ordering, priority, ...).

@pdowler
Collaborator Author

pdowler commented Dec 17, 2024

OK, I was just showing what the alternative looks like. Yes, service descriptors are more complex from a usage/UX point of view. No argument there :-)

My disagreement is that I think it should just be "rows with the same ID" and not include semantics, because alternatives in general could have different semantics. I am specifically thinking of wider or narrower terms but there could be other scenarios where the service wants the client to choose. I think the only disagreement is including or not including semantics in specifying the alt grouping. I don't see why we would restrict this unnecessarily when it is well defined either way.

So, can we not just start with "rows that have the same ID,alt_key are alternatives"? That makes it easy enough to implement and does not restrict usage... prototypes will tell if we need more.

@msdemlei
Collaborator

msdemlei commented Dec 18, 2024 via email

@pdowler
Collaborator Author

pdowler commented Dec 18, 2024

current status/decision:

We will go ahead with prototyping an optional alt_key column in the links response to group multiple alternative links with the same ID. This means that {ID,alt_key} specifies a group.
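
To make the {ID,alt_key} grouping concrete, here is a small client-side sketch that keeps one link per group. The rows, the short key value "1", the use of None for a missing alt_key, and the function name `choose_links` are all made up for illustration; taking the first link of each group is an arbitrary policy, since the proposal defines no preference among alternatives.

```python
# Hypothetical rows: (ID, semantics, access_url, alt_key).
# Assumption: None means the link has no declared alternatives.
links = [
    ("id1", "#this", "https://zone1.example.net/file1", "1"),
    ("id1", "#this", "https://zone2.example.net/file1", "1"),
    ("id1", "#this", "https://zone1.example.net/file2", None),
    ("id2", "#this", "https://zone1.example.net/file3", "1"),
    ("id2", "#this", "https://zone2.example.net/file3", "1"),
]


def choose_links(rows):
    """Keep independent links plus the first link of each (ID, alt_key) group."""
    seen = set()
    chosen = []
    for row in rows:
        link_id, _semantics, _url, alt_key = row
        if alt_key is None:
            chosen.append(row)
            continue
        group = (link_id, alt_key)
        if group not in seen:
            seen.add(group)
            chosen.append(row)
    return chosen


urls = [row[2] for row in choose_links(links)]
print(urls)
```

Because the group is scoped by ID, the same short key "1" can be reused for id1 and id2 without the two groups being merged.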

We will revisit the question of uniqueness and whether or not it should also include semantics (so alternatives have to have the same ID and semantics) later once we have some experience and are more informed by use cases.

@pdowler will create a PR to update doc status to WD-DataLink-1.2 and document the current alt_key idea (probably a separate PR once we're back in WD).


3 participants