{links} response content-type header #91

Closed
Bonnarel opened this issue Jan 9, 2023 · 10 comments · Fixed by #93

@Bonnarel
Contributor

Bonnarel commented Jan 9, 2023

From Markus Demleitner today:

There's one skeleton in the closet
that recently came up again, and perhaps we can still somehow bury
it before RFC.

The problem is the following text:

Unless the incoming request included a RESPONSEFORMAT parameter
requesting a different format, the content-type header of the
response MUST be ``application/x-votable+xml'' with the ``content''
parameter set to ``datalink'', with the canonical form given in
\ref{sec:mime} strongly recommended.

The purpose of this language is that clients can (relatively) easily
work out that they are dealing with a Datalink document regardless of
where they get it from (as long as it's http). I think that's a good
idea, although I'm not aware of a client that actually looks at
content-type when retrieving things that could be datalink documents.
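
For illustration, a client-side check of the kind this language is meant to enable could look like the following. This is only a sketch: the function name `is_datalink_content_type` is hypothetical, and it uses Python's standard `email.message` module simply as a convenient header parser.

```python
from email.message import Message

DATALINK_BASE_TYPE = "application/x-votable+xml"


def is_datalink_content_type(header_value: str) -> bool:
    """Return True if a Content-Type value announces a datalink VOTable."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_type() lower-cases type/subtype and drops parameters;
    # get_param() returns the (unquoted) parameter value or None.
    content = msg.get_param("content")
    return (
        msg.get_content_type() == DATALINK_BASE_TYPE
        and isinstance(content, str)
        and content.lower() == "datalink"
    )
```

A client would feed it the response's content-type header and fall back to other detection methods when it returns False.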

But at the same time this is blocking an important use case:
Displaying datalink documents in the browser (Background:
http://mail.ivoa.net/pipermail/dal/2021-April/008426.html and
https://github.com/msdemlei/datalink-xslt). When I wrote the XSLT
for that in ~2016, I planned it as a temporary hack until there are
good datalink clients, but now I think letting people open datalinks
with the browser and getting something actually usable is a major use
case in itself.

The trouble with this: Web browsers will not apply the XSLT to
documents with a media type of
application/x-votable+xml;content=datalink. I have to give them
text/xml to start the whole magic.

I hence at the moment have the choice of violating the standard or
breaking a use case important to me. I weaseled around that first by
inspecting user agent strings and only returning text/xml if the user
agent looked as if I was dealing with a web browser, praying nobody would
notice. But that broke rather quickly (I forget the details), and I
switched to inspecting the accept header. If I find a text/html in
there, I return text/xml (yeah, it's that twisted), otherwise I'm
compliant with the datalink spec.

But it's still a violation of the standard. I had hoped programmatic
use would not be impacted, but it turns out that, for instance, the
JVMs earlier than 11 actually indicate acceptance of text/html, too.
Sigh.

So... it's trouble, and I have not found any solution that doesn't
make me cringe. But I increasingly have the impression that ignoring
the problem will only make matters worse.

The least horrible proposal I have would be to replace the text
quoted above:

When a datalink service returns a datalink VOTable (i.e., absent a
RESPONSEFORMAT parameter requesting something else), it MUST
indicate that in the response's content-type header. When the
request's accept header includes ``application/x-votable+xml'', then
it MUST be ``application/x-votable+xml'' with the ``content''
parameter set to ``datalink'', with the canonical form given in
\ref{sec:mime} strongly recommended. Otherwise, any legal VOTable
media type, including text/xml, is allowed.

That is: clients wishing to do dispatch based on the datalink media
type must indicate that they accept VOTable. It's a pretty safe bet
that major browsers won't do that (and potential future VO-enabled
browsers wouldn't need the XSLT, I'm sure). And although HTTP
content negotiation isn't as popular as it should be, I think it's
implementationally not very intrusive.
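
To make option one concrete, here is a minimal server-side sketch. The function names are hypothetical, `text/xml` is picked as the fallback only because it is the type that lets browsers apply the XSLT, and q-values in the accept header are ignored for brevity, which a real implementation might not want to do.

```python
DATALINK_MEDIA_TYPE = "application/x-votable+xml;content=datalink"
FALLBACK_MEDIA_TYPE = "text/xml"  # assumption: any legal VOTable media type would do


def accepted_media_ranges(accept_header):
    """Return the lower-cased media ranges of an Accept header, without parameters."""
    ranges = []
    for item in (accept_header or "").split(","):
        media_range = item.split(";", 1)[0].strip().lower()
        if media_range:
            ranges.append(media_range)
    return ranges


def content_type_option_one(accept_header):
    """Pick the response content-type following the proposed option one."""
    # Only clients that explicitly accept VOTable get the datalink media type.
    if "application/x-votable+xml" in accepted_media_ranges(accept_header):
        return DATALINK_MEDIA_TYPE
    return FALLBACK_MEDIA_TYPE
```

With this logic a browser-style accept header yields `text/xml`, while a client that lists `application/x-votable+xml` gets the datalink media type.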

The only alternative I could come up with would be to codify what I'm
currently doing:

Unless the incoming request included a RESPONSEFORMAT parameter
requesting a different format, and unless the user agent indicates
it will accept text/html, the content-type header of the response
MUST be ``application/x-votable+xml'' with the ``content''
parameter set to ``datalink'', with the canonical form given in
\ref{sec:mime} strongly recommended.

We could then have a footnote explaining what the text/html exception
is supposed to do. The downside here is that it's really an ugly
hack to return text/xml when accept has text/html, and there's too
much library code that wantonly sticks text/html into accept behind
the programmers' backs.
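
The downside can be seen in a small sketch of this second option. The function name is hypothetical, and the `LEGACY_JAVA_ACCEPT` string is an assumption about the kind of default accept header older Java HTTP clients send; it is here only to show how a programmatic client can end up on the browser branch.

```python
DATALINK_MEDIA_TYPE = "application/x-votable+xml;content=datalink"

# Assumed example of a default Accept header sent by older Java HTTP clients.
LEGACY_JAVA_ACCEPT = "text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2"


def content_type_option_two(accept_header):
    """Return text/xml when the client claims to accept text/html."""
    ranges = [
        part.split(";", 1)[0].strip().lower()
        for part in (accept_header or "").split(",")
    ]
    if "text/html" in ranges:
        # Browser-like client: serve text/xml so the XSLT gets applied.
        return "text/xml"
    return DATALINK_MEDIA_TYPE
```

Under these assumptions, `content_type_option_two(LEGACY_JAVA_ACCEPT)` returns `text/xml` even though no browser is involved.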

I think given the media type hasn't seen too much use so far anyway
and when a client wants to use it, it would be new code anyway, I'd
go for option one.

But if anyone had a less painful idea, that'd be even better. Does
anyone?

@Bonnarel
Contributor Author

Bonnarel commented Jan 9, 2023

Answer by Mark Taylor:

Given that the proposed text changes are rather convoluted,
and that nobody is, as far as I know, actually using the currently
mandated content-type behaviour (evidence: Markus has been violating
it and nobody besides validation nerds has complained or noticed),
another possibility would just be to downgrade that MUST to a SHOULD:

Unless the incoming request included a RESPONSEFORMAT parameter
requesting a different format, the content-type header of the
response SHOULD be ``application/x-votable+xml'' with the ``content''
parameter set to ``datalink'', with the canonical form given in
\ref{sec:mime} strongly recommended.

So: use the datalink content-type unless you've got a good reason not to
(as does Markus, and other service providers that use the same tricks
to render links tables in browsers, at least as long as browsers
won't apply XSLT to content-types marked with +xml).

This would technically be a breaking change from DL 1.0 to 1.1
(as would Markus's other proposed changes). But given the likely impact
in practice of the change (none?) I think we could turn a blind eye.

Mark

@Bonnarel
Contributor Author

Bonnarel commented Jan 9, 2023

Hmm,
Thinking about it, I tend to agree with Markus's option one, because in that case it depends on the VO tool developers.
Anyway, I have created a new GitHub issue with this discussion.

François

@pdowler
Collaborator

pdowler commented Jan 9, 2023 via email

@msdemlei
Collaborator

msdemlei commented Jan 10, 2023 via email

@mbtaylor
Member

In practice, as author of a client that does have to figure out when something is a links table and when it's a catalogue or whatever, I generally do that by duck typing - if it's a VOTable with most or all of the columns required by DataLink sec 3.2 then I can treat it as a links document. I'm likely to carry on doing that in preference to looking at the content-type for practical reasons - content-types are not always present and correct, they can be fiddly to obtain and parse, and you might be acquiring one of these tables in some way other than HTTP.

When I would like some signal about whether something is or is not a links table is before I've acquired it, to give the user a hint about whether they will want to download it. In principle the HTTP content-type in conjunction with a HEAD request could help there, but really I want that information without needing any HTTP interaction.

So from my point of view some relaxation of constraints on the HTTP content-type header (like Pat's: any valid content-type is OK, or mine: use SHOULD not MUST) is not likely to present practical problems. Other resource consumers may have different views of course.
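
The duck-typing approach described in this comment can be sketched roughly as follows. This is not the actual client code: the column list is an assumption based on the DataLink 1.0 links table, and both the function name and the `min_matches` threshold ("most or all") are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed column names of a DataLink links table (DataLink 1.0, sec 3.2).
LINKS_COLUMNS = {
    "ID", "access_url", "service_def", "error_message",
    "description", "semantics", "content_type", "content_length",
}


def looks_like_links_table(votable_xml: str, min_matches: int = 6) -> bool:
    """Guess whether a VOTable document is a links table from its FIELD names."""
    root = ET.fromstring(votable_xml)
    field_names = {
        element.get("name")
        for element in root.iter()
        # Strip any XML namespace so all VOTable versions are handled alike.
        if element.tag.rsplit("}", 1)[-1] == "FIELD"
    }
    return len(LINKS_COLUMNS & field_names) >= min_matches
```

Because it only inspects the document itself, a check like this works no matter how the table was obtained or which content-type it was served with.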

@pdowler
Collaborator

pdowler commented Jan 10, 2023 via email

@msdemlei
Collaborator

msdemlei commented Jan 11, 2023 via email

@Bonnarel
Contributor Author

Bonnarel commented Jan 11, 2023 via email

@jd-au
Member

jd-au commented Jan 17, 2023

The CASDA implementation is also non-compliant with Datalink 1.0 as it always has a content-type of text/xml to support XSLT rendering. Adding a standardID INFO would be fine for us.

However, §3.3 of Datalink 1.1 still has a MUST for the content-type header being application/x-votable+xml which precludes rendering with XSLT. Markus' option 1a seems like the best compromise here.

@pdowler
Collaborator

pdowler commented Jan 31, 2023

I am preparing some editorial changes that include MUST -> SHOULD and allow any valid VOTable MIME type.

I am open to making the standardID INFO mandatory in 1.1 so that clients have a clear way to detect that a links table is in there... that's somewhat better than the HTTP header anyway, since it gets saved in the XML file for later use. I will make that change as well and see how it looks at review.
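
As a rough idea of what such detection could look like on the client side, here is a sketch. The identifier prefix `ivo://ivoa.net/std/datalink#links` is an assumption about the standardID value (the exact value is whatever the specification ends up defining), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed prefix of the links standardID; compared case-insensitively.
ASSUMED_STANDARD_ID_PREFIX = "ivo://ivoa.net/std/datalink#links"


def has_links_standard_id(votable_xml: str) -> bool:
    """Return True if the VOTable carries a standardID INFO for a links table."""
    root = ET.fromstring(votable_xml)
    for element in root.iter():
        is_info = element.tag.rsplit("}", 1)[-1] == "INFO"
        if is_info and element.get("name") == "standardID":
            value = (element.get("value") or "").lower()
            if value.startswith(ASSUMED_STANDARD_ID_PREFIX):
                return True
    return False
```

Unlike the HTTP header, this marker survives when the document is saved to disk and reopened later.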

@pdowler pdowler closed this as completed in e91898c Feb 7, 2023
pdowler added a commit that referenced this issue Feb 7, 2023
resolve #90 (ObsCore example) and resolve #91 (relax content-type)