Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify mediaType and artifactType syntax #1182

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

sudo-bmitch
Copy link
Contributor

There was some confusion on whether parameters could be included in the mediaType or artifactType fields. This makes the following changes:

  • Consolidates the definition to a single section of the media-type.md file.
  • States that parameters are not permitted.
  • Includes a regular expression that can be used to validate a string.
  • Fixes a regexp bug in our validator that included the range &-^, which inadvertently included a long list of special characters that are not valid in a media type.
  • Adds tests for media types with parameters.

Comment on lines +24 to +30
Media types values MUST have a top-level type and subtype name, separated by a `/`, without any parameters.

The following regular expression may be used to validate media types:

```regexp
^[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]{0,126}/[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]{0,126}$
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving this to a common section seems good (instead of repeating it over and over again), but I'm not convinced we have a good reason to limit further than what the RFC specifies? To put that another way, what do we gain from adding extra restrictions on top of the RFC itself?

We've previously discussed (and don't explicitly say anything about) case sensitivity too, as another example: https://oci.dag.dev/?image=tianon/test:sPoNgEbOb

I don't think this is great, but the spec has historically pointed to the RFC as-is. I think it's totally fair for us to say our media types/spec-specified objects shouldn't have parameters, but something about telling other artifact authors they also cannot do so feels off.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were pointing directly to section 4.2, so there's a case to be made by some implementations that we had already excluded parameters that were defined in 4.3 (ECR rejects these manifests today). And schema/defs-descriptor.json almost defined this as a restriction if there wasn't a bug in the regexp.

A big fear for allowing this for fields like artifactType includes the referrers query with an artifactType filter. Should that allow parameters in the query, or should all responses with any parameter be returned if no parameter is included? That part of the spec gets a bit undefined without this limit.

@tianon
Copy link
Member

tianon commented Jun 21, 2024

I was talking to my coworker @LaurentGoderre about this today, and he pointed me at https://developer.mozilla.org/en-US/docs/Web/Media/Formats/codecs_parameter and https://github.com/jsdom/whatwg-mimetype which have some of that "why is this a thing?" flavor I was suggesting we need to look for in one of the weekly OCI calls a while back.

I don't think it's a perfect slam dunk "yes, we need to support these" but also sheds even more doubt (IMO) on whether we can reasonably apply a blanket "thou shalt not" on them either, so I'm personally leaning towards a NACK here.

@sudo-bmitch
Copy link
Contributor Author

sudo-bmitch commented Jun 21, 2024

In the scenario of an artifact type set to text/html;charset=utf-8", either in the manifest, or on the query string for the referrers API, should that be allowed, should it be an exact match, or should the server ignore the parameter when performing a match for the referrers listing?

Thinking of the risks of changing our mind in the future, allowing it today and denying it in the future would mean existing content on registries is no longer valid. But the reverse, denying it today and later deciding to allow it, would mean older registries and clients would reject new content. Neither is great, but the latter is at least feasible to me (I'd reject a change that invalidates existing valid data without a good reason).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants