Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emit a warning when a standard format is redefined in custom_format #83

Closed
Stranger6667 opened this issue May 31, 2021 · 5 comments
Closed

Comments

@Stranger6667
Copy link
Contributor

Following up our today's conversation. The idea is to emit a warning instead of an error here and allow the user to redefine standard formats

@Zac-HD
Copy link
Member

Zac-HD commented Jun 2, 2021

Fixed in 23c3cea and released as version 0.20 😁

@Zac-HD Zac-HD closed this as completed Jun 2, 2021
@Stranger6667
Copy link
Contributor Author

Thank you very much!

@Stranger6667
Copy link
Contributor Author

I am wondering whether it will be ok to relax the if not isinstance(string, str) check inside _get_format_filter? In Schemathesis I use the format keyword to generate data for the binary Open API format, which is kind of out-of JSON Schema scope as it implies bytes, but it looks like the simplest way to implement it. What do you think?

@Zac-HD
Copy link
Member

Zac-HD commented Jun 3, 2021

Hmm. If we follow the links from the OpenAPI format - string | binary | any sequence of octets - back to the JSONSchema spec, it says string A string of Unicode code points, from the JSON "string" production; in turn this links back to JSON (RFC 8259) and JSON text exchanged between systems ... MUST be encoded using UTF-8.

Unfortunately this means that the OpenAPI spec - a string (unicode codepoints, in UTF-8 encoding) which is any sequence of octets (octets = an eight-bit binary byte) - is self-contradictory. I think the best response available is to:

  1. for this purpose, treat "octet" as specifying "a Unicode codepoint in the range [0 .. 255]". In the latin-1 encoding this is equivalent to a true octet, and it's the closest we can get in UTF-8 too.
  2. Report this to the OpenAPI Technical Steering Committee

Alternatively, if you really need non-str instances you could monkeypatch _get_format_filter. I wouldn't want that in hypothesis-jsonschema since it's really not a jsonschema thing, but the option is there if you need it. (and with exec(getsource(f).replace(...)) it's not even that eval evil)

@Stranger6667
Copy link
Contributor Author

Hmm. If we follow the links from the OpenAPI format - string | binary | any sequence of octets - back to the JSONSchema spec, it says string A string of Unicode code points, from the JSON "string" production; in turn this links back to JSON (RFC 8259) and JSON text exchanged between systems ... MUST be encoded using UTF-8.

It is mostly used to describe binary payloads like images or other non-text file formats, i.e. in non-JSON context. It was somehow addressed in Open API 3.1, where most formats are taken from JSON Schema Draft 2020-12, and there is no binary format anymore. Binary data MAY omit the schema (they have {} in examples to denote binary data). So, probably it doesn't require upstream reporting if I am not missing something here.

for this purpose, treat "octet" as specifying "a Unicode codepoint in the range [0 .. 255]". In the latin-1 encoding this is equivalent to a true octet, and it's the closest we can get in UTF-8 too.

Yeah, it seems like the way to go in the long run. Having the binary format as it is currently implemented in Schemathesis will lead to JSON encoding errors in more complex schemas than {"type": "string", "format": "binary"} (e.g., nested in some object property).

Alternatively, if you really need non-str instances you could monkeypatch _get_format_filter. I wouldn't want that in hypothesis-jsonschema since it's really not a jsonschema thing, but the option is there if you need it. (and with exec(getsource(f).replace(...)) it's not even that eval evil)

Agree. I think it could be an intermediate solution for Schemathesis to keep the status quo until it is resolved more appropriately :)

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants