Emit a warning when a standard format is redefined in `custom_format` #83

Stranger6667 · 2021-05-31T18:21:25Z

Following up our today's conversation. The idea is to emit a warning instead of an error here and allow the user to redefine standard formats

Zac-HD · 2021-06-02T12:49:26Z

Fixed in 23c3cea and released as version 0.20 😁

Stranger6667 · 2021-06-02T13:04:25Z

Thank you very much!

Stranger6667 · 2021-06-03T11:42:49Z

I am wondering whether it will be ok to relax the if not isinstance(string, str) check inside _get_format_filter? In Schemathesis I use the format keyword to generate data for the binary Open API format, which is kind of out-of JSON Schema scope as it implies bytes, but it looks like the simplest way to implement it. What do you think?

Zac-HD · 2021-06-03T12:10:08Z

Hmm. If we follow the links from the OpenAPI format - string | binary | any sequence of octets - back to the JSONSchema spec, it says string A string of Unicode code points, from the JSON "string" production; in turn this links back to JSON (RFC 8259) and JSON text exchanged between systems ... MUST be encoded using UTF-8.

Unfortunately this means that the OpenAPI spec - a string (unicode codepoints, in UTF-8 encoding) which is any sequence of octets (octets = an eight-bit binary byte) - is self-contradictory. I think the best response available is to:

for this purpose, treat "octet" as specifying "a Unicode codepoint in the range [0 .. 255]". In the latin-1 encoding this is equivalent to a true octet, and it's the closest we can get in UTF-8 too.
Report this to the OpenAPI Technical Steering Committee

Alternatively, if you really need non-str instances you could monkeypatch _get_format_filter. I wouldn't want that in hypothesis-jsonschema since it's really not a jsonschema thing, but the option is there if you need it. (and with exec(getsource(f).replace(...)) it's not even that ~~eval~~ evil)

Stranger6667 · 2021-06-03T12:38:09Z

Hmm. If we follow the links from the OpenAPI format - string | binary | any sequence of octets - back to the JSONSchema spec, it says string A string of Unicode code points, from the JSON "string" production; in turn this links back to JSON (RFC 8259) and JSON text exchanged between systems ... MUST be encoded using UTF-8.

It is mostly used to describe binary payloads like images or other non-text file formats, i.e. in non-JSON context. It was somehow addressed in Open API 3.1, where most formats are taken from JSON Schema Draft 2020-12, and there is no binary format anymore. Binary data MAY omit the schema (they have {} in examples to denote binary data). So, probably it doesn't require upstream reporting if I am not missing something here.

for this purpose, treat "octet" as specifying "a Unicode codepoint in the range [0 .. 255]". In the latin-1 encoding this is equivalent to a true octet, and it's the closest we can get in UTF-8 too.

Yeah, it seems like the way to go in the long run. Having the binary format as it is currently implemented in Schemathesis will lead to JSON encoding errors in more complex schemas than {"type": "string", "format": "binary"} (e.g., nested in some object property).

Alternatively, if you really need non-str instances you could monkeypatch _get_format_filter. I wouldn't want that in hypothesis-jsonschema since it's really not a jsonschema thing, but the option is there if you need it. (and with exec(getsource(f).replace(...)) it's not even that eval evil)

Agree. I think it could be an intermediate solution for Schemathesis to keep the status quo until it is resolved more appropriately :)

Thank you!

Zac-HD closed this as completed Jun 2, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Emit a warning when a standard format is redefined in `custom_format` #83

Emit a warning when a standard format is redefined in `custom_format` #83

Stranger6667 commented May 31, 2021

Zac-HD commented Jun 2, 2021

Stranger6667 commented Jun 2, 2021

Stranger6667 commented Jun 3, 2021

Zac-HD commented Jun 3, 2021

Stranger6667 commented Jun 3, 2021

Emit a warning when a standard format is redefined in custom_format #83

Emit a warning when a standard format is redefined in custom_format #83

Comments

Stranger6667 commented May 31, 2021

Zac-HD commented Jun 2, 2021

Stranger6667 commented Jun 2, 2021

Stranger6667 commented Jun 3, 2021

Zac-HD commented Jun 3, 2021

Stranger6667 commented Jun 3, 2021

Emit a warning when a standard format is redefined in `custom_format` #83

Emit a warning when a standard format is redefined in `custom_format` #83