JSON Schema ought to work with more than one implementation of JSON Schema #147

Closed
mvahowe opened this issue Feb 27, 2020 · 15 comments
Labels
Defining Feature or Fix We have a rough idea that needs refining before implementation Talk About This! Consider putting this issue on the agenda for an SB meeting

Comments

@mvahowe
Contributor

mvahowe commented Feb 27, 2020

Right now, there are two validation scripts in SB, for JS and Python respectively. The schema passes according to the JS one. The huge and utterly uninformative error from the Python implementation is pasted at the end of this post.

This kind of issue was entirely predictable (and indeed was predicted before we moved to JSON), but this doesn't help to define a way forward.

One thing we should do is find out just how variable JSON Schema behaviour is across languages we care about. That would include

I strongly suspect that I've hit an edge case in JSON Schema (I have an unerring ability to do this with languages in general). If we can find the edge case we can probably steer away from it and restore similar behaviour across languages.

Also, if we can define the edge case(s) well we can submit a bug report or even a fix to the Python JSON Schema project.
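A rough sketch of how that comparison might be automated, assuming each implementation can be wrapped in a callable that returns True for "valid". The names `find_disagreements`, `strict` and `lenient` are made up for illustration; real wrappers would call the JS and Python validators.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Validator = Callable[[dict], bool]  # True means "document is valid"


def find_disagreements(
    validators: Dict[str, Validator],
    documents: Iterable[Tuple[str, dict]],
) -> List[Tuple[str, Dict[str, bool]]]:
    """Return (document name, verdicts) for each document the validators disagree on."""
    disagreements = []
    for name, document in documents:
        verdicts = {label: bool(check(document)) for label, check in validators.items()}
        if len(set(verdicts.values())) > 1:
            disagreements.append((name, verdicts))
    return disagreements


# Toy stand-ins for real implementations (hypothetical):
# one demands a "meta" key, the other accepts anything.
def strict(doc: dict) -> bool:
    return "meta" in doc


def lenient(doc: dict) -> bool:
    return True


report = find_disagreements(
    {"strict": strict, "lenient": lenient},
    [("with_meta.json", {"meta": {}}), ("without_meta.json", {})],
)
print(report)
```

Every document that shows up in the report is a candidate edge case worth narrowing down.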

Another option is to say (preferably with a straight face) that we recommend using the JS implementation everywhere, if necessary via a shell script invocation.

If someone with a big heart and no foresight about hosting costs were to offer SB validation as an API, that would help too.

mark@jsexp:~/scripture-burrito/code$ ./validate.py ../docs/examples/artifacts/textTranslation.json
../docs/examples/artifacts/textTranslation.json: {'meta': {'version': '0.2.0', 'variant': 'default', 'dateCreated': '2019-02-19T01:02:03+01:00', 'generator': {'softwareName': 'Burrito Factory', 'softwareVersion': '0.1', 'userName': 'Jane Doe'}, 'uploader': {'softwareName': 'Burrito Truck', 'softwareVersion': '0.1', 'userId': 'dbl::5678', 'userName': 'Josh Buck'}, 'defaultLanguage': 'en', 'comments': ['Experimenting with i18n', 'Fixed canon before upload. ~Josh']}, 'idServers': {'dbl': {'id': 'https://thedigitalbiblelibrary.org', 'name': {'en': 'The Digital Bible Library'}}, 'agmt': {'id': 'http://registry.autographamt.com', 'name': {'en': 'Autographa'}}, 'x-atl': {'id': 'http://atlantisbibleconsortium.net'}}, 'identification': {'systemId': {'dbl': {'id': '0123456789abcdef', 'revision': '23'}, 'gbc': {'id': '55df02965117ad3f2201309b'}, 'paratext': {'id': '2d5220a02a7aaac6bcc2831ae262e9aaca5e1abd'}}, 'idServer': 'dbl', 'name': {'en': 'Scripture Burrito Demo Text Bible', 'fr': 'Crêpe mexicaine biblique surdimensionnée (démonstration)'}, 'description': {'en': 'A Demonstration Scripture Burrito containing Text, like Paratext Might One Day Produce'}, 'abbreviation': {'en': 'DSB', 'fr': 'CMBS'}}, 'confidentiality': {'metadata': 'unrestricted', 'source': 'private', 'publications': 'restricted'}, 'type': {'flavorType': {'name': 'scripture', 'currentScope': {'GEN': [], 'EXO': ['1', '3-12', '13:4', '14:3-8', '15:8-16:2'], 'LEV': ['2-3'], 'MAT': ['1', '5', '7-11']}, 'canonType': ['ot', 'nt'], 'canonSpec': {'ot': {'name': 'western'}, 'nt': {'name': 'x-matthewOnlyMillenialists', 'books': ['MAT']}}, 'flavor': {'name': 'textTranslation', 'projectType': 'standard', 'audience': 'common', 'translationType': 'newTranslation', 'usfmVersion': '3.1.rc49'}}}, 'relationships': [{'relationType': 'expression', 'flavor': 'scripturePrint', 'id': 'dbl::fedcba9876543210:2'}, {'relationType': 'expression', 'flavor': 'glossedTextStory', 'id': 'x-atl::gl47'}, {'relationType': 'parascriptural', 
'flavor': 'parascripturalWordAlignment', 'id': 'agmt::irvmal-4-wh'}], 'languages': [{'tag': 'en', 'name': {'en': 'English', 'de': 'Englisch', 'fr': 'anglais'}, 'numberingSystem': 'latn'}], 'countries': [{'code': 'NL', 'name': {'nl': 'Nederland', 'kl': 'Pukkitsormiut', 'la': 'Batavia', 'ru': 'Нидерланды'}}], 'agencies': [{'id': 'dbl::23', 'name': {'en': 'Burritos R Us Inc'}, 'abbr': {'en': 'BRU'}, 'url': 'https://burritos-r-us.org', 'roles': ['rightsHolder', 'content', 'finance', 'management', 'publication', 'qa']}, {'id': 'dbl::29', 'name': {'en': 'We Manage Burritos'}, 'roles': ['qa']}], 'copyright': {'rightsHolderAgencies': [0, 1], 'rightsAdminAgency': 1, 'licenses': [{'url': 'https://burritos-r-us.org/licenses/3247'}], 'shortStatementPlain': {'fr': '© Burritos R Us 2019.'}, 'fullStatementPlain': {'fr': '© Burritos R Us 2019. Tous droits réservés.'}, 'fullStatementRich': {'fr': '<p><b>© Burritos R Us 2019.</b></p><p><i>Tous droits réservés.</i></p>'}}, 'ingredients': {'source/usfm/OTINT.sfm': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'}, 'mimeType': 'text/x-sfm', 'role': 'intot', 'size': 1234, 'isSource': True}, 'source/usfm/GEN.sfm': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'}, 'mimeType': 'text/x-sfm', 'scope': {'GEN': []}, 'size': 1234, 'isSource': True}, 'source/usfm/EXO.sfm': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'}, 'mimeType': 'text/x-sfm', 'scope': {'EXO': ['1-12']}, 'size': 1234, 'isSource': True}, 'source/usfm/LEV.sfm': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'}, 'mimeType': 'text/x-sfm', 'scope': {'LEV': ['2:3-3:7']}, 'size': 1234, 'isSource': True}, 'source/usfm/INTNT.sfm': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'}, 'mimeType': 'text/x-sfm', 'role': 'intnt', 'size': 1234, 'isSource': True}, 'source/usfm/INTMAT.sfm': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'}, 'mimeType': 'text/x-sfm', 'role': 'intMAT', 'size': 1234, 'isSource': True}, 'source/usfm/MAT.sfm': 
{'checksum': {'md5': '0123456789abcdef0123456789abcdef'}, 'mimeType': 'text/x-sfm', 'scope': {'MAT': ['1:3', '1:5', '1:7-11']}, 'size': 1234, 'isSource': True}, 'release/text/USX_1/OTINT.usx': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'}, 'mimeType': 'text/x-usx+xml', 'role': 'intot', 'size': 1234}, 'release/text/USX_1/GEN.usx': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'}, 'mimeType': 'text/x-usx+xml', 'scope': {'GEN': []}, 'size': 1234}, 'release/text/USX_1/EXO.usx': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'}, 'mimeType': 'text/x-usx+xml', 'scope': {'EXO': ['1-12']}, 'size': 1234}, 'release/text/USX_1/LEV.usx': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'}, 'mimeType': 'text/x-usx+xml', 'scope': {'LEV': ['2:3-3:7']}, 'size': 1234}, 'release/text/USX_1/INTNT.usx': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'}, 'mimeType': 'text/x-usx+xml', 'role': 'intnt', 'size': 1234}, 'release/text/USX_1/INTMAT.usx': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'}, 'mimeType': 'text/x-usx+xml', 'role': 'intMAT', 'size': 1234}, 'release/text/USX_1/MAT.usx': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'}, 'mimeType': 'text/x-usx+xml', 'scope': {'MAT': ['1:3', '1:5', '1:7-11']}, 'size': 1234}, 'unknownAdditive.foo': {'mimeType': 'application/octet-stream', 'size': 99}}, 'names': {'book-gen': {'abbr': {'fr': 'Gn'}, 'short': {'fr': 'Genèse'}, 'long': {'fr': 'La Genèse'}}, 'book-exo': {'abbr': {'fr': 'Ex'}, 'short': {'fr': 'Exode'}, 'long': {'fr': 'L’Exode'}}, 'book-lev': {'abbr': {'fr': 'Lv'}, 'short': {'fr': 'Lévitique'}, 'long': {'fr': 'Le Lévitique'}}, 'book-mat': {'abbr': {'fr': 'Mt'}, 'short': {'en': 'Matthew', 'fr': 'Matthieu'}, 'long': {'fr': 'Evangile selon Matthieu'}}, 'frontmatter': {'short': {'fr': 'Avant de lire Matthieu ...'}}, 'intnt': {'short': {'fr': 'A propos du Nouveau Testament'}}, 'intmat': {'short': {'fr': 'A propos de Matthieu'}}}, 'progress': {'dateStarted': '2017-11-30', 
'dateCompleted': '2017-12-01'}} is not valid under any of the given schemas

Failed validating 'oneOf' in schema:
    {'$id': 'https://burrito.bible/schema/metadata.schema.json',
     '$schema': 'http://json-schema.org/draft-07/schema',
     'description': 'Scripture Burrito root metadata object.',
     'oneOf': [{'$ref': 'default_metadata.schema.json'},
               {'$ref': 'derived_metadata.schema.json'}],
     'title': 'Scripture Burrito Metadata',
     'type': 'object'}

On instance:
    {'agencies': [{'abbr': {'en': 'BRU'},
                   'id': 'dbl::23',
                   'name': {'en': 'Burritos R Us Inc'},
                   'roles': ['rightsHolder',
                             'content',
                             'finance',
                             'management',
                             'publication',
                             'qa'],
                   'url': 'https://burritos-r-us.org'},
                  {'id': 'dbl::29',
                   'name': {'en': 'We Manage Burritos'},
                   'roles': ['qa']}],
     'confidentiality': {'metadata': 'unrestricted',
                         'publications': 'restricted',
                         'source': 'private'},
     'copyright': {'fullStatementPlain': {'fr': '© Burritos R Us 2019. '
                                                'Tous droits réservés.'},
                   'fullStatementRich': {'fr': '<p><b>© Burritos R Us '
                                               '2019.</b></p><p><i>Tous '
                                               'droits réservés.</i></p>'},
                   'licenses': [{'url': 'https://burritos-r-us.org/licenses/3247'}],
                   'rightsAdminAgency': 1,
                   'rightsHolderAgencies': [0, 1],
                   'shortStatementPlain': {'fr': '© Burritos R Us 2019.'}},
     'countries': [{'code': 'NL',
                    'name': {'kl': 'Pukkitsormiut',
                             'la': 'Batavia',
                             'nl': 'Nederland',
                             'ru': 'Нидерланды'}}],
     'idServers': {'agmt': {'id': 'http://registry.autographamt.com',
                            'name': {'en': 'Autographa'}},
                   'dbl': {'id': 'https://thedigitalbiblelibrary.org',
                           'name': {'en': 'The Digital Bible Library'}},
                   'x-atl': {'id': 'http://atlantisbibleconsortium.net'}},
     'identification': {'abbreviation': {'en': 'DSB', 'fr': 'CMBS'},
                        'description': {'en': 'A Demonstration Scripture '
                                              'Burrito containing Text, '
                                              'like Paratext Might One Day '
                                              'Produce'},
                        'idServer': 'dbl',
                        'name': {'en': 'Scripture Burrito Demo Text Bible',
                                 'fr': 'Crêpe mexicaine biblique '
                                       'surdimensionnée (démonstration)'},
                        'systemId': {'dbl': {'id': '0123456789abcdef',
                                             'revision': '23'},
                                     'gbc': {'id': '55df02965117ad3f2201309b'},
                                     'paratext': {'id': '2d5220a02a7aaac6bcc2831ae262e9aaca5e1abd'}}},
     'ingredients': {'release/text/USX_1/EXO.usx': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'},
                                                    'mimeType': 'text/x-usx+xml',
                                                    'scope': {'EXO': ['1-12']},
                                                    'size': 1234},
                     'release/text/USX_1/GEN.usx': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'},
                                                    'mimeType': 'text/x-usx+xml',
                                                    'scope': {'GEN': []},
                                                    'size': 1234},
                     'release/text/USX_1/INTMAT.usx': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'},
                                                       'mimeType': 'text/x-usx+xml',
                                                       'role': 'intMAT',
                                                       'size': 1234},
                     'release/text/USX_1/INTNT.usx': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'},
                                                      'mimeType': 'text/x-usx+xml',
                                                      'role': 'intnt',
                                                      'size': 1234},
                     'release/text/USX_1/LEV.usx': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'},
                                                    'mimeType': 'text/x-usx+xml',
                                                    'scope': {'LEV': ['2:3-3:7']},
                                                    'size': 1234},
                     'release/text/USX_1/MAT.usx': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'},
                                                    'mimeType': 'text/x-usx+xml',
                                                    'scope': {'MAT': ['1:3',
                                                                      '1:5',
                                                                      '1:7-11']},
                                                    'size': 1234},
                     'release/text/USX_1/OTINT.usx': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'},
                                                      'mimeType': 'text/x-usx+xml',
                                                      'role': 'intot',
                                                      'size': 1234},
                     'source/usfm/EXO.sfm': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'},
                                             'isSource': True,
                                             'mimeType': 'text/x-sfm',
                                             'scope': {'EXO': ['1-12']},
                                             'size': 1234},
                     'source/usfm/GEN.sfm': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'},
                                             'isSource': True,
                                             'mimeType': 'text/x-sfm',
                                             'scope': {'GEN': []},
                                             'size': 1234},
                     'source/usfm/INTMAT.sfm': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'},
                                                'isSource': True,
                                                'mimeType': 'text/x-sfm',
                                                'role': 'intMAT',
                                                'size': 1234},
                     'source/usfm/INTNT.sfm': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'},
                                               'isSource': True,
                                               'mimeType': 'text/x-sfm',
                                               'role': 'intnt',
                                               'size': 1234},
                     'source/usfm/LEV.sfm': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'},
                                             'isSource': True,
                                             'mimeType': 'text/x-sfm',
                                             'scope': {'LEV': ['2:3-3:7']},
                                             'size': 1234},
                     'source/usfm/MAT.sfm': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'},
                                             'isSource': True,
                                             'mimeType': 'text/x-sfm',
                                             'scope': {'MAT': ['1:3',
                                                               '1:5',
                                                               '1:7-11']},
                                             'size': 1234},
                     'source/usfm/OTINT.sfm': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'},
                                               'isSource': True,
                                               'mimeType': 'text/x-sfm',
                                               'role': 'intot',
                                               'size': 1234},
                     'unknownAdditive.foo': {'mimeType': 'application/octet-stream',
                                             'size': 99}},
     'languages': [{'name': {'de': 'Englisch',
                             'en': 'English',
                             'fr': 'anglais'},
                    'numberingSystem': 'latn',
                    'tag': 'en'}],
     'meta': {'comments': ['Experimenting with i18n',
                           'Fixed canon before upload. ~Josh'],
              'dateCreated': '2019-02-19T01:02:03+01:00',
              'defaultLanguage': 'en',
              'generator': {'softwareName': 'Burrito Factory',
                            'softwareVersion': '0.1',
                            'userName': 'Jane Doe'},
              'uploader': {'softwareName': 'Burrito Truck',
                           'softwareVersion': '0.1',
                           'userId': 'dbl::5678',
                           'userName': 'Josh Buck'},
              'variant': 'default',
              'version': '0.2.0'},
     'names': {'book-exo': {'abbr': {'fr': 'Ex'},
                            'long': {'fr': 'L’Exode'},
                            'short': {'fr': 'Exode'}},
               'book-gen': {'abbr': {'fr': 'Gn'},
                            'long': {'fr': 'La Genèse'},
                            'short': {'fr': 'Genèse'}},
               'book-lev': {'abbr': {'fr': 'Lv'},
                            'long': {'fr': 'Le Lévitique'},
                            'short': {'fr': 'Lévitique'}},
               'book-mat': {'abbr': {'fr': 'Mt'},
                            'long': {'fr': 'Evangile selon Matthieu'},
                            'short': {'en': 'Matthew', 'fr': 'Matthieu'}},
               'frontmatter': {'short': {'fr': 'Avant de lire Matthieu '
                                               '...'}},
               'intmat': {'short': {'fr': 'A propos de Matthieu'}},
               'intnt': {'short': {'fr': 'A propos du Nouveau Testament'}}},
     'progress': {'dateCompleted': '2017-12-01',
                  'dateStarted': '2017-11-30'},
     'relationships': [{'flavor': 'scripturePrint',
                        'id': 'dbl::fedcba9876543210:2',
                        'relationType': 'expression'},
                       {'flavor': 'glossedTextStory',
                        'id': 'x-atl::gl47',
                        'relationType': 'expression'},
                       {'flavor': 'parascripturalWordAlignment',
                        'id': 'agmt::irvmal-4-wh',
                        'relationType': 'parascriptural'}],
     'type': {'flavorType': {'canonSpec': {'nt': {'books': ['MAT'],
                                                  'name': 'x-matthewOnlyMillenialists'},
                                           'ot': {'name': 'western'}},
                             'canonType': ['ot', 'nt'],
                             'currentScope': {'EXO': ['1',
                                                      '3-12',
                                                      '13:4',
                                                      '14:3-8',
                                                      '15:8-16:2'],
                                              'GEN': [],
                                              'LEV': ['2-3'],
                                              'MAT': ['1', '5', '7-11']},
                             'flavor': {'audience': 'common',
                                        'name': 'textTranslation',
                                        'projectType': 'standard',
                                        'translationType': 'newTranslation',
                                        'usfmVersion': '3.1.rc49'},
                             'name': 'scripture'}}}
@mvahowe mvahowe added Defining Feature or Fix We have a rough idea that needs refining before implementation Talk About This! Consider putting this issue on the agenda for an SB meeting labels Feb 27, 2020
@mvahowe
Contributor Author

mvahowe commented Feb 27, 2020

One possibility is skew between Python and JS regexes.

@rdb
Collaborator

rdb commented Feb 27, 2020

You got a "oneOf" error on the root object that disambiguates between the two derived variants. This is precisely the reason why I advocated against this approach. If there is any error in either schema, what you get is a validation error on the topmost invalid condition, which makes debugging impossible.

To see what the error actually is, I'd manually validate it against either the default or the derived metadata schema directly.
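A minimal sketch of picking the schema to validate against directly, assuming the choice can be keyed on `meta.variant`. The two file names come from the `oneOf` in the error output. The variant name `derived` and the helper `pick_schema_name` are assumptions for illustration.

```python
SCHEMA_BY_VARIANT = {
    "default": "default_metadata.schema.json",
    "derived": "derived_metadata.schema.json",  # assumed variant name
}


def pick_schema_name(instance: dict) -> str:
    """Return the schema file to validate against, based on meta.variant."""
    variant = instance.get("meta", {}).get("variant")
    if not isinstance(variant, str) or variant not in SCHEMA_BY_VARIANT:
        raise ValueError(f"unknown or missing meta.variant: {variant!r}")
    return SCHEMA_BY_VARIANT[variant]


chosen = pick_schema_name({"meta": {"variant": "default"}})
print(chosen)
```

Validating against the chosen file alone means any failure is reported against a concrete schema rather than against the root `oneOf`.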

If we want to continue with this approach, I'd change the validation script to validate the document directly against either the default or the derived schema, to get a better error message.
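Another way the script might produce a better message is sketched below. It assumes jsonschema 3.0.0 or later, where a `ValidationError` for `oneOf` carries the per-branch errors in its `context` attribute. The helper `describe` is hypothetical, and stand-in objects replace real errors so the sketch runs without the library. Resolving the relative `$ref`s is left out.

```python
from types import SimpleNamespace


def describe(error, depth=0):
    """Return indented lines for an error and, recursively, its sub-errors."""
    location = "/".join(str(part) for part in error.absolute_path) or "<root>"
    lines = ["  " * depth + f"{location}: {error.message}"]
    # For oneOf/anyOf failures, `context` lists the errors from each branch.
    for sub in error.context or []:
        lines.extend(describe(sub, depth + 1))
    return lines


# Stand-in objects mimicking the attributes used above (message text is made up).
leaf = SimpleNamespace(
    absolute_path=["idServers", "agmt", "name"],
    message="'peach' does not match the languageTag pattern",
    context=[],
)
root = SimpleNamespace(
    absolute_path=[],
    message="instance is not valid under any of the given schemas",
    context=[leaf],
)
report = describe(root)
print("\n".join(report))

# With the real library, the errors would come from something like:
#   validator = jsonschema.Draft7Validator(schema)
#   for err in validator.iter_errors(instance):
#       print("\n".join(describe(err)))
```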

@mvahowe
Contributor Author

mvahowe commented Feb 27, 2020

Right, but:

  1. The JS implementation does provide helpful(ish) information on this kind of error.
  2. There are many other places where we use oneOf.
  3. I'm still not convinced that either cascading conditionals or implementation-specific procedural code to pick between schemas is going to scale to what we eventually need. I'm about to open an issue about templates, which we need to support inside the main schema.

@rdb
Collaborator

rdb commented Feb 27, 2020

Which branch are the failing schema and example file on? I'm happy to take a look. The develop branch passes, both on CI and on my own computer.

@mvahowe
Contributor Author

mvahowe commented Feb 27, 2020

For the record, I just added

"peach": "melba"

under idServers, and this is the error I get from the JS validator. It's actually better than any error trace I've seen from any free implementation of RelaxNG.

../docs/examples/artifacts/textTranslation.json: [
  {
    keyword: 'pattern',
    dataPath: ".idServers['agmt'].name",
    schemaPath: '#/definitions/languageTag/pattern',
    params: { pattern: '^[A-Za-z]{2,3}([\\-_][A-Za-z0-9]+){0,4}$' },
    message: 'should match pattern "^[A-Za-z]{2,3}([\\-_][A-Za-z0-9]+){0,4}$"',
    propertyName: 'peach'
  },
  {
    keyword: 'propertyNames',
    dataPath: ".idServers['agmt'].name",
    schemaPath: '#/propertyNames',
    params: { propertyName: 'peach' },
    message: "property name 'peach' is invalid"
  },
  {
    keyword: 'additionalProperties',
    dataPath: '',
    schemaPath: '#/additionalProperties',
    params: { additionalProperty: 'progress' },
    message: 'should NOT have additional properties'
  },
  {
    keyword: 'oneOf',
    dataPath: '',
    schemaPath: '#/oneOf',
    params: { passingSchemas: null },
    message: 'should match exactly one schema in oneOf'
  }
]
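On the regex-skew question, the languageTag pattern from the output above can be tried under Python's `re` module. This is only a sketch: the pattern is copied from the output with the JS string escaping undone, and `is_language_tag` is a made-up helper name.

```python
import re

# Pattern from the validator output, with the doubled backslash un-escaped.
LANGUAGE_TAG = re.compile(r"^[A-Za-z]{2,3}([\-_][A-Za-z0-9]+){0,4}$")


def is_language_tag(key: str) -> bool:
    """True if the key matches the languageTag pattern under Python's re."""
    return LANGUAGE_TAG.search(key) is not None


for key in ("en", "fr", "peach"):
    print(key, is_language_tag(key))
```

For these keys Python agrees with the JS validator: `peach` is rejected. One difference that does exist is that Python's `$` also matches just before a trailing newline, so `"en\n"` passes here, whereas a JS regex without the `m` flag would reject it.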

@mvahowe
Contributor Author

mvahowe commented Feb 27, 2020

@rdb This is develop. Using jsonschema directly gets me the same result. I'm using whatever comes with Ubuntu 19.10, which appears to have Python 2 as a dependency. (There's no version of jsonschema to be found anywhere, obviously.)

@rdb
Collaborator

rdb commented Feb 27, 2020

I reproduced the error using docker and the Ubuntu python3-jsonschema package. It turns out that Ubuntu ships an outdated version of jsonschema, which does not support draft-7 of the JSON Schema spec.

We require at least jsonschema 3.0.0, which can be installed using pip. We should document this.
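Besides documenting it, the validation script could check the installed version at startup. A minimal sketch follows, assuming Python 3.8+ for `importlib.metadata`. The helper names are hypothetical, and pre-release suffixes such as `a1` are simply ignored.

```python
MINIMUM = (3, 0, 0)  # jsonschema 3.0.0, per the comment above


def parse_version(text: str) -> tuple:
    """Turn '3.2.0' into (3, 2, 0); non-numeric suffixes are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_supported(version_text: str) -> bool:
    """True if the given jsonschema version is at least MINIMUM."""
    return parse_version(version_text) >= MINIMUM


def installed_jsonschema_version() -> str:
    """Look up the installed version (raises PackageNotFoundError if absent)."""
    from importlib.metadata import version
    return version("jsonschema")


print(is_supported("3.0.0"))
```

A script could call `is_supported(installed_jsonschema_version())` and exit with a message pointing at pip when it returns False.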

@mvahowe
Contributor Author

mvahowe commented Feb 27, 2020

@rdb also points out that the outdated package doesn't complain about being given an unsupported schema version, it just tries to wing it and fails.
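The script could guard against that silent fallback itself by checking the schema's declared `$schema` before validating. A sketch, under the assumption that the caller knows which drafts its validator supports. The helper `require_supported_draft` is hypothetical, and the draft-07 URI is the one in the schema dump above.

```python
DRAFT_7 = "http://json-schema.org/draft-07/schema"


def require_supported_draft(schema: dict, supported: set) -> str:
    """Return the declared draft URI, or raise if the validator cannot handle it."""
    # Draft URIs are written with or without a trailing '#'; compare without it.
    declared = schema.get("$schema", "").rstrip("#")
    if declared not in supported:
        raise RuntimeError(
            f"schema declares {declared or 'no $schema'}, "
            f"but only these drafts are supported: {sorted(supported)}"
        )
    return declared


declared = require_supported_draft({"$schema": DRAFT_7 + "#"}, {DRAFT_7})
print(declared)
```

Failing loudly here turns a confusing `oneOf` error into a clear "wrong draft" message.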

@mvahowe
Contributor Author

mvahowe commented Feb 27, 2020

@jag3773 @FoolRunning the information at https://json-schema.org/implementations.html looks useful and encouraging (i.e. there are allegedly implementations for draft 7, which is what we need).

@jag3773
Collaborator

jag3773 commented Feb 27, 2020

Follow ups in #150 and #151
