Following the discussion in #62, we concluded that it would make sense to have a generic way to validate the content of ALTO files in more detail. Several ideas were discussed at board meetings:
1. Use XSD 1.1 and assertions to implement consistency checks, for example that a TextBlock box is fully contained in the page box (no negative coordinates and no coordinates larger than the page width/height). There are two main concerns with this approach. First, there are fewer open source validation tools for XSD 1.1 than for 1.0. Second, if we add these checks to the XSD validation, the level of restriction would be too high and the checks would become mandatory, creating a lot of trouble for both ALTO creators and consumers.
2. Use a separate Schematron schema (https://en.wikipedia.org/wiki/Schematron) as an add-on to the default XSD validation. Users who want more restrictive checks (closer to quality checks than to structural checks) can optionally add this schema to their validation pipeline for ALTO files.
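To make the bounding-box consistency check mentioned above concrete, here is a minimal sketch in plain Python. It is neither Schematron nor XSD 1.1, only an illustration of the rule itself. The function name `blocks_outside_page` is hypothetical, and the sketch assumes that `Page` carries `WIDTH` and `HEIGHT`, that `TextBlock` carries `HPOS`, `VPOS`, `WIDTH` and `HEIGHT`, and that all values are numeric and in the same unit. Blocks with missing values are skipped rather than reported.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip the XML namespace so the check works for any ALTO version."""
    return tag.rsplit("}", 1)[-1]


def blocks_outside_page(alto_xml: str) -> list:
    """Return the IDs of TextBlock elements not fully inside their Page box."""
    root = ET.fromstring(alto_xml)
    offenders = []
    for page in root.iter():
        if local(page.tag) != "Page":
            continue
        if page.get("WIDTH") is None or page.get("HEIGHT") is None:
            continue  # page size unknown, nothing to compare against
        page_w = float(page.get("WIDTH"))
        page_h = float(page.get("HEIGHT"))
        for block in page.iter():
            if local(block.tag) != "TextBlock":
                continue
            values = [block.get(a) for a in ("HPOS", "VPOS", "WIDTH", "HEIGHT")]
            if any(v is None for v in values):
                continue  # incomplete geometry is a separate check
            hpos, vpos, width, height = (float(v) for v in values)
            inside = (
                hpos >= 0
                and vpos >= 0
                and hpos + width <= page_w
                and vpos + height <= page_h
            )
            if not inside:
                offenders.append(block.get("ID", "?"))
    return offenders
```

Keeping a rule like this outside the XSD means a failed check can be reported without making the file schema-invalid.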
Based on the board discussions, we should continue with option 2.
In this issue we would like to collect as many ideas as possible for Schematron validation in order to build a list of checks to be implemented. For each proposed test, please also propose a severity level. For the moment I would propose ERROR, WARNING and INFO as possible levels, just as a starting point.
Currently the following tests/categories of tests have been proposed:
- Overlap checks: even though zero overlap is not mandatory in ALTO, overlapping elements might indicate issues
- Parent elements without children (for example, a TextLine without any String inside)
- Any string encoding issues
- Meaningful usage of optional information: for example, even though VPOS and HPOS are optional in the schema, it might be a good idea to flag when any of them are missing, either as errors or at least as warnings
- Language-specific checks (for example, in Chinese each glyph should usually be encoded as its own word, and two Chinese glyphs in the same word are considered incorrect by some ALTO processors)
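Two of the ideas above (parent elements without children, and missing optional positions) can be prototyped in a few lines of Python to get a feel for how findings with a severity level might be reported. This is only a sketch: the function name `check_alto`, the tuple layout of a finding, and the WARNING severity assigned to both checks are assumptions for illustration, not agreed decisions.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip the XML namespace so the check works for any ALTO version."""
    return tag.rsplit("}", 1)[-1]


def check_alto(alto_xml: str) -> list:
    """Return findings as (severity, element ID, message) tuples."""
    findings = []
    root = ET.fromstring(alto_xml)
    for el in root.iter():
        name = local(el.tag)
        ident = el.get("ID", "?")

        # Parent without children: a TextLine with no String inside.
        if name == "TextLine":
            has_string = any(local(child.tag) == "String" for child in el)
            if not has_string:
                # Severity is an assumption made for this sketch.
                findings.append(("WARNING", ident, "TextLine has no String child"))

        # Optional information: HPOS/VPOS are optional in the schema.
        if name in ("TextBlock", "TextLine", "String"):
            missing = [a for a in ("HPOS", "VPOS") if el.get(a) is None]
            if missing:
                # Severity is an assumption made for this sketch.
                findings.append(("WARNING", ident, "missing " + ", ".join(missing)))
    return findings
```

A Schematron rule would express the same conditions as XPath assertions; the prototype is mainly useful for trying out severity levels on real files before fixing them in a schema.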
Please add your own ideas and detail the test categories listed above, so that we end up with a list of tests to be implemented and their severity levels. The Schematron schema would be optional, but should serve as a kind of guideline of good practices for creating ALTO files.