Use cases:
These String-related attributes can be used to record human decisions and actions taken during the OCR text correction process:
• ILLS (boolean, optional): specifies that a word is illegible in the source document (and consequently cannot be corrected). This status can be used:
- during the production workflow: the quality control process needs to know whether a specific word falls inside the guaranteed text-quality perimeter; the status also shows that the provider performed a manual task on the word.
- by viewing software: end users should be informed that some words are illegible in the source document itself (this is not an OCR error).
• DBTS (boolean, optional): specifies that a word has been corrected but a doubt remains. The use cases are the same as for ILLS.
• These two attributes belong to the "production family" of attributes, together with CS (Correction Status), which the schema already defines.
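As an illustration, a corrected ALTO line could carry the proposed attributes on individual String elements as in the following hypothetical sketch. The IDs, coordinates, and CONTENT values are invented, and it assumes (as an illustration only) that an illegible word keeps its raw OCR output in CONTENT.

```xml
<!-- Hypothetical ALTO fragment illustrating the proposed ILLS and DBTS attributes.
     IDs, coordinates and CONTENT values are invented for illustration. -->
<TextLine ID="TL_1" HPOS="100" VPOS="200" WIDTH="900" HEIGHT="40">
  <!-- Manually corrected word with no remaining doubt: existing CS attribute only -->
  <String ID="S_1" CONTENT="harbour" HPOS="100" VPOS="200" WIDTH="160" HEIGHT="40" CS="true"/>
  <!-- Manually corrected word, but the operator is not sure of the reading -->
  <String ID="S_2" CONTENT="Quimper" HPOS="280" VPOS="200" WIDTH="190" HEIGHT="40" CS="true" DBTS="true"/>
  <!-- Word illegible in the source document itself, so it cannot be corrected.
       Assumption: the uncorrected OCR output is left in CONTENT. -->
  <String ID="S_3" CONTENT="l3n" HPOS="490" VPOS="200" WIDTH="90" HEIGHT="40" ILLS="true"/>
</TextLine>
```

Both attributes appear only when their value is true; a word outside these statuses simply omits them, in line with the schema documentation, which advises against writing ILLS="false" or DBTS="false".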
Remarks: ILLS could also be useful on the TextBlock and TextLine types, for:
- areas of the page with physical defects: stains, blur, etc.
- areas of the page with scanning defects: curvature near the binding, missing blocks near the margins, etc.
These attributes must be defined with a recommendation: always set the attribute at the highest possible level (i.e., do not set it on every sub-element).
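Applied to the TextLine remark above, the highest-level recommendation might look like the following hypothetical fragment: a line crossing a stain carries ILLS once, and its String children do not repeat it. This assumes ILLS is extended to TextLine as suggested in the remarks, and that consumers treat the line-level value as covering every word in that line; neither point is fixed by the String-level declarations proposed here.

```xml
<!-- Hypothetical fragment: ILLS set once on a TextLine crossing a stain.
     Assumes ILLS is also allowed on TextLine, as suggested in the remarks. -->
<TextBlock ID="TB_3" HPOS="100" VPOS="600" WIDTH="900" HEIGHT="90">
  <!-- Recommended: a single attribute on the highest affected element -->
  <TextLine ID="TL_7" HPOS="100" VPOS="600" WIDTH="900" HEIGHT="40" ILLS="true">
    <!-- Not recommended: repeating ILLS="true" on each of these Strings -->
    <String ID="S_40" CONTENT="rn0" HPOS="100" VPOS="600" WIDTH="80" HEIGHT="40"/>
    <String ID="S_41" CONTENT="ii1e" HPOS="200" VPOS="600" WIDTH="120" HEIGHT="40"/>
  </TextLine>
  <!-- Unaffected line: no ILLS attribute at all -->
  <TextLine ID="TL_8" HPOS="100" VPOS="650" WIDTH="900" HEIGHT="40">
    <String ID="S_42" CONTENT="quay" HPOS="100" VPOS="650" WIDTH="90" HEIGHT="40"/>
  </TextLine>
</TextBlock>
```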
Schema change:
<xsd:attribute name="ILLS" type="xsd:boolean" use="optional">
  <xsd:annotation>
    <xsd:documentation>The word is illegible in the source document and cannot be manually corrected. If the content owner considers the word legible, the attribute must be dropped (ILLS="false" is not recommended).</xsd:documentation>
  </xsd:annotation>
</xsd:attribute>
<xsd:attribute name="DBTS" type="xsd:boolean" use="optional">
  <xsd:annotation>
    <xsd:documentation>The word has been manually corrected but a doubt remains. If the content owner considers the doubt not legitimate, the attribute must be dropped (DBTS="false" is not recommended).</xsd:documentation>
  </xsd:annotation>
</xsd:attribute>
jpmoreux changed the title from "Production family" attributes: CS, ILLS, DBTS to "OCR correction" attributes: CS, ILLS, DBTS on Jun 17, 2014
jpmoreux changed the title from "OCR correction" attributes: CS, ILLS, DBTS to OCR correction attributes: CS, ILLS, DBTS on Jun 18, 2014