open .hocr files #730

@milahu

continue #729
part of #438

ideally i want to open and edit the hocr files like this

gimagereader-qt6 001.hocr 001.jpg
gimagereader-qt6 002.jpg 002.hocr 003.jpg 003.hocr

this already works with .html files
but .hocr files are ignored

gimagereader-qt6 001.hocr.html
gimagereader-qt6 002.hocr.html 003.hocr.html
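A possible stopgap until `.hocr` is accepted, sketched in Python: copy each `.hocr` file to a sibling with `.html` appended, since the `.hocr.html` form is the one reported to work above. The helper name `html_alias` is hypothetical and not part of gImageReader, and edits saved in the editor would land in the copy, not in the original `.hocr` file.

```python
import shutil
from pathlib import Path


def html_alias(hocr_path):
    """Copy 'NAME.hocr' to 'NAME.hocr.html' next to it and return the new path.

    Hypothetical helper: it only renames by copying, the hOCR content
    itself is left untouched.
    """
    src = Path(hocr_path)
    if src.suffix != ".hocr":
        raise ValueError(f"not a .hocr file: {src}")
    # 001.hocr -> 001.hocr.html, in the same directory
    dst = src.with_name(src.name + ".html")
    shutil.copyfile(src, dst)
    return dst
```

The returned paths could then be passed to `gimagereader-qt6` in place of the `.hocr` files, assuming the `.hocr.html` behaviour described above.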

extra image files are counted as separate pages
but the page images referenced in the hocr files are used

<div class='ocr_page' id='page_1' title='image "001.tiff"; bbox ...'>
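For illustration, here is a minimal Python sketch of how the page image named in an `ocr_page` title like the one above could be read out. It assumes the `image "..."` property is written with a double-quoted file name, as in the example line. The names `OcrPageImages` and `page_images` are made up for this sketch and say nothing about how gImageReader does it internally.

```python
import re
from html.parser import HTMLParser

# Matches the image property inside a title such as:
#   image "001.tiff"; bbox 0 0 100 200
IMAGE_PROP = re.compile(r'(?:^|;)\s*image\s+"([^"]*)"')


class OcrPageImages(HTMLParser):
    """Collects the image file name of every element with class 'ocr_page'."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if "ocr_page" not in classes:
            return
        match = IMAGE_PROP.search(attributes.get("title") or "")
        if match:
            self.images.append(match.group(1))


def page_images(hocr_text):
    """Return the image file names referenced by the ocr_page elements."""
    parser = OcrPageImages()
    parser.feed(hocr_text)
    return parser.images
```

Pages without an `image` property are skipped, so the result can be shorter than the number of pages in the file.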

Would actually be trivial to also allow the .hocr file extension, but I'm not sure that's actually a standardized extension?

sounds like you're waiting for the central committee of file extensions
to allow this use case... ; )

see also kba/hocr-spec#115
