Handwritten documents and ALTO encoding - how to make ALTO more suitable for such documents - ideas #81
I asked some people from Transkribus why they chose PAGE instead of ALTO, and what ALTO is missing to be a better format for the handwriting community. Here is the answer: "As far as I remember, we chose PAGE as
From this I see one topic we may want to consider in the future (some of the features that were missing at the time have since been added, such as polyline baselines and polygonal shapes on all levels):
Could different writers be recorded using Tags?
While working with Transkribus-SWT to generate ground truth (GT), my colleagues and I ran into trouble several times because we forgot to synchronize text line and word contents. The major advantage (IMHO) of ALTO over PAGE is the single storage point for OCR content, especially when one aims to create GT at least at word level, as we do.
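To illustrate the single storage point: in ALTO the text lives only on the word-level `String` elements, so the line text can always be derived and cannot drift out of sync. The following is a minimal sketch, not a reference implementation. The helper names `local_name` and `line_text` and the sample line are made up for illustration, and it only handles `String` and `SP` children (hyphenation via `HYP` is ignored).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one ALTO v4 TextLine with two words and a space.
SAMPLE = (
    '<TextLine xmlns="http://www.loc.gov/standards/alto/ns-v4#">'
    '<String CONTENT="Dear"/><SP/><String CONTENT="Sir"/>'
    '</TextLine>'
)


def local_name(tag):
    """Strip the '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def line_text(text_line):
    """Derive the text of a TextLine from its String and SP children."""
    parts = []
    for child in text_line:
        name = local_name(child.tag)
        if name == "String":
            parts.append(child.get("CONTENT", ""))
        elif name == "SP":
            parts.append(" ")
        # Other children (e.g. HYP) are ignored in this sketch.
    return "".join(parts)


if __name__ == "__main__":
    print(line_text(ET.fromstring(SAMPLE)))
```

Because the line text is computed rather than stored, editing a word only ever requires changing one `CONTENT` attribute.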
Handwritten documents are increasingly present in current projects, and although ALTO can already be used to describe the page layout and text of such material, I think there is still room for improvement. One recent change concerned the baseline definition, which was changed from a float value (the y coordinate of the line) to PointsType, since the baseline of handwritten text is not a straight line. There are probably many more issues related to this topic that we can discuss and improve.
This topic is intended as a place for collecting ideas for further discussion; from here we will pick out the most important topics and create individual issues.
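As a small illustration of the baseline change mentioned above, a reader that wants to support both older and newer files has to accept either form of the `BASELINE` value. The sketch below assumes the point list is written as numbers separated by commas and/or whitespace (for example `x1,y1 x2,y2 ...`); the function name `parse_baseline` and the returned tuple shape are hypothetical, chosen only for this example.

```python
import re


def parse_baseline(value):
    """Parse a BASELINE attribute value.

    Returns ("y", float) for the legacy single-number form, or
    ("points", [(x, y), ...]) for a polyline given as a point list.
    """
    # Split on commas and/or whitespace; drop empty fragments.
    numbers = [float(n) for n in re.split(r"[,\s]+", value.strip()) if n]
    if not numbers:
        raise ValueError("empty BASELINE value")
    if len(numbers) == 1:
        # Legacy form: a single y coordinate, i.e. a horizontal line.
        return ("y", numbers[0])
    if len(numbers) % 2 != 0:
        raise ValueError("point list must contain x,y pairs")
    # Pair up consecutive numbers as (x, y) points.
    return ("points", list(zip(numbers[0::2], numbers[1::2])))


if __name__ == "__main__":
    print(parse_baseline("1200.5"))
    print(parse_baseline("100,500 300,510 500,495"))
```

Returning a tagged tuple keeps the two cases distinguishable, so a caller can decide how to turn a legacy y value into a line across the text line's width.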