OCR-D Workflow integration #5697

Closed
wants to merge 76 commits into from


markusweigelt
Collaborator

@markusweigelt markusweigelt commented Jun 13, 2023

  • Create an OCR Profile entity with a title and a file selected from the directory configured via the property directory.ocr.profiles
  • Add new authorities to manage OCR Profiles
  • Select an OCR Profile in the process template
  • Select an individual OCR Profile in the process -> the default is the OCR Profile of the process template
  • Add the OCR Profile as a shell script variable via the {ocrprofilefile} placeholder
  • Add the project DMS export path as a shell script variable via the {projectdmsexportpath} placeholder
  • Added documentation, among other things for the placeholders: https://github.com/kitodo/kitodo-production/wiki/Workflow-Editor#platzhalter
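The placeholder mechanism in the list above can be pictured with a small shell sketch. It is only an illustration under stated assumptions: the helper names `replace_all`, `effective_profile` and `fill_placeholders` are hypothetical, and the actual substitution happens inside Kitodo.Production, not in a shell script. Only the two placeholder names and the "process overrides template" default come from the description.

```shell
# Hypothetical helper: replace every literal occurrence of $2 in $1 with $3.
# Uses only POSIX parameter expansion, so paths containing "/" are safe.
replace_all() {
  ra_rest=$1
  ra_out=
  while :; do
    case $ra_rest in
      *"$2"*)
        # Keep the text before the first match, then append the replacement.
        ra_out=$ra_out${ra_rest%%"$2"*}$3
        ra_rest=${ra_rest#*"$2"}
        ;;
      *)
        break
        ;;
    esac
  done
  printf '%s\n' "$ra_out$ra_rest"
}

# Hypothetical: the profile chosen on the process ($1) wins;
# if it is empty, fall back to the profile of the process template ($2).
effective_profile() {
  printf '%s\n' "${1:-$2}"
}

# Fill both placeholders named in the description into a command line.
# $1 = command template, $2 = profile file, $3 = project DMS export path
fill_placeholders() {
  fp_cmd=$(replace_all "$1" '{ocrprofilefile}' "$2")
  replace_all "$fp_cmd" '{projectdmsexportpath}' "$3"
}
```

For instance, `fill_placeholders 'ocr.sh --workflow {ocrprofilefile}' "$(effective_profile '' /profiles/template.sh)" /export/p1` would fall back to the template profile, because the process-level value is empty. The paths are made up for the example.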

@bertsky

bertsky commented Jun 27, 2023

In my opinion this pull request is an OCR-D-specific integration and not a generic one, as in my opinion every integration has its own needs: some are more complex, some are simpler. OCR-D needed code changes; others did not require that.

Just because this PR is the first to integrate OCR workflow files does not mean it is not generic. Again, there is nothing specific to OCR-D here. The content of the files is arbitrary as far as Kitodo is concerned, and the OCR is still integrated merely as a ScriptTask.

For example, the Zeutschel ABBYY integration is a lot simpler and did not need any code change in Kitodo.Production. It worked like this: a shell script is called that starts a simple Java application on the Kitodo.Production host, which in turn starts the OCR. The OCR by ABBYY is controlled by the Zeutschel Manager software, and the executing task is closed through the ActiveMQ message queue of Kitodo.Production.

Zeutschel/ABBYY decided they do not want to concern the users with details like selecting OCR workflows. That's understandable for a commercial service which is technically backed by a black box anyway. But they could have offered a workflow selector, too. Any OCR backend could – now.

@henning-gerhardt
Collaborator

Okay, I'm out. Maybe I cannot express my thinking, or it is being interpreted in the wrong way. Please remove me from the review. I'm out of this pull request.

@henning-gerhardt henning-gerhardt removed their request for review June 27, 2023 15:18
@Erikmitk
Member

@markusweigelt Maybe you can explain where the OCR-D specificity comes from? I don't see it either. What would prevent someone from using whatever OCR they use now?

@markusweigelt
Collaborator Author

I assumed that the workflows were OCR-D specific, and that's why I refactored it as part of the adjustments from the review. I think we can all agree on OCR Profile. I will adjust that accordingly.

@solth solth added the feature label Jul 8, 2023
Collaborator

@matthias-ronge matthias-ronge left a comment

TL;DR Technically, the result of the reviews is OK. But I wonder whether we need the functionality in this form.

What basically strikes me about this pull request is that a lot of effort is put into it for little additional functionality. Moreover, this can only be used with a special external application. At least if I understood it correctly. For me, it looks like this:

In my opinion, the ocrProfile is technical metadata. The goal is that this metadata can be defined in the production template. I would choose the following solution, without touching the source code at all: The metadata is created with the desired value in the ruleset with domain="technical". (If different settings for the same ruleset are to be stored in the system, an additional wrapping ruleset can be created for exactly this setting using includes.) The ruleset with the metadata is used in the production template. So, when a process is created, the ocrProfile is taken over into the process as the corresponding metadata, where it can be used.
I see the directory for OCR profiles, which can be configured here, as lying completely outside of Kitodo.Production.
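To make this alternative concrete, here is a minimal sketch of the script side, assuming the ocrProfile metadata value reaches the script as a plain profile name and the profile files live in a directory maintained outside Kitodo.Production. The function name `resolve_profile` and the directory layout are hypothetical and not part of this pull request.

```shell
# Hypothetical: map a profile name taken from process metadata ($1)
# to a file in an externally maintained profile directory ($2).
resolve_profile() {
  rp_name=$1
  rp_dir=$2
  case $rp_name in
    ''|*/*|.*)
      # Reject empty names, path separators and dot-files so that a
      # metadata value cannot point outside the profile directory.
      echo "invalid profile name: $rp_name" >&2
      return 1
      ;;
  esac
  if [ -f "$rp_dir/$rp_name" ]; then
    printf '%s\n' "$rp_dir/$rp_name"
  else
    echo "no such profile: $rp_name" >&2
    return 1
  fi
}
```

With this split, Kitodo.Production only stores a name in the metadata, and the mapping from name to workflow file stays entirely in the script.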

I am thinking of a completely different (better) implementation: you could edit a page of metadata in the production templates, that is, start the metadata editor in the production template and set any metadata (allowed in the ruleset) there. This is then passed into all created processes.

That's my opinion on this. Technically, I haven't tried the pull request. The code looks OK when I looked at it.

@BartChris
Collaborator

BartChris commented Jul 20, 2023

Hi all,

I have already tested the OCR-D suite (https://github.com/slub/ocrd_manager, https://github.com/slub/ocrd_controller) in our Kitodo test system, which works well, and I am therefore trying to deploy it in production. I would appreciate it if we could move forward in enabling a better integration of Kitodo and OCR-D, and I want to make some comments here based on practical experience.

I got the OCR-D integration running without using the changes of this pull request, and I want to point out that we also have to think about the intended users of the proposed implementation.

Right now I mostly think about two scenarios when doing OCR in the context of our mass digitization:

  1. Printed documents with a simple layout (maybe from 1800 until today). For these, applying the default workflow provided in the OCR-D manager repository may already be enough:

https://github.com/slub/ocrd_manager/blob/main/workflows/ocr-workflow-default.sh

  2. More complex layouts, which would require preprocessing and more steps from the OCR-D workflow:

https://ocr-d.de/en/workflows
I would probably apply the more complex workflows to older materials with more complex layouts.

Apart from that, I could derive most of the information from the metadata we encode as part of the description in Kitodo, mainly the language (English, German, ...) and the script type (Antiqua, Fraktur). I am already injecting those parameters on the fly in my predefined workflows.
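As a rough illustration of deriving OCR parameters from such metadata, a script could map language and script type to a model name. This is a sketch under assumptions: the function `select_model` is hypothetical, and the mapping to the Tesseract language codes `deu`, `eng` and `frk` is only an example that an installation would adjust to the models it actually provides.

```shell
# Hypothetical mapping from catalogue metadata to an OCR model name.
# $1 = language (e.g. German, English), $2 = script type (Antiqua, Fraktur)
select_model() {
  case "$1:$2" in
    German:Fraktur) echo frk ;;   # assumed choice for German blackletter
    German:*)       echo deu ;;
    English:*)      echo eng ;;
    *)
      # Do not guess for combinations this sketch does not cover.
      echo "no model configured for $1/$2" >&2
      return 1
      ;;
  esac
}
```

Failing loudly on unknown combinations, rather than falling back silently, makes gaps in the mapping visible before a whole batch is processed with an unsuitable model.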

The question for me then is: what is the additional value of tying a specific OCR workflow ("OCR profile") to a production template? A production template could be associated with a project which holds many different types of materials that require different types of OCR. In general, a production template is determined by different considerations than the selection of a specific OCR.

This pull request also makes the "OCR profile" selectable per process. This is more useful, but do the content editors really have the necessary knowledge to make an informed decision here? I do not see myself or another person with more knowledge configuring OCR profiles for single processes.

One possible scenario I can think of is that a user can decide, on the basis of the material's layout, that this material requires a more sophisticated workflow although it is quite recent, and thereby overrides the default simple workflow with a more complex but also predefined workflow. This would mean that in the end I have two profiles: "Simple Layout" and "Complex layout". And this could probably be recorded, as Matthias suggests, in the technical metadata.

Having that information stored, it is possible to attach to each of those profiles a specific file with workflow instructions and to specify those files in Kitodo. But it is probably not strictly necessary, since my script, which is called from Kitodo, already contains a lot of business logic and could get the workflow specification (encoded in a file) from a lot of different places.

I would probably agree that the approach here is generic enough, since I could also use the information in the "OCR Profile" file to send specific instructions to any OCR server or process. I am not sure if it is really necessary, but nothing prevents the encoding of OCR-server-specific information (e.g. for an ABBYY server) in the file. Or one can choose not to select a profile at all because it is not necessary in the institution's OCR setup. Kitodo stays agnostic in that regard. Although I do not see that many use cases for the given implementation, I just see it as support for an additional scenario of interacting with an OCR system (by passing it workflows which are stored in Kitodo), which still allows for other, different ways of using an external OCR service.

PS: This is probably a limited view based on my practical experience. I am sure there are a lot of scenarios where having a lot of profiles to select from is useful. But I am wondering if a sophisticated workflow specification will ever happen inside of Kitodo and not in another (e.g. OCR-D-based) software.

@Erikmitk
Member

Erikmitk commented Aug 7, 2023

@BartChris Let me preface this by saying that everything depends on how you use Kitodo.Production and how (digitization) workflows are organized in each individual institution.

PS: This is probably a limited view based on my practical experience. I am sure there are a lot of scenarios where having a lot of profiles to select from is useful. But I am wondering if a sophisticated workflow specification will ever happen inside of Kitodo and not in another (e.g. OCR-D-based) software.

If you know what kind of material you are digitizing, then you can make an informed decision about which specific OCR workflow you want to use as early as when creating the project. So you can run every process through the tailored OCR workflow and it will work out just fine. Outliers can be dealt with individually. If you use whatever default workflow is available from the start, you then have to individually re-OCR each process with a different OCR workflow once you figure out later on that there is a better workflow available. This is completely independent of OCR-D, because you could use different engines or OCR models or whatever, with any kind of script task or processing you choose.

But how do you make an informed decision about which OCR-Workflow to choose in the first place?!

This would mean that in the end I have two profiles: "Simple Layout" and "Complex layout". And this could probably be recorded, as Matthias suggests, in the technical metadata.

This view is too limited. You can have complex layouts, simple layouts, workflows for specific material from specific time frames and even handwriting. The main idea would be to share OCR profiles and give better default workflows for a wide variety of use cases. The choice we have (OCR-D specific) is very limited and will be expanded. Additionally, the idea would be to share workflows and models to gather experience in the community. Not every complex layout is the same and it can be useful to have a more specifically trained workflow at hand. Complex layouts in what kind of complexity? Mix of text and images? Very narrow tables? Overlapping multi-columns? These are not the same. If someone has found a useful workflow, it could be shared publicly with examples etc. Not every institution has to experiment by itself to find the perfect workflow for common material types or eras. This can be a shared effort. So one could end up with a wide variety of possible workflows, parameters or models. That's what this OCR profile selector is for. The choice to put this in the ruleset therefore does not seem viable, because that is too heavy on manual labor in maintaining these choices and keeping the actual workflows and options in sync. Just drop a new workflow profile in the directory and it is there to be used.
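The "drop a file in the directory" idea can be sketched in a few lines of shell. This is only an illustration, assuming every regular file in the configured profile directory counts as one selectable profile; the function name `list_profiles` is hypothetical, and Kitodo.Production itself reads the directory in Java.

```shell
# Hypothetical: list the selectable profiles in directory $1,
# one file name per line, in the shell's glob (sorted) order.
list_profiles() {
  for lp_file in "$1"/*; do
    # Skip subdirectories; this also skips the unexpanded pattern
    # that remains when the directory is empty.
    [ -f "$lp_file" ] || continue
    printf '%s\n' "${lp_file##*/}"
  done
}
```

Because the list is computed from the directory contents on every call, a newly added workflow file appears without any change to rulesets or other configuration.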

In the context of OCR-D the main vision is to get rid of workflow choice completely. The project aims for auto-correcting and auto-optimizing workflows so that your only choice is to use it or not. But as we're not there yet, we need to have the option to choose an OCR profile that can be used to select a corresponding OCR workflow. And again, this could be OCR-D or any other script call that can be tuned by specifying an OCR profile that can be read and used accordingly.

Hope this makes sense!

@BartChris
Collaborator

BartChris commented Aug 7, 2023

@Erikmitk I completely agree with what you are saying. I think as well that the exchange of standards and best practices on how to approach different types of materials would be one of the most important outcomes of the whole Kitodo-OCR-D project.
And I think you are also right in saying that there could be more profiles than just "easy" and "complex".

I would say that the most important thing in general is moving forward in integrating OCR-D or any other OCR technology with Kitodo so that we have a basis for knowledge sharing. With regard to this PR, I would like to stress that if that knowledge sharing happens, I do not really care whether those profiles are stored in Kitodo or somewhere on the server and used by some external script.
I am therefore quite dispassionate about the specifics of this PR, since it does not disallow any other usage of OCR-D or other OCR-related technologies.

@bertsky

bertsky commented Aug 14, 2023

@BartChris thanks for your feedback!

I still see one point which @Erikmitk has not addressed:

Apart from that, I could derive most of the information from the metadata we encode as part of the description in Kitodo, mainly the language (English, German, ...) and the script type (Antiqua, Fraktur). I am already injecting those parameters on the fly in my predefined workflows.

Note: we are currently preparing to change our default OCR workflow(s) to include that metadata information (e.g. a Tesseract model switch block).

You can pass that info to the Manager from the Kitodo Script via parameters:

/usr/local/kitodo/scripts/script_ocr_process_dir.sh ... --workflow {ocrprofilefile} --lang $(meta.topstruct.DocLanguage) --script $(meta.topstruct.slub_script)

See database entries for default Kitodo workflow in the demo.

In a custom/external Kitodo instance, you need to configure your Kitodo workflows to include an OCR-D step. That wiki link also points to documentation on all the available placeholders.
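For readers writing their own script, the receiving side of the parameter passing shown in the command above might look roughly like the following sketch. It is not the code of script_ocr_process_dir.sh; the function `parse_ocr_args` and the variable names are assumptions. Only the option names `--workflow`, `--lang` and `--script` are taken from the command line in this comment.

```shell
# Hypothetical option parsing for a script called from a Kitodo script task.
# Sets ocr_workflow, ocr_lang and ocr_script; other arguments are ignored.
parse_ocr_args() {
  ocr_workflow=
  ocr_lang=
  ocr_script=
  while [ $# -gt 0 ]; do
    case $1 in
      --workflow|--lang|--script)
        # Each of these options needs a value after it.
        if [ $# -lt 2 ]; then
          echo "missing value for $1" >&2
          return 2
        fi
        case $1 in
          --workflow) ocr_workflow=$2 ;;
          --lang)     ocr_lang=$2 ;;
          --script)   ocr_script=$2 ;;
        esac
        shift 2
        ;;
      *)
        shift  # positional or unknown arguments are skipped in this sketch
        ;;
    esac
  done
}
```

After `parse_ocr_args "$@"`, the script can decide what to do when `ocr_workflow` is empty, for example fall back to a default workflow, which keeps the profile optional as discussed earlier in the thread.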

@markusweigelt
Collaborator Author

Closed because of a new implementation, without the use of an explicit entity, in PR #5809.

7 participants