OCR-D Workflow integration #5697

markusweigelt · 2023-06-13T16:12:03Z

Create an OCR Profile entity with a title and a file selected from the directory configured via the property directory.ocr.profiles
Add new authorities to manage OCR Profiles
Select an OCR Profile in the process template
Select an individual OCR Profile in a process (the default is the OCR Profile of the process template)
Add the OCR Profile as a shell script variable via the {ocrprofilefile} placeholder
Add the project DMS export path as a shell script variable via the {projectdmsexportpath} placeholder
Add documentation, including for the placeholders: https://github.com/kitodo/kitodo-production/wiki/Workflow-Editor#platzhalter
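
To illustrate the two placeholders, here is a minimal sketch of a script that a script task could call. It assumes Kitodo substitutes {ocrprofilefile} and {projectdmsexportpath} before the command runs, so the script only sees plain arguments. The script path, the function name build_ocr_command and the "ocr-backend" command are hypothetical and not part of this pull request.

```shell
#!/bin/sh
# Hypothetical script-task command line in Kitodo (path is made up):
#   /usr/local/kitodo/scripts/run_ocr.sh {ocrprofilefile} {projectdmsexportpath}
# The placeholders are assumed to be filled in before the script starts.

# Print the command that would be handed to the OCR backend, or fail if
# no profile was passed. "ocr-backend" is a stand-in, not a real CLI.
build_ocr_command() (
    profile_file=$1
    export_dir=$2
    if [ -z "$profile_file" ]; then
        echo "no OCR profile given" >&2
        exit 1
    fi
    printf 'ocr-backend --profile %s --output %s\n' "$profile_file" "$export_dir"
)

# Example call with made-up paths:
#   build_ocr_command /profiles/default.sh /export/project-a
```

Because the profile is just a file path, the script is free to interpret the file contents however its OCR backend requires.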

bertsky · 2023-06-27T15:08:18Z

> In my opinion this pull request is an OCR-D specific integration and not a generic one, as in my opinion every integration has its own needs: some are more complex, some are simpler. OCR-D needed code changes; others did not require that.

Just because this PR is the first to integrate OCR workflow files does not mean it is not generic. Again, there is nothing specific to OCR-D here. The contents of the files are arbitrary as far as Kitodo is concerned, and the OCR is still integrated merely as a ScriptTask.

> For example, the Zeutschel ABBYY integration is a lot simpler and did not need any code change in Kitodo.Production. It worked like this: a shell script on the Kitodo.Production host starts a simple Java application, which starts the OCR. The OCR by ABBYY is controlled by the Zeutschel Manager software, and the executing task is closed through the ActiveMQ message queue of Kitodo.Production.

Zeutschel/ABBYY decided they do not want to concern the users with details like selecting OCR workflows. That's understandable for a commercial service which is technically backed by a black box anyway. But they could have offered a workflow selector, too. Any OCR backend could – now.

henning-gerhardt · 2023-06-27T15:16:11Z

Okay, I'm out. Maybe I cannot express my thinking, or it is being interpreted the wrong way. Please remove me from the review. I'm out of this pull request.

Erikmitk · 2023-06-27T15:20:43Z

@markusweigelt Maybe you can explain where the OCR-D specificity comes from? I don't see it either. What would prevent someone from using whatever OCR they use now?

markusweigelt · 2023-06-27T15:54:42Z

I assumed that the workflows were OCR-D specific, which is why I refactored it as part of the adjustments from the review. I think we can all agree on OCR Profile. I will adjust that accordingly.

matthias-ronge

TL;DR Technically, the result of the reviews is OK. But I wonder whether we need the functionality in this form.

What basically strikes me about this pull request is that a lot of effort is put into it for little additional functionality. Moreover, this can only be used with a special external application. At least if I understood it correctly. For me, it looks like this:

In my opinion, the ocrProfile is technical metadata. The goal is that this metadata can be defined in the production template. I would choose the following solution, without touching the source code at all: The metadata is created with the desired value in the ruleset with domain="technical". (An additional wrapping ruleset can be created for exactly this setting using includes, if different settings for the same ruleset are to be stored in the system.) The ruleset with the metadata is used in the production template. So, when a process is created, the ocrProfile is taken over into the process as the corresponding metadata. Here it can be used.
I see the directory for OCR profiles that can be configured here as lying completely outside of Kitodo.Production.

I am thinking of a completely different (better) implementation: being able to edit a page of metadata in the production templates, so that you can start the metadata editor in the production template and set any metadata (allowed in the ruleset) there. This is then passed on to all created processes.

That's my opinion on this. I haven't tried the pull request in practice. The code looked OK when I read it.

BartChris · 2023-07-20T17:05:32Z

Hi all,

I have already tested the OCR-D suite (https://github.com/slub/ocrd_manager, https://github.com/slub/ocrd_controller) in our Kitodo test system, where it works well, and I am therefore trying to deploy it in production. I would appreciate it if we could move forward in enabling a better integration of Kitodo and OCR-D, and I want to make some comments here based on practical experience.

I got the OCR-D integration running without using the changes of this pull request, and I want to point out that we also have to think about the intended users of the proposed implementation.

Right now I mostly think about two scenarios when doing OCR in the context of our mass digitization:

Printed documents with a simple layout (maybe from 1800 until today). For those, applying the default workflow provided in the OCR-D manager repository may already be enough:

https://github.com/slub/ocrd_manager/blob/main/workflows/ocr-workflow-default.sh

More complex layouts which would require preprocessing and more steps from the OCR-D workflow:

https://ocr-d.de/en/workflows
I would probably apply the more complex workflows to older materials with more complex layouts.

Apart from that I could derive most of the information from the metadata we encode as part of the description in Kitodo, mainly the language (English, German, ...) and the script type (Antiqua, Fraktur). I am already injecting those parameters on the fly in my predefined workflows.
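
As a rough sketch of what injecting those parameters could look like, the following function maps language and script type to an OCR model name. The spellings of the metadata values and the model names returned are illustrative assumptions; they would have to be adjusted to the actual ruleset values and to the models installed for the OCR engine in use.

```shell
#!/bin/sh
# Pick an OCR model name from language and script-type metadata.
# Value spellings and model names are illustrative assumptions.
select_model() (
    lang=$1
    script=$2
    case "$script" in
        Fraktur)
            # Script type wins: use one Fraktur model for all languages.
            echo "Fraktur"
            ;;
        *)
            case "$lang" in
                German)  echo "deu" ;;
                English) echo "eng" ;;
                *)       echo "Latin" ;;  # generic fallback
            esac
            ;;
    esac
)

# Example call:
#   select_model German Fraktur
```

Keeping the mapping in one function means the workflow file itself can stay generic and only receive the selected model as a parameter.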

The question for me then is: What is the additional value of tying a specific OCR workflow ("OCR profile") to a production template? A production template could be associated with a project which holds many different types of materials which require different types of OCR. In general, a production template is determined by different considerations than the selection of a specific OCR.

This pull request also makes the "OCR profile" selectable per process. This is more useful, but do the content editors really have the necessary knowledge to make an informed decision here? I do not see myself or another person with more knowledge configuring OCR profiles for single processes.

One possible scenario I can think of is that a user decides, on the basis of the material's layout, that this material requires a more sophisticated workflow although it is quite recent, and thereby overrides the default simple workflow with a more complex, but also predefined, workflow. This would mean that in the end I have two profiles: "Simple Layout" and "Complex layout". And this could probably be recorded, as Matthias suggests, in the technical metadata.

With that information stored, it is possible to attach to each of those profiles a specific file with workflow instructions and to specify those files in Kitodo. But this is probably not strictly necessary, since my script which is called from Kitodo already contains a lot of business logic and could get the workflow specification (encoded in a file) from a lot of different places.

I would probably agree that the approach here is generic enough, since I could also use the information in the "OCR Profile" file to send specific instructions to any OCR server or process. I am not sure if it is really necessary, but nothing prevents the encoding of OCR-server-specific information (e.g. for an ABBYY server) in the file. Or one can choose not to select a profile at all because it is not necessary in the institution's OCR setup. Kitodo stays agnostic in that regard. Although I do not see that many use cases for the given implementation, I just see it as support for an additional scenario of interacting with an OCR system (by passing it workflows which are stored in Kitodo), which still allows for other, different ways of using an external OCR service.

PS: This is probably a limited view based on my practical experiences. I am sure there are a lot of scenarios where having a lot of profiles to select from is useful. But I am wondering if a sophisticated workflow specification will ever happen inside of Kitodo and not in another (e.g. OCR-D-based) software.

Erikmitk · 2023-08-07T14:20:59Z

@BartChris Let me preface this by saying that everything depends on how you use Kitodo.Production and how (digitization) workflows are organized in each individual institution.

> PS: This is probably a limited view based on my practical experiences. I am sure there are a lot of scenarios where having a lot of profiles to select from is useful. But I am wondering if a sophisticated workflow specification will ever happen inside of Kitodo and not in another (e.g. OCR-D-based) software.

If you know what kind of material you are digitizing, then you could make an informed decision about which specific OCR workflow you want to use as early as when creating the project. So you can run every process through the tailored OCR workflow and it will work out just fine. Outliers can be dealt with individually. If you use whatever default workflow is available from the start, you then have to individually re-OCR each process with a different OCR workflow once you figure out later that a better workflow is available. This is completely independent of OCR-D, because you could use different engines, OCR models or whatever else with any kind of script task or processing you choose.

But how do you make an informed decision about which OCR-Workflow to choose in the first place?!

> This would mean that in the end I have two profiles: "Simple Layout" and "Complex layout". And this could probably be recorded, as Matthias suggests, in the technical metadata.

This view is too limited. You can have complex layouts, simple layouts, workflows for specific material from specific time frames and even handwriting. The main idea would be to share OCR profiles and give better default workflows for a wide variety of use cases. The choice we have (OCR-D specific) is very limited and will be expanded. Additionally, the idea would be to share workflows and models to gather experience in the community. Not every complex layout is the same, and it can be useful to have a more specifically trained workflow at hand. Complex layouts in what kind of complexity? Mix of text and images? Very narrow tables? Overlapping multi-columns? These are not the same.

If someone has found a useful workflow, it could be shared publicly with examples etc. Not every institution has to experiment on its own to find the perfect workflow for common material types or eras. This can be a shared effort. So one could end up with a wide variety of possible workflows, parameters or models. That's what this OCR profile selector is for. Putting this in the ruleset therefore does not seem viable, because it requires too much manual labor to maintain these choices and to keep the actual workflows and options in sync. Just drop a new workflow profile in the directory and it is there to be used.
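
The "drop a file in the directory" idea can be sketched as follows: the set of selectable profiles is simply whatever regular files exist in the configured profile directory. This is only an illustration in shell, not how the pull request implements the selector; the function name list_profiles and the example file names are made up.

```shell
#!/bin/sh
# List the names of all regular files in a profile directory, sorted,
# one per line. Symlinks to files are followed by the -f test.
list_profiles() (
    dir=$1
    [ -d "$dir" ] || exit 1
    for f in "$dir"/*; do
        # An empty directory leaves the glob unexpanded; -f filters it out.
        [ -f "$f" ] && basename "$f"
    done | sort
)

# Example: after copying a new file into the directory, it shows up in
# the listing on the next call, without any further configuration.
#   list_profiles /usr/local/kitodo/ocr/profiles
```

Nothing has to be registered anywhere else, which is the maintenance advantage over keeping the list of choices in a ruleset.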

In the context of OCR-D the main vision is to get rid of workflow choice completely. The project aims for auto-correcting and auto-optimizing workflows so that your only choice is to use it or not. But as we're not there yet, we need to have the option to choose an OCR profile that can be used to select a corresponding OCR workflow. And again, this could be OCR-D or any other script call that can be tuned by specifying an OCR profile that can be read and used accordingly.

Hope this makes sense!

BartChris · 2023-08-07T14:30:06Z

@Erikmitk I completely agree with what you are saying. I also think that the exchange of standards and best practices for approaching different types of materials would be one of the most important outcomes of the whole Kitodo-OCR-D project.
And I think you are also right in saying that there could be more profiles than just "easy" and "complex".

I would say that the most important thing in general is moving forward in integrating OCR-D or any other OCR technology with Kitodo so that we have a basis for knowledge sharing. With regard to this PR, I would like to stress that, as long as that knowledge sharing happens, I do not really care whether those profiles are stored in Kitodo or somewhere on the server and used by some external script.
I am therefore quite dispassionate about the specifics of this PR, since it does not rule out any other usage of OCR-D or other OCR-related technologies.

bertsky · 2023-08-14T16:38:41Z

@BartChris thanks for your feedback!

I still see one point which @Erikmitk has not addressed:

> Apart from that I could derive most of the information from the metadata we encode as part of the description in Kitodo, mainly the language (English, German, ...) and the script type (Antiqua, Fraktur). I am already injecting those parameters on the fly in my predefined workflows.

Note: we are currently preparing to change our default OCR workflow(s) to include that metadata information (e.g. a Tesseract model switch block).

You can pass that info to the Manager from the Kitodo Script via parameters:

/usr/local/kitodo/scripts/script_ocr_process_dir.sh ... --workflow {ocrprofilefile} --lang $(meta.topstruct.DocLanguage) --script $(meta.topstruct.slub_script)

See database entries for default Kitodo workflow in the demo.

In a custom/external Kitodo instance, you need to configure your Kitodo workflows to include an OCR-D step. That wiki link also points to documentation on all the available placeholders.
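
For readers writing their own script, here is a minimal sketch of how named parameters such as --workflow, --lang and --script could be picked up on the receiving side. It is not the code of script_ocr_process_dir.sh; the function name parse_ocr_args and its output format are assumptions made for this illustration.

```shell
#!/bin/sh
# Parse --workflow, --lang and --script and print them as key=value lines.
# Unknown options and options without a value are rejected with status 2.
parse_ocr_args() (
    workflow= lang= script=
    while [ $# -gt 0 ]; do
        case "$1" in
            --workflow|--lang|--script)
                # Every known option needs a value after it.
                [ $# -ge 2 ] || { echo "missing value for $1" >&2; exit 2; }
                ;;
            *)
                echo "unknown option: $1" >&2
                exit 2
                ;;
        esac
        case "$1" in
            --workflow) workflow=$2 ;;
            --lang)     lang=$2 ;;
            --script)   script=$2 ;;
        esac
        shift 2
    done
    printf 'workflow=%s\nlang=%s\nscript=%s\n' "$workflow" "$lang" "$script"
)

# Example call with made-up values:
#   parse_ocr_args --workflow /profiles/default.sh --lang German --script Fraktur
```

Checking for a value before shifting avoids getting stuck or failing obscurely when a placeholder expands to nothing.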

markusweigelt · 2023-10-11T12:24:14Z

Closed in favor of a new implementation without an explicit entity; see PR #5809.

markusweigelt added 30 commits June 27, 2022 10:23

add config parameter

3a5aaa6

initial commit of ocr workflow entity

4672dd1

add entities and db object for ocr workflow

8d6809a

add ocr workflow edit

54f0d93

add bean to hibernate

de02bb5

select ocr workflow on project level and fix delete and editing of ocr workflow

c041fc3

create ocr workflow file in process dir

35637f0

add config parameter

b59203a

initial commit of ocr workflow entity

df3b6f5

add entities and db object for ocr workflow

6ae0344

add ocr workflow edit

805637a

add bean to hibernate

c737e71

select ocr workflow on project level and fix delete and editing of ocr workflow

a227bbd

add ocr workflwo tab to process edit

922ae8e

improve selection and remove process ocr workflow tab

464b861

add comment and remove unused file

c450ad3

improve messages

ff53014

add config parameter

18ff2d8

initial commit of ocr workflow entity

7055ef3

add entities and db object for ocr workflow

7380441

add ocr workflow edit

b57a3f2

add bean to hibernate

319727e

select ocr workflow on project level and fix delete and editing of ocr workflow

bfd037e

create ocr workflow file in process dir

3b23eb9

add config parameter

305bb06

initial commit of ocr workflow entity

d3e53ca

add entities and db object for ocr workflow

980cfa0

add ocr workflow edit

0a3ca19

add ocr workflwo tab to process edit

8197c62

improve selection and remove process ocr workflow tab

ac4883e

henning-gerhardt removed their request for review June 27, 2023 15:18

markusweigelt added 14 commits June 28, 2023 14:57

Rename OCR-D Workflow to OCR Profile

01d275d

Rename OCR-D Workflow to OCR Profile

347f6ee

Rename OCR-D Workflow to OCR Profile

8e389d4

Improve checkstyle

67e5ecc

Fixes regarding renaming to OCR profiles

453ab22

Change first letter of converter reference

ecbae9e

Rename variable ocrprofile to ocrprofilefilename

ffc58d1

Improve checkstyle

e76a770

Add named parameter

aa4ac13

Improve authorities and support relativ path to ocr profile

42480ff

Improve java doc

93f6aa9

Allow symlinks in file path

08c746f

Add file separator

587aa3e

Add project dms export path

5a90f0d

solth added the feature label Jul 8, 2023

matthias-ronge reviewed Jul 14, 2023

markusweigelt closed this Aug 1, 2023

markusweigelt reopened this Aug 1, 2023

markusweigelt mentioned this pull request Oct 11, 2023

OCR-D Workflow integration (without entity implementation) #5809

Merged

markusweigelt closed this Oct 11, 2023
