Update notes on Kitodo.Production 3.x

Overview

Estimates of the time needed
Requirements
Preparatory steps
- Convert XStream files
- Move metadata in newspaper processes
Update the system
Application migration

Estimates of the time needed

The following values are rough averages of durations observed under various technical conditions. They are intended to give you an approximate idea of how long the technical steps of the migration may take. Your own experience may differ.

🕑 Database migration: ≈ 1 minute per 1000 processes
🕑 Convert metadata: ≈ 20 min. / 1000 processes (some reports mention up to 2 hours / 1000 processes)
🕑 Indexing: ≈ 20 min. / 1000 processes
🕑 Hierarchy migration: ≈ 1 hour / 1000 processes
🕑 Generate images: 1000 images / hour and thread
🕑 Newspaper migration: no forecast known
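
The figures above can be turned into a rough calculation. The following is only a sketch: it assumes the ≈ 20 min. figure for metadata conversion (the pessimistic reports would give a much higher total), leaves out the newspaper migration for which no estimate exists, and uses hypothetical function names and example numbers.

```shell
#!/bin/sh
# Rough duration estimate based on the per-1000-process figures listed above.
# Minutes per 1000 processes: database 1 + metadata 20 + indexing 20 + hierarchy 60.
MINUTES_PER_1000=101

# estimate_minutes PROCESSES -> total minutes (integer, rounded down)
estimate_minutes() {
    echo $(( $1 * MINUTES_PER_1000 / 1000 ))
}

# estimate_image_hours IMAGES THREADS -> hours, rounded up,
# at 1000 images per hour and thread
estimate_image_hours() {
    per_hour=$(( 1000 * $2 ))
    echo $(( ($1 + per_hour - 1) / per_hour ))
}

# Hypothetical example: 50,000 processes, 250,000 images, 4 threads
echo "Process migration: $(estimate_minutes 50000) minutes"
echo "Image generation:  $(estimate_image_hours 250000 4) hours"
```

For the example numbers this yields 5050 minutes (about 3.5 days) for the process-related steps and 63 hours for image generation.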

Requirements

These instructions assume that you have migrated your system to version 2.3 first.

Preparatory steps

Convert XStream files

Production version 2.3 supported two different internal file formats: METS and XStream. Version 3.2 only supports METS. Therefore, you need to use your version 2.3 to convert all internal XStream files to METS files beforehand.

To do this, go to the project settings and change the internal format setting to METS. Then select all the processes of this project and run KitodoScript action:rewriteProcessMetadata on them.

Move metadata in newspaper processes

In several institutions, newspaper processes were created in Production 2 in such a way that metadata relevant to the issues was stored on the top level (in the topstruct, e.g. Newspaper) or on the first sub-level (in the firstchild, i.e. the annual level, e.g. NewspaperYear) and was copied to the right place by copyData expressions when the processes were exported. copyData expressions are no longer available in this form in Production 3. If this affects you, you have to move the metadata to the desired position within the newspaper processes in Production 2 before the migration, using the KitodoScript action:copyData.

The action:copyData syntax differs slightly from the other KitodoScript actions. The rules are specified directly after the command without a prefix and without enclosing quotation marks. Example:

action:copyData /PublicationYear[0]/PublicationMonth[*]/PublicationDay[*]/Issue@DatePublished \=format "%1$04d-%2$02d-%3$02d" #1@TitleDocMain #2@TitleDocMainShort #3@TitleDocMainShort

Update the system

Create rulesets

In version 3, the ruleset file format was changed. You should study the new format thoroughly and create a new ruleset for your data. If you want, you can use an experimental ruleset converter provided by Zeutschel GmbH to generate a first draft. Note that the output of this program is not perfect and needs to be checked and corrected manually afterwards. (However, it saves you a lot of typing.)

ℹ️ In version 3.2, import and export are no longer defined in the ruleset, but in XSLT files, which offer a certain degree of flexibility. XSLT is a programming language written in XML and used to convert XML into other forms of representation (e.g. HTML). You need a thorough understanding of XSLT to properly configure the import and export.

Migrate database

⚠️ Before changing the database, make a backup copy and stop the running web application!
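
A backup could be taken along the following lines. This is a minimal sketch, not a prescribed procedure: the database name kitodo, the output path, and the Tomcat service name are assumptions, and the helper functions are hypothetical. mysqldump is assumed to find its credentials in an option file such as ~/.my.cnf.

```shell
#!/bin/sh
# run CMD...: execute the command, or only print it when DRY_RUN=1
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# backup_database DB_NAME OUT_FILE
# Dumps one database into a single SQL file.
backup_database() {
    run mysqldump --result-file="$2" "$1"
}

# Hypothetical usage (service name, database name and path are assumptions):
#   sudo systemctl stop tomcat9
#   backup_database kitodo /var/backups/kitodo-before-3x.sql
```

Setting DRY_RUN=1 first prints the mysqldump command instead of executing it, which helps to check the arguments before touching the database.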

Before running the migration script, make sure that your database schema matches the standard version 2.3 schema.

The following scripts resolve some common inconsistencies:

If your database still contains bit fields, you should convert them to tinyint(1).

ALTER TABLE `benutzer` CHANGE `IstAktiv` `IstAktiv` TINYINT(1) NULL DEFAULT NULL;
ALTER TABLE `benutzer` CHANGE `mitMassendownload` `mitMassendownload` TINYINT(1) NULL DEFAULT NULL;
ALTER TABLE `benutzer` CHANGE `confVorgangsdatumAnzeigen` `confVorgangsdatumAnzeigen` TINYINT(1) NULL DEFAULT NULL;
ALTER TABLE `benutzereigenschaften` CHANGE `IstObligatorisch` `IstObligatorisch` TINYINT(1) NULL DEFAULT NULL;
ALTER TABLE `metadatenkonfigurationen` CHANGE `orderMetadataByRuleset` `orderMetadataByRuleset` TINYINT(1) NULL DEFAULT NULL;
ALTER TABLE `projekte` CHANGE `useDmsImport` `useDmsImport` TINYINT(1) NULL DEFAULT NULL;
ALTER TABLE `projekte` CHANGE `dmsImportCreateProcessFolder` `dmsImportCreateProcessFolder` TINYINT(1) NULL DEFAULT NULL;
ALTER TABLE `projekte` CHANGE `projectIsArchived` `projectIsArchived` TINYINT(1) NULL DEFAULT NULL;
ALTER TABLE `prozesse` CHANGE `IstTemplate` `IstTemplate` TINYINT(1) NULL DEFAULT NULL;
ALTER TABLE `prozesse` CHANGE `swappedOut` `swappedOut` TINYINT(1) NULL DEFAULT NULL;
ALTER TABLE `prozesse` CHANGE `inAuswahllisteAnzeigen` `inAuswahllisteAnzeigen` TINYINT(1) NULL DEFAULT NULL;
ALTER TABLE `prozesseeigenschaften` CHANGE `IstObligatorisch` `IstObligatorisch` TINYINT(1) NULL DEFAULT NULL;
ALTER TABLE `schritte` CHANGE `typMetadaten` `typMetadaten` TINYINT(1) NULL DEFAULT NULL;
ALTER TABLE `schritte` CHANGE `typAutomatisch` `typAutomatisch` TINYINT(1) NULL DEFAULT NULL;
ALTER TABLE `schritte` CHANGE `typImportFileUpload` `typImportFileUpload` TINYINT(1) NULL DEFAULT NULL;
ALTER TABLE `schritte` CHANGE `typExportRus` `typExportRus` TINYINT(1) NULL DEFAULT NULL;
ALTER TABLE `schritte` CHANGE `typImagesLesen` `typImagesLesen` TINYINT(1) NULL DEFAULT NULL;
ALTER TABLE `schritte` CHANGE `typImagesSchreiben` `typImagesSchreiben` TINYINT(1) NULL DEFAULT NULL;
ALTER TABLE `schritte` CHANGE `typExportDMS` `typExportDMS` TINYINT(1) NULL DEFAULT NULL;
ALTER TABLE `schritte` CHANGE `typBeimAnnehmenModul` `typBeimAnnehmenModul` TINYINT(1) NULL DEFAULT NULL;
ALTER TABLE `schritte` CHANGE `typBeimAnnehmenAbschliessen` `typBeimAnnehmenAbschliessen` TINYINT(1) NULL DEFAULT NULL;
ALTER TABLE `schritte` CHANGE `typBeimAnnehmenModulUndAbschliessen` `typBeimAnnehmenModulUndAbschliessen` TINYINT(1) NULL DEFAULT NULL;
ALTER TABLE `schritte` CHANGE `typBeimAbschliessenVerifizieren` `typBeimAbschliessenVerifizieren` TINYINT(1) NULL DEFAULT NULL;
ALTER TABLE `schritte` CHANGE `typScriptStep` `typScriptStep` TINYINT(1) NULL DEFAULT NULL;
ALTER TABLE `schritte` CHANGE `batchStep` `batchStep` TINYINT(1) NULL DEFAULT NULL;
ALTER TABLE `vorlageneigenschaften` CHANGE `IstObligatorisch` `IstObligatorisch` TINYINT(1) NULL DEFAULT NULL;
ALTER TABLE `werkstueckeeigenschaften` CHANGE `IstObligatorisch` `IstObligatorisch` TINYINT(1) NULL DEFAULT NULL;

The following text columns should be of type LONGTEXT:

ALTER TABLE `prozesse` CHANGE `wikifield` `wikifield` LONGTEXT NULL DEFAULT NULL;
ALTER TABLE `prozesseeigenschaften` CHANGE `Wert` `Wert` LONGTEXT NULL DEFAULT NULL;
ALTER TABLE `vorlageneigenschaften` CHANGE `Wert` `Wert` LONGTEXT NULL DEFAULT NULL;
ALTER TABLE `werkstueckeeigenschaften` CHANGE `Wert` `Wert` LONGTEXT NULL DEFAULT NULL;
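
To find out whether your database still contains bit fields at all, you can query information_schema and generate the matching statements from the result. The following is a sketch under assumptions: the schema name kitodo and both function names are hypothetical, and the generated statements should be reviewed before they are executed.

```shell
#!/bin/sh
# bit_columns_query SCHEMA: print a query listing every BIT column of the schema
bit_columns_query() {
    printf "SELECT TABLE_NAME, COLUMN_NAME FROM information_schema.COLUMNS WHERE TABLE_SCHEMA = '%s' AND DATA_TYPE = 'bit';\n" "$1"
}

# to_tinyint_statements: read "table column" pairs on stdin, print one ALTER each
to_tinyint_statements() {
    while read -r table column; do
        [ -n "$column" ] || continue
        printf 'ALTER TABLE `%s` CHANGE `%s` `%s` TINYINT(1) NULL DEFAULT NULL;\n' \
            "$table" "$column" "$column"
    done
}

# Hypothetical usage (schema name "kitodo" is an assumption):
#   mysql --batch --skip-column-names -e "$(bit_columns_query kitodo)" \
#       | to_tinyint_statements > fix_bit_fields.sql
```

If the query returns no rows, the schema contains no bit fields and none of the TINYINT(1) statements is needed.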

If any of these Boolean fields are not initialized, you should initialize them:

UPDATE benutzer SET IstAktiv = 0 WHERE IstAktiv IS NULL OR IstAktiv != 1;
UPDATE benutzer SET mitMassendownload = 0 WHERE mitMassendownload IS NULL OR mitMassendownload != 1;
UPDATE benutzer SET confVorgangsdatumAnzeigen = 0 WHERE confVorgangsdatumAnzeigen IS NULL OR confVorgangsdatumAnzeigen != 1;
UPDATE benutzereigenschaften SET IstObligatorisch = 0 WHERE IstObligatorisch IS NULL OR IstObligatorisch != 1;
UPDATE metadatenkonfigurationen SET orderMetadataByRuleset = 0 WHERE orderMetadataByRuleset IS NULL OR orderMetadataByRuleset != 1;
UPDATE projectfilegroups SET previewImage = 0 WHERE previewImage IS NULL OR previewImage != 1;
UPDATE projekte SET useDmsImport = 0 WHERE useDmsImport IS NULL OR useDmsImport != 1;
UPDATE projekte SET dmsImportCreateProcessFolder = 0 WHERE dmsImportCreateProcessFolder IS NULL OR dmsImportCreateProcessFolder != 1;
UPDATE projekte SET projectIsArchived = 0 WHERE projectIsArchived IS NULL OR projectIsArchived != 1;
UPDATE prozesse SET IstTemplate = 0 WHERE IstTemplate IS NULL OR IstTemplate != 1;
UPDATE prozesse SET swappedOut = 0 WHERE swappedOut IS NULL OR swappedOut != 1;
UPDATE prozesse SET inAuswahllisteAnzeigen = 0 WHERE inAuswahllisteAnzeigen IS NULL OR inAuswahllisteAnzeigen != 1;
UPDATE prozesseeigenschaften SET IstObligatorisch = 0 WHERE IstObligatorisch IS NULL OR IstObligatorisch != 1;
UPDATE schritte SET typMetadaten = 0 WHERE typMetadaten IS NULL OR typMetadaten != 1;
UPDATE schritte SET typAutomatisch = 0 WHERE typAutomatisch IS NULL OR typAutomatisch != 1;
UPDATE schritte SET typImportFileUpload = 0 WHERE typImportFileUpload IS NULL OR typImportFileUpload != 1;
UPDATE schritte SET typExportRus = 0 WHERE typExportRus IS NULL OR typExportRus != 1;
UPDATE schritte SET typImagesLesen = 0 WHERE typImagesLesen IS NULL OR typImagesLesen != 1;
UPDATE schritte SET typImagesSchreiben = 0 WHERE typImagesSchreiben IS NULL OR typImagesSchreiben != 1;
UPDATE schritte SET typExportDMS = 0 WHERE typExportDMS IS NULL OR typExportDMS != 1;
UPDATE schritte SET typBeimAnnehmenModul = 0 WHERE typBeimAnnehmenModul IS NULL OR typBeimAnnehmenModul != 1;
UPDATE schritte SET typBeimAnnehmenAbschliessen = 0 WHERE typBeimAnnehmenAbschliessen IS NULL OR typBeimAnnehmenAbschliessen != 1;
UPDATE schritte SET typBeimAnnehmenModulUndAbschliessen = 0 WHERE typBeimAnnehmenModulUndAbschliessen IS NULL OR typBeimAnnehmenModulUndAbschliessen != 1;
UPDATE schritte SET typBeimAbschliessenVerifizieren = 0 WHERE typBeimAbschliessenVerifizieren IS NULL OR typBeimAbschliessenVerifizieren != 1;
UPDATE schritte SET typScriptStep = 0 WHERE typScriptStep IS NULL OR typScriptStep != 1;
UPDATE schritte SET batchStep = 0 WHERE batchStep IS NULL OR batchStep != 1;
UPDATE vorlageneigenschaften SET IstObligatorisch = 0 WHERE IstObligatorisch IS NULL OR IstObligatorisch != 1;
UPDATE werkstueckeeigenschaften SET IstObligatorisch = 0 WHERE IstObligatorisch IS NULL OR IstObligatorisch != 1;

There must not be any dangling references. Delete orphaned records, or reset the reference to NULL where the record itself has to be kept:

SET FOREIGN_KEY_CHECKS=0;
DELETE FROM batchesprozesse WHERE batchesprozesse.ProzesseID NOT IN (SELECT prozesse.ProzesseID FROM prozesse);
DELETE FROM batchesprozesse WHERE batchesprozesse.BatchID NOT IN (SELECT batches.BatchID FROM batches);
UPDATE benutzer SET benutzer.ldapgruppenID = NULL WHERE benutzer.ldapgruppenID NOT IN (SELECT ldapgruppen.ldapgruppenID FROM ldapgruppen);
DELETE FROM benutzereigenschaften WHERE benutzereigenschaften.BenutzerID NOT IN (SELECT benutzer.BenutzerID FROM benutzer);
DELETE FROM benutzergruppenmitgliedschaft WHERE benutzergruppenmitgliedschaft.BenutzerGruppenID NOT IN (SELECT benutzergruppen.BenutzergruppenID FROM benutzergruppen);
DELETE FROM benutzergruppenmitgliedschaft WHERE benutzergruppenmitgliedschaft.BenutzerID NOT IN (SELECT benutzer.BenutzerID FROM benutzer);
DELETE FROM history WHERE history.processID NOT IN (SELECT prozesse.ProzesseID FROM prozesse);
DELETE FROM projectfilegroups WHERE projectfilegroups.ProjekteID NOT IN (SELECT projekte.ProjekteID FROM projekte);
DELETE FROM projektbenutzer WHERE projektbenutzer.BenutzerID NOT IN (SELECT benutzer.BenutzerID FROM benutzer);
DELETE FROM projektbenutzer WHERE projektbenutzer.ProjekteID NOT IN (SELECT projekte.ProjekteID FROM projekte);
UPDATE prozesse SET prozesse.ProjekteID = NULL WHERE prozesse.ProjekteID NOT IN (SELECT projekte.ProjekteID FROM projekte);
UPDATE prozesse SET prozesse.MetadatenKonfigurationID = NULL WHERE prozesse.MetadatenKonfigurationID NOT IN (SELECT metadatenkonfigurationen.MetadatenKonfigurationID FROM metadatenkonfigurationen);
UPDATE prozesse SET prozesse.docketID = NULL WHERE prozesse.docketID NOT IN (SELECT dockets.docketID FROM dockets);
DELETE FROM prozesseeigenschaften WHERE prozesseeigenschaften.prozesseID NOT IN (SELECT prozesse.ProzesseID FROM prozesse);
DELETE FROM schritte WHERE schritte.ProzesseID NOT IN (SELECT prozesse.ProzesseID FROM prozesse);
UPDATE schritte SET schritte.BearbeitungsBenutzerID = NULL WHERE schritte.BearbeitungsBenutzerID NOT IN (SELECT benutzer.BenutzerID FROM benutzer);
DELETE FROM schritteberechtigtebenutzer WHERE schritteberechtigtebenutzer.BenutzerID NOT IN (SELECT benutzer.BenutzerID FROM benutzer);
DELETE FROM schritteberechtigtebenutzer WHERE schritteberechtigtebenutzer.schritteID NOT IN (SELECT schritte.SchritteID FROM schritte);
DELETE FROM schritteberechtigtegruppen WHERE schritteberechtigtegruppen.BenutzerGruppenID NOT IN (SELECT benutzergruppen.BenutzergruppenID FROM benutzergruppen);
DELETE FROM schritteberechtigtegruppen WHERE schritteberechtigtegruppen.schritteID NOT IN (SELECT schritte.SchritteID FROM schritte);
DELETE FROM vorlagen WHERE vorlagen.ProzesseID NOT IN (SELECT prozesse.ProzesseID FROM prozesse);
DELETE FROM vorlageneigenschaften WHERE vorlageneigenschaften.vorlagenID NOT IN (SELECT vorlagen.VorlagenID FROM vorlagen);
DELETE FROM werkstuecke WHERE werkstuecke.ProzesseID NOT IN (SELECT prozesse.ProzesseID FROM prozesse);
DELETE FROM werkstueckeeigenschaften WHERE werkstueckeeigenschaften.werkstueckeID NOT IN (SELECT werkstuecke.WerkstueckeID FROM werkstuecke);
SET FOREIGN_KEY_CHECKS=1;
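
Before deleting anything, it can be useful to see how many records each statement would affect. The following sketch builds a counting query with the same NOT IN condition as the statements above. The function name is hypothetical, and the mysql invocation in the comment assumes a database named kitodo.

```shell
#!/bin/sh
# orphan_count_sql CHILD CHILD_COLUMN PARENT PARENT_COLUMN
# Prints a query that counts child rows pointing to a missing parent row.
orphan_count_sql() {
    printf 'SELECT COUNT(*) FROM %s WHERE %s.%s NOT IN (SELECT %s.%s FROM %s);\n' \
        "$1" "$1" "$2" "$3" "$4" "$3"
}

# Hypothetical usage (database name "kitodo" is an assumption):
#   orphan_count_sql schritte ProzesseID prozesse ProzesseID | mysql kitodo
```

A result of 0 means that the corresponding DELETE or UPDATE statement would not change anything.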

The storage engine of all tables should be InnoDB.

ALTER TABLE batches ENGINE=innodb;
ALTER TABLE benutzer ENGINE=innodb;
ALTER TABLE benutzereigenschaften ENGINE=innodb;
ALTER TABLE benutzergruppen ENGINE=innodb;
ALTER TABLE benutzergruppenmitgliedschaft ENGINE=innodb;
ALTER TABLE dockets ENGINE=innodb;
ALTER TABLE history ENGINE=innodb;
ALTER TABLE ldapgruppen ENGINE=innodb;
ALTER TABLE metadatenkonfigurationen ENGINE=innodb;
ALTER TABLE projectfilegroups ENGINE=innodb;
ALTER TABLE projektbenutzer ENGINE=innodb;
ALTER TABLE projekte ENGINE=innodb;
ALTER TABLE prozesse ENGINE=innodb;
ALTER TABLE prozesseeigenschaften ENGINE=innodb;
ALTER TABLE schritte ENGINE=innodb;
ALTER TABLE schritteberechtigtebenutzer ENGINE=innodb;
ALTER TABLE schritteberechtigtegruppen ENGINE=innodb;
ALTER TABLE vorlagen ENGINE=innodb;
ALTER TABLE vorlageneigenschaften ENGINE=innodb;
ALTER TABLE werkstuecke ENGINE=innodb;
ALTER TABLE werkstueckeeigenschaften ENGINE=innodb;

If your system is not subject to these inconsistencies, the scripts should not make any changes. You can then run the migration SQL script.

Install ElasticSearch

In version 3.2, Production is again supported by a search engine index. The search engine integration in Production 3.2 relies on a usage of the search engine index that was never intended and was removed in ElasticSearch version 6, so you cannot use newer versions of the software. Make sure you install the latest release of version 5. You can use the package manager to do this:

export JAVA_HOME=/usr/lib/jvm/default-java
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
sudo add-apt-repository "deb https://artifacts.elastic.co/packages/5.x/apt stable main"
sudo apt-get update
sudo apt-get install elasticsearch
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch
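
After the installation you may want to confirm that the running node really is a 5.x release. The sketch below reads the JSON banner that an ElasticSearch node returns on its root URL and extracts the major version. The function names are hypothetical, and the usage line assumes the node listens on the default port 9200 on localhost.

```shell
#!/bin/sh
# es_major_version: read the JSON banner of an ElasticSearch node on stdin
# and print the major version number
es_major_version() {
    sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([0-9][0-9]*\)\..*/\1/p'
}

# check_es_version: succeed only if the banner on stdin reports version 5.x
check_es_version() {
    [ "$(es_major_version)" = "5" ]
}

# Usage, assuming the node listens on the default port 9200:
#   curl -s http://localhost:9200 | check_es_version && echo "ElasticSearch 5.x is running"
```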

Install ImageMagick

Production 3 uses the ImageMagick program to generate various image file formats, e.g. to create images from TIFF files that can be displayed in the browser, since full support for TIFF files is no longer available in Java 8 and higher. You can install ImageMagick using the package manager:

sudo apt-get install imagemagick

⚠️ Make sure that the legacy convert program is installed. (This is still the normal case on Linux, but must be activated manually during installation on Windows.)
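
Whether the convert program is available can be checked from the shell. This is a small sketch with a hypothetical helper function. It only tests that a program of that name is found on the PATH, not that it is a working ImageMagick installation.

```shell
#!/bin/sh
# require_program NAME: succeed if NAME is found on the PATH, else print a hint
require_program() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "Missing program: $1" >&2
    return 1
}

# Usage:
#   require_program convert || echo "Install ImageMagick including the legacy utilities"
```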

Update program files

Move the kitodo directory and kitodo.war file out of the webapps directory of Tomcat. Delete the kitodo directory in the work/Catalina/localhost directory of Tomcat.

ℹ️ Production 3.2 is delivered with modules, but these differ from both the modules of version 2.3 (which had to be configured in the modules.xml file) and the plug-ins. Neither of those is supported in Production 3.2 anymore. (You can delete the modules.xml file and the plugins directory.) The modules in Production 3.2 are simply parts of the program that are kept outside the web application WAR file to reduce cross-references between different parts of the program.

Create a directory /usr/local/kitodo/modules and unzip the modules supplied with the version there.

Start Tomcat to unzip the web application. Then stop it again.

Check the settings in the file webapps/kitodo/WEB-INF/kitodo_config.properties and adjust them if necessary.

Create a folder /usr/local/kitodo/diagrams. It must be writable by Tomcat.

Delete the folder messages.

Copy all XSLT scripts from webapps/kitodo/WEB-INF/classes/xslt/ into the xslt folder. Replace the existing rulesets with the newly created ones. For each ruleset, create an export XSLT file with the same name (but with the extension .xsl instead of .xml). You can use the example in ruleset_default.xsl as a guide.

Store the files kitodo_opac.xml and kitodo_fileFormats.xml from the config zip file in the config folder and the XSL files from the zip file in the xslt folder.

During the migration, we recommend raising the log level to trace. This produces more helpful output during the migration. For example, change the file webapps/kitodo/WEB-INF/log4j2.xml as follows:

<Loggers>
    <!-- The loggers de.unigoettingen.sub.search.opac and org.goobi are no longer required -->
    <Logger name="org.kitodo" level="trace" additivity="false">
        <AppenderRef ref="LOGFILE" /> <!-- Remove the attribute level="error" here -->
    </Logger>
    <Root level="error">
        <AppenderRef ref="STDOUT"/>
        <AppenderRef ref="LOGFILE"/>
    </Root>
</Loggers>

Start Tomcat again.

Application migration

Log in to the user interface as a user with administrator rights.

ℹ️ Depending on the configuration of your user, you may have to grant the user additional permissions via the database. Permissions are managed as roles in the database; each role bundles several of the individual permissions available in Production 3.2. You can assign roles to your user by adding entries to the table user_x_role.

After logging in, you land on the "Indexing" tab of the "System" configuration page and are informed that the index needs to be updated. That is correct, but before indexing, you should convert the METS files of your processes so that they can be indexed.

Convert metadata

Go to the Migration tab. Choose “Convert Metadata”, select all of the projects you want to migrate, and click “Convert Metadata”.

Build search engine index

Switch back to the “Indexing” tab. Create the index profile and start indexing. If indexing hangs, you may need to restart Tomcat, log in again, and restart indexing. Wait for the indexing to finish.

Generate image derivatives

To generate image derivatives, first check your project configuration. It contains a folder section. Make sure the source folder is configured correctly. Enable image generation for each folder whose content is to be generated.

First, you may need to create additional image folders if they don't already exist. Go to Processes and select all the processes whose images you want to generate. Run KitodoScript action:createFolders.

Then you can run the KitodoScript action:generateImages folders:all images:all. The images are generated asynchronously in the background. You can monitor the progress in the Task Manager.

Create workflows

Creating workflows is an intellectual task that has to be handled by a specialist. Go to System and click the Migrate Workflows button. Select the projects. The analysis is then carried out. (This may take some time, depending on how many records need to be read from the database. Just wait, or watch the log file to see the progress.) Afterwards, the workflows are listed in descending order of occurrence. Click the cross arrows to create a workflow (or to reuse an existing workflow).

When creating a workflow, you must assign roles to each workflow task. However, you cannot change the workflow of existing processes. Set the workflow to active and save.

Afterwards, create a production template from the workflow you created and assign it. This can take a while. Wait until the contents of the window disappear (or check the log file to see when the process is complete).

Create hierarchies

The hierarchy migration task runs automatically to create hierarchical processes.

ℹ️ Hierarchical processes are a new feature in Production 3 that adds another level of complexity. Processes can be linked to one another as parents and children in order to represent hierarchical dependencies of catalog records, such as series or multi-volume works. The superordinate data is saved only once, in the superordinate unit.

For this, the migration has to read the metadata files of all processes in order to collect the higher-level metadata of the hierarchical parents, then create the parents and create the links in the database and in the files. This is complex and takes some time, but it runs fully automatically in the background.

Migrate newspaper hierarchies

This works like the hierarchy migration, but here an even more complex hierarchy of the newspaper is created from a newspaper batch of Production 2.3. This also happens in the background.