The models were generated by the Training model for Patchwise Analysis of Music Document – HPC job, using the images provided in the training_data folder. For processing a music document with staff lines and text, only three models are necessary: one for music symbols, one for staff lines, and one for background (which includes the text).
We used the default values of the settings of this training job:
- maximum number of samples per label = 10,000
- epochs = 15
- patch height = 256
- patch width = 256
- batch size = 16
- We used the maximum amount of memory allowed: 257 GB. However, we recommend against this. You should try to use the least amount of memory that allows the job to finish (hint: this value is probably around 90 GB or higher). Normally, you won't need more than 150 GB of memory for processing a maximum of 15k samples per layer for 3 layers (see the back-of-envelope sketch after this list).
- Training normally takes a bit more than 4 hours, so the default time limit of 6 hours is fine.
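As a rough illustration of why memory scales with the sample count, the sketch below estimates the size of the raw training patches alone. This is a back-of-envelope assumption (float32 RGB patches held in memory at once), not the job's actual memory model; real peak usage is several times higher because of labels, copies, and framework overhead.

```python
# Back-of-envelope estimate of raw patch storage (an assumption, not the
# job's actual memory model).
samples_per_layer = 15_000   # upper bound discussed above
layers = 3                   # music symbols, staff lines, background
patch_h = patch_w = 256      # default patch size
channels = 3                 # assuming RGB input images
bytes_per_value = 4          # assuming float32 tensors

raw_bytes = samples_per_layer * layers * patch_h * patch_w * channels * bytes_per_value
print(f"raw patches alone: {raw_bytes / 1e9:.1f} GB")  # ~35 GB
# Training overhead (augmentation, gradients, copies) multiplies this
# several times, consistent with the ~90-150 GB guideline above.
```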
As indicated in the End-to-End OMR Documentation - Hints section:
The document-analysis classifier and trainer jobs are sensitive to the size of images. For music with staves, the distance between staff lines in pixels (staff size height) tends to be a predictor of how well it will perform. For instance, with the original CDN-Hsmu M2149.L4 images, the staff size height is 64 px. Values around this point may result in better classification and training results, but note that this is not an optimized measure.
For better classification results, the images used for training data were resized so that the staff size height is closer to 64 px. For the Salzinnes Antiphonal (CDN-Hsmu M2149.L4), this meant scaling the original images to 63% of their size (a factor of 0.63). Keep this in mind when processing the complete manuscript through the OMR workflow: all images to be processed should be resized by this same factor (for this manuscript). This is done in the Resize Image job at the beginning of the end-to-end OMR workflow (as indicated in this image).
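A minimal sketch of that resizing step, assuming Pillow is available (the actual Resize Image job in Rodan may use different resampling; filenames here are hypothetical):

```python
from PIL import Image

SCALE = 0.63  # Salzinnes Antiphonal: brings the staff size height close to 64 px

def resize_folio(in_path: str, out_path: str, scale: float = SCALE) -> None:
    """Scale a folio image by the given factor before OMR processing."""
    img = Image.open(in_path)
    new_size = (round(img.width * scale), round(img.height * scale))
    img.resize(new_size).save(out_path)  # default resampling (bicubic)

resize_folio("CDN-Hsmu_M2149.L4_002v.jpg", "002v_resized.png")
```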
The folios randomly selected for the training data are: 2v, 42v, and 45r. These were reduced in size according to the previous ratio (0.63), and the three images were combined into a single file. The same was done to the set of layers generated in Pixel. The combined layers can be found in the training_data folder. The combined images can be generated again by retrieving the original images from the IIIF Manifest, reducing their size by a factor of 0.63, and combining them into one file (in ascending order: 2v, 42v, and 45r) using ImageMagick, as indicated in the tutorial section for Image Layering.
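For reference, here is a minimal sketch of the resize-and-combine step in Pillow, assuming the folios are stacked vertically in ascending order (the ImageMagick command in the Image Layering tutorial is the authoritative recipe; filenames here are hypothetical):

```python
from PIL import Image

SCALE = 0.63
folios = ["002v.png", "042v.png", "045r.png"]  # hypothetical filenames, ascending order

# Resize each folio, then stack them vertically into one combined image.
images = []
for path in folios:
    img = Image.open(path)
    images.append(img.resize((round(img.width * SCALE), round(img.height * SCALE))))

width = max(img.width for img in images)
height = sum(img.height for img in images)
combined = Image.new("RGB", (width, height), "white")

y = 0
for img in images:
    combined.paste(img, (0, y))
    y += img.height

combined.save("training_images_combined.png")
```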