Models

The models were generated by the Training model for Patchwise Analysis of Music Document – HPC job, using the images provided in the training_data folder. For processing a music document with staff lines and text, only three models are necessary: one for music symbols, one for staff lines, and one for the background (which includes the text).

We used the default settings of this training job (summarized in the sketch after this list):

  • maximum number of samples per label = 10,000
  • epochs = 15
  • patch height = 256
  • patch width = 256
  • batch size = 16
  • We used the maximum amount of memory allowed: 257 GB. However, we recommend against this; use the least amount of memory that allows the job to finish (this value is probably around 90 GB or higher). Normally, you won't need more than 150 GB of memory for processing a maximum of 15k samples per layer for 3 layers.
  • Training normally takes a bit more than 4 hours, so the default time limit of 6 hours is fine.
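
For reference, the settings above can be summarized as a simple configuration. This is only an illustrative sketch: the key names below are descriptive placeholders, not the HPC job's actual field names.

```python
# Illustrative summary of the training-job settings listed above.
# Key names are placeholders, not the job's real parameter names.
training_settings = {
    "max_samples_per_label": 10_000,
    "epochs": 15,
    "patch_height": 256,    # px
    "patch_width": 256,     # px
    "batch_size": 16,
    "memory_gb": 150,       # request the least that lets the job finish (~90 GB or more)
    "time_limit_hours": 6,  # training itself takes a bit over 4 hours
    "num_layers": 3,        # music symbols, staff lines, background (incl. text)
}
```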

Training Data

As indicated in the End-to-End OMR Documentation - Hints section:

The document-analysis classifier and trainer jobs are sensitive to the size of images. For music with staves, the distance between staff lines in pixels (staff size height) tends to be a predictor of how well it will perform. For instance, with the original CDN-Hsmu M2149.L4 images, the staff size height is 64 px. Values around this point may result in better classification and training results, but note that this is not an optimized measure.

For better classification results, the size of the images used for training data was modified to have a staff size height closer to 64 px. For the Salzinnes Antiphonal (CDN-Hsmu M2149.L4), this meant scaling the original images down to 63% of their original size. Keep this in mind when processing the complete manuscript through the OMR workflow: all images to be processed should be resized by the same factor (for this manuscript). This is done in the Resize Image job at the beginning of the end-to-end OMR workflow (as indicated in this image).
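
A minimal sketch of how such a scale factor can be derived and applied, assuming Pillow is available (the OMR workflow's Resize Image job does this internally; this is not its actual code). The 101 px measurement in the usage example is hypothetical, chosen only because 64/101 ≈ 0.63:

```python
from PIL import Image

TARGET_STAFF_HEIGHT = 64  # px, per the hint quoted above

def resize_for_training(src_path: str, dst_path: str, measured_staff_height: float) -> float:
    """Scale an image so its staff size height lands near the 64 px target.

    Returns the factor actually applied, so the same factor can be reused
    when resizing the rest of the manuscript in the OMR workflow.
    """
    factor = TARGET_STAFF_HEIGHT / measured_staff_height
    with Image.open(src_path) as img:
        new_size = (round(img.width * factor), round(img.height * factor))
        img.resize(new_size, Image.LANCZOS).save(dst_path)
    return factor

# Hypothetical example: a measured staff height of ~101 px yields
# the ~0.63 factor used here for the Salzinnes Antiphonal.
# resize_for_training("folio_002v.png", "folio_002v_resized.png", 101)
```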

The folios randomly selected for the training data are: 2v, 42v, and 45r. These were reduced in size according to the ratio above (0.63), and the images were combined into a single file containing all three. The same was done to the set of layers generated in Pixel. The combined layers can be found in the training_data folder. The combined images can be generated again by retrieving the original images from the IIIF Manifest, reducing their size by 0.63, and combining them into one file (in ascending order: 2v, 42v, and 45r) using ImageMagick, as indicated in the tutorial section for Image Layering. A sketch of an equivalent combination step follows.
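
The tutorial's actual ImageMagick command is not reproduced here. As an illustration, the sketch below performs the same kind of combination in Pillow, assuming the images are stacked vertically (what ImageMagick's -append does); the file names are hypothetical:

```python
from PIL import Image

def combine_vertically(paths: list[str], out_path: str) -> None:
    """Stack images top-to-bottom into one file (like ImageMagick's -append)."""
    images = [Image.open(p) for p in paths]
    width = max(img.width for img in images)
    height = sum(img.height for img in images)
    combined = Image.new("RGB", (width, height), "white")
    y = 0
    for img in images:
        combined.paste(img, (0, y))
        y += img.height
    combined.save(out_path)

# Ascending folio order, matching the training data above (file names are examples).
combine_vertically(
    ["002v_resized.png", "042v_resized.png", "045r_resized.png"],
    "combined_images.png",
)
```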