Einsiedeln

Models

The models were generated by the Training model for Patchwise Analysis of Music Document – HPC job, using the images provided in the training_data folder. For processing a music document with staff lines and text, only three models are necessary: one for music symbols, one for staff lines, and one for background (which will include the text).

We used the default values of the settings of this training job:

  • maximum number of samples per label = 15,000
  • epochs = 20
  • early stop = 15
  • patch height = 256
  • patch width = 256
  • batch size = 16
  • We used 150 GB of memory. You should try to use the least amount of memory that allows the job to finish. Normally, you won't need more than 150 GB of memory for processing a maximum of 15k samples per layer for 3 layers. We have not yet experimented to find the minimum amount of memory needed.
  • The training was completed in less than 10 hours.
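As a rough illustration of what these defaults imply, the settings can be written down as a small Python mapping and used for some back-of-the-envelope arithmetic. This is only a sketch: the key names and helper functions below are hypothetical and are not the job's actual parameter names, and the batch count assumes each sample is visited once per epoch, which may not match how the trainer actually schedules patches.

```python
# Hypothetical representation of the default settings listed above.
# The key names are illustrative only, not the job's real parameter names.
TRAINING_SETTINGS = {
    "max_samples_per_label": 15_000,
    "epochs": 20,
    "early_stop": 15,
    "patch_height": 256,
    "patch_width": 256,
    "batch_size": 16,
}

# Three layers are trained: music symbols, staff lines, and background.
NUM_LAYERS = 3


def max_total_samples(settings, num_layers):
    """Upper bound on the number of patches sampled across all layers."""
    return settings["max_samples_per_label"] * num_layers


def max_batches_per_epoch_per_layer(settings):
    """Upper bound on batches per epoch for one layer.

    Assumes every sample is used exactly once per epoch (an assumption,
    not something the job documentation states). Uses ceiling division.
    """
    samples = settings["max_samples_per_label"]
    batch = settings["batch_size"]
    return -(-samples // batch)


print(max_total_samples(TRAINING_SETTINGS, NUM_LAYERS))
print(max_batches_per_epoch_per_layer(TRAINING_SETTINGS))
```

Under these assumptions, the job handles at most 45,000 patches of 256 × 256 pixels in total, which gives a sense of why the memory request matters.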

Training Data

As indicated in the End-to-End OMR Documentation - Hints section:

The document-analysis classifier and trainer jobs are sensitive to the size of images. For music with staves, the distance between staff lines in pixels (staff size height) tends to be a predictor of how well it will perform. For instance, with the original CDN-Hsmu M2149.L4 images, the staff size height is 64 px. Values around this point may result in better classification and training results, but note that this is not an optimized measure.

Normally, for better classification results, the size of the images used for training data is modified to have a staff size height closer to 64 px. For the CH-E 611 (Einsiedeln) manuscript, no resizing was needed since the images were small enough.
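For manuscripts that do need resizing, the scaling factor follows directly from the measured staff size height. The sketch below is a minimal illustration and not part of the documented workflow: the function names are hypothetical, and the 64 px target is the non-optimized reference value quoted above.

```python
# Reference staff size height quoted in the documentation (not an optimized value).
TARGET_STAFF_HEIGHT_PX = 64


def resize_factor(measured_staff_height_px, target_px=TARGET_STAFF_HEIGHT_PX):
    """Return the factor by which to scale an image so that its staff size
    height moves to the target value."""
    if measured_staff_height_px <= 0:
        raise ValueError("staff size height must be positive")
    return target_px / measured_staff_height_px


def resized_dimensions(width_px, height_px, measured_staff_height_px,
                       target_px=TARGET_STAFF_HEIGHT_PX):
    """Return the (width, height) of the image after applying the factor,
    rounded to whole pixels."""
    factor = resize_factor(measured_staff_height_px, target_px)
    return round(width_px * factor), round(height_px * factor)


# Example with made-up numbers: a 4000 x 6000 px image whose staff size
# height measures 128 px would be halved in each dimension.
print(resize_factor(128))
print(resized_dimensions(4000, 6000, 128))
```

A factor close to 1 means the image can be used as is, which was the case for the Einsiedeln images.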

The folios randomly selected for the training data are 32r and 263v. These two folios contained enough information to get 15k samples per layer for all three layers (staff lines, neumes, and background), because there are around 15 staves per page and each staff has a high density of neumes. Therefore, no more folios were needed for training (compare this to the 3 folios needed for Salzinnes and the 9 folios needed for MS 73, which has very few symbols and staves per page).

The two images were combined into a single file, and the same was done with the corresponding layers generated in Pixel. The combined layers can be found in the training_data folder. The combined image can be regenerated by retrieving the original images from the IIIF Manifest and combining them into one file (in ascending order: 32r, then 263v) using ImageMagick, as indicated in the tutorial section for Image Layering.
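The combination step might look like the sketch below, which only builds and prints an ImageMagick command rather than running it. Several things here are assumptions and should be checked against the Image Layering tutorial: the file names are hypothetical, the images are assumed to be stacked vertically with ImageMagick's `-append` operator, and the `magick` executable name applies to ImageMagick 7 (older installations use `convert`).

```python
import shlex


def build_append_command(image_paths, output_path, executable="magick"):
    """Build an ImageMagick command that stacks the images top to bottom.

    The order of image_paths is preserved, so the folios should be passed
    in ascending order. Vertical stacking via -append is an assumption;
    the tutorial may prescribe a different layout.
    """
    if len(image_paths) < 2:
        raise ValueError("at least two images are needed to combine")
    return [executable, *image_paths, "-append", output_path]


# Hypothetical file names for folios 32r and 263v, in ascending order.
command = build_append_command(["32r.png", "263v.png"], "combined.png")
print(shlex.join(command))
```

The same command shape would apply to each set of layer images, provided the folios are passed in the same order so that the combined layers stay aligned with the combined image.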