1.0 (Brief) Introduction to Medieval Chant OMR
Note
This page contains a broad overview of the complete OMR process and a general description of each job, how they work, and how they connect to each other in the grand scheme of things. For detailed instructions on how to run an OMR process, you can skip directly to section 2.0.
Optical Music Recognition (OMR) is the process through which a computer reads an image of a musical score and transcribes it into a machine-readable format called MEI (Music Encoding Initiative). Chant OMR is this same process applied to Medieval manuscripts, wherein the computer is given the image of a folio and transcribes it into modern notation. For this to be possible, the transcription process is broken up into individual, precise steps for the computer to go through. At the beginning of the process, humans must teach the computer how to accomplish each of these tasks. Once enough examples have been given, the computer can then accomplish each task on its own and perform the entire transcription from start to finish without human intervention. The diagram below shows the broad steps by which the computer is taught to read a folio image; steps with an icon of a person next to them are steps where human intervention is initially required.
Rodan is a software platform that contains each of these tasks; it allows the user to interact with the computer to teach it each step, as well as to string the steps together once the training is complete. The current documentation goes over each step of the complete OMR process, from training the computer to running a complete workflow that produces the computer's transcription of a manuscript folio image.
Pixel.js is a web browser-based graphical interface for separating pixels of a music score image into layers for OMR (including an automatically generated background layer). In other words, Pixel is used to teach the computer to extract and separate all the elements of a folio image into three categories: staff lines, symbols, and text. Each category is termed a "layer." All the symbols on the folio, including clefs and custodes, will be extracted and put into the symbol layer; the staff lines will be extracted and put into the staff line layer; and the text will be extracted and put into the text layer. The Pixel job will allow the human to provide the computer with correctly separated layers.
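The idea of a "layer" can be made concrete with a small sketch. This is not how Pixel.js is implemented; it only illustrates what the job produces, using an invented 4x4 grid of per-pixel labels in place of a real folio image:

```python
# Hypothetical per-pixel class labels for a tiny 4x4 folio region:
# 0 = background, 1 = staff lines, 2 = symbols, 3 = text
labels = [
    [0, 1, 1, 0],
    [2, 2, 0, 0],
    [0, 0, 3, 3],
    [1, 1, 0, 0],
]

def extract_layer(labels, value):
    """Binary mask keeping only the pixels of one category."""
    return [[1 if px == value else 0 for px in row] for row in labels]

# One binary image ("layer") per category, plus the implicit background.
layers = {"staff": extract_layer(labels, 1),
          "symbol": extract_layer(labels, 2),
          "text": extract_layer(labels, 3)}
```

In practice the human corrects these masks pixel by pixel in the Pixel.js interface; the corrected layers are what the trainer consumes next.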
The Paco Trainer then generates neural network models to classify pixels into OMR-relevant layers. In other words, it uses the layers separated manually in Pixel to produce models that the computer will use to separate future folios into layers without human intervention.
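The Paco Trainer's neural networks are beyond a short example, but the train-then-apply idea can be shown with a deliberately simpler stand-in: a nearest-centroid classifier over single grayscale values. All numbers and the assumption that each layer's ink occupies a distinct intensity range are invented for illustration:

```python
# Toy training data: (grayscale_value, layer) pairs, standing in for the
# human-corrected layers produced in Pixel. Invented values.
training = [(20, "symbol"), (30, "symbol"), (90, "staff"),
            (100, "staff"), (200, "text"), (210, "text")]

def train(samples):
    """'Model' = mean grayscale value per layer (a stand-in, not a neural net)."""
    sums, counts = {}, {}
    for value, layer in samples:
        sums[layer] = sums.get(layer, 0) + value
        counts[layer] = counts.get(layer, 0) + 1
    return {layer: sums[layer] / counts[layer] for layer in sums}

def classify(model, value):
    """Assign a pixel to the layer whose centroid is nearest."""
    return min(model, key=lambda layer: abs(model[layer] - value))

model = train(training)
print(classify(model, 25))  # falls nearest the symbol centroid
```

The real trainer learns a far richer model from whole image patches, but the workflow shape is the same: human-labelled layers in, a reusable model out.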
The Interactive Classifier is a browser-based interactive graphical interface based on the Gamera job of the same name and is used to train a music symbol model. The job iterates between an automatic classification stage of glyphs and a manual correction stage where the user is able to specify more glyphs as training data. In other words, much like the Pixel job is used for providing the computer with correctly separated layers, the Interactive Classifier job is used for providing the computer with correctly identified symbols. Since the musical scores we work with are manuscripts, the appearance of standard symbols, like a clef or a punctum, changes from source to source; the computer must be taught what each glyph looks like in the given source. It can then use its bank of human-identified symbols to classify more symbols from the same manuscript without human intervention.
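Classifying a new glyph from a bank of human-identified examples can be sketched as nearest-neighbour matching over glyph feature vectors. The feature choice here (width, height, ink ratio) and all values are invented; this only illustrates the idea, not Gamera's actual feature set:

```python
def distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify_1nn(training, features):
    """Label a glyph with the class of its nearest human-identified example."""
    nearest = min(training, key=lambda item: distance(item[0], features))
    return nearest[1]

# Invented (width, height, ink_ratio) features for human-identified glyphs.
training = [((8, 10, 0.90), "punctum"),
            ((6, 30, 0.40), "clef.c"),
            ((9, 11, 0.85), "punctum")]

print(classify_1nn(training, (8, 9, 0.88)))  # a small, dense glyph
```

Each round of manual correction in the Interactive Classifier grows this bank of examples, so the automatic stage gets progressively more accurate.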
The e2e OMR process is the final step, where the full transcription actually happens. Using the training previously accomplished with the Pixel and IC jobs, the computer can read and transcribe a manuscript folio from start to finish without human intervention. Thanks to Rodan, all the jobs needed for the computer to accomplish this feat can be strung together in one giant workflow, like so:
The branch on the left is very similar to the Interactive Classifier workflow, with the difference that the Interactive Classifier (IC) job has been replaced with the Non-Interactive Classifier (NIC) job. The NIC accomplishes the same task as the IC but does not allow for human input; once the initial round of classifying is done, that information is sent directly to the next job. Any remaining errors will be corrected at the very end of the process, using Neon.
The middle branch handles the staff lines, which are extracted from layer 2. Their exact position on the page is then analyzed and combined with the exact position of all the glyphs that the NIC has freshly identified, so that each glyph may be assigned a pitch.
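The pitch-assignment step can be illustrated with a toy geometry. The staff-line coordinates, the clef placement, and the assumption of a C clef on the top line are all invented; the point is only that a glyph's vertical distance from a clef, measured in half-line steps, determines its pitch:

```python
# Assumed geometry: y-coordinates (pixels, top to bottom) of one staff's
# four lines, with a C clef sitting on the top line. All values invented.
staff_line_ys = [100, 120, 140, 160]
clef_line_y = 100

# Descending scale starting from the clef's pitch, "c".
SCALE = ["c", "b", "a", "g", "f", "e", "d"]

def pitch_of(glyph_y):
    """Map a glyph's vertical position to a pitch name.

    Lines here are 20 px apart, so each line-or-space step is 10 px.
    """
    spacing = (staff_line_ys[-1] - staff_line_ys[0]) / (len(staff_line_ys) - 1)
    steps = round((glyph_y - clef_line_y) / (spacing / 2))
    return SCALE[steps % len(SCALE)]

print(pitch_of(120))  # one full line below the clef: two steps down
```

A real folio adds complications this sketch ignores: skewed or bent staves, multiple staves per page, and clef changes mid-line.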
The final branch on the right is for text alignment and syllabification. The text layer is compared with the chant text provided by the user. Each word is then identified and syllabified. Those syllables are then assigned to the neumes above them, which have now been pitched.
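At its simplest, the alignment step pairs syllables with neumes in reading order. The syllable list and neume positions below are invented, and real alignment must handle melismas (one syllable carrying several neumes), which this sketch deliberately ignores:

```python
# Invented example: syllabified chant text supplied by the user, and the
# x-positions of the pitched neumes found on the folio (one per syllable here).
syllables = ["Ky", "ri", "e", "e", "le", "i", "son"]
neume_xs = [40, 95, 150, 260, 310, 365, 420]

def align(syllables, neume_xs):
    """Pair each syllable with a neume, left to right."""
    return list(zip(syllables, sorted(neume_xs)))

for syllable, x in align(syllables, neume_xs):
    print(f"{syllable!r} -> neume at x={x}")
```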
All that information is then combined in the MEI Encoding job, which converts it to MEI format and produces an MEI file containing the computer's transcription of the given folio.
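To give a feel for the output, here is a skeletal, schema-incomplete fragment of MEI-style XML built with Python's standard library. It shows only the general shape of the encoding (a syllable carrying a pitched neume), not the actual output of the MEI Encoding job:

```python
import xml.etree.ElementTree as ET

# Minimal sketch: one syllable carrying one neume with one pitched component.
mei = ET.Element("mei", xmlns="http://www.music-encoding.org/ns/mei")
syllable = ET.SubElement(mei, "syllable")
ET.SubElement(syllable, "syl").text = "Ky"
neume = ET.SubElement(syllable, "neume")
ET.SubElement(neume, "nc", pname="c", oct="4")  # neume component: pitch c4

print(ET.tostring(mei, encoding="unicode"))
```

The real file additionally records the zones on the folio image that each element came from, so that editors can later verify and correct the transcription against the source in Neon.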