Region-based Layout Analysis of Music Score Images

This repository corresponds to the paper: Region-based Layout Analysis of Music Score Images, submitted to the journal Expert Systems with Applications.

BibTeX reference

@article{castellanos2022musicLA,
  author    = {Francisco J. Castellanos and
               Carlos Garrido{-}Munoz and
               Antonio R{\'{\i}}os{-}Vila and
               Jorge Calvo{-}Zaragoza},
  title     = {Region-based Layout Analysis of Music Score Images},
  journal   = {CoRR},
  volume    = {abs/2201.04214},
  year      = {2022},
  eprinttype = {arXiv},
  eprint    = {2201.04214}
}

SAE

The SAE folder includes the code that implements the Selectional Auto-Encoder (SAE). The program accepts the following parameters:

--mode : Execution mode (train or test).
--gpu : Identifier of the GPU to use.
--cmode : Color mode (0 for grayscale, 1 for RGB).
--s-width : Width of the rescaled image.
--s-height : Height of the rescaled image.
--k-height : Kernel height.
--k-width : Kernel width.
--nfilt : Number of filters in the convolutional layers.
--batch : Batch size.
--norm : Type of image normalization.
--epochs : Maximum number of epochs.
--nbl : Number of blocks in the encoder and the decoder of the SAE model (see the architecture sketch after this list).
--img : Saves intermediate images to check the evolution of the training process.
--graph : Saves the model graph.
--post : Applies a post-processing filter to improve the recognition.
--th : IoU threshold used to compute the metrics.
--nimgs : Number of images considered from the training set.
--red : Reduction factor to vertically shrink the regions before training.
--labels : Name of the labels to be used for training. This code uses the data configuration provided by the MuReT tool.
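
To make the meaning of --nbl, --nfilt, --k-height and --k-width concrete, the following is a minimal, hypothetical sketch of an SAE-style encoder-decoder. It assumes a Keras implementation and only illustrates how the parameters shape the network; it is not the repository's exact architecture.

from tensorflow.keras import layers, models

def build_sae(height, width, channels, nbl=3, nfilt=128, k_h=3, k_w=3):
    inputs = layers.Input(shape=(height, width, channels))
    x = inputs
    # Encoder: nbl blocks of convolution + downsampling.
    for _ in range(nbl):
        x = layers.Conv2D(nfilt, (k_h, k_w), activation="relu", padding="same")(x)
        x = layers.MaxPooling2D(pool_size=(2, 2))(x)
    # Decoder: nbl blocks of convolution + upsampling.
    for _ in range(nbl):
        x = layers.Conv2D(nfilt, (k_h, k_w), activation="relu", padding="same")(x)
        x = layers.UpSampling2D(size=(2, 2))(x)
    # Per-pixel sigmoid output with the probability of each pixel belonging to a region.
    outputs = layers.Conv2D(1, (k_h, k_w), activation="sigmoid", padding="same")(x)
    return models.Model(inputs, outputs)

model = build_sae(512, 512, 3)  # e.g. --s-height 512 --s-width 512 --cmode 1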

Example of use:

python -u main.py ${model}
      --db-train dataset/train.txt
      --db-val dataset/val.txt
      --db-test dataset/test.txt
      --mode train
      --gpu 0
      --cmode 1
      --s-width 512
      --s-height 512
      --k-height 3
      --k-width 3
      --nfilt 128
      --batch 16
      --norm inv255
      --epochs 300
      --nbl 3
      --nimgs 32
      --labels staff
      --labels lyrics
      --th 0.55
      --red 0.2
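
The --th value (0.55 in the example) is the IoU threshold used when matching predicted regions against ground-truth regions for the metrics. As a generic reminder of what this threshold operates on, here is a minimal sketch of the IoU between two axis-aligned bounding boxes; it is not taken from the repository's code.

def iou(box_a, box_b):
    # Boxes as (x_min, y_min, x_max, y_max).
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

With --th 0.55, a predicted region would count as correct when iou(prediction, ground_truth) >= 0.55.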

Data augmentation

The data_aug folder contains the code for generating the augmented images used in the paper. An example of use is provided in the script data_aug/generate_daug_all.sh.

Parameters of this code:

-type : Strategy for generating data. The one used in the paper is "random-auto".
-n : Number of new semi-synthetic images.
-txt_train : Path to the folder that contains the JSON files with the ground-truth data, generated with the MuReT tool.
-pages : Number of pages considered as the real available pages.
--uniform_rotate : Keeps a uniform rotation for all regions within the same page (see the conceptual sketch after the example below).

Another example of use:

python3 -u ./main.py \
      -type ${type} \
      -n 100 \
      -txt_train dataset/json_files.json \
      -pages 10 \
      --uniform_rotate
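
To illustrate the effect of --uniform_rotate, the following is a conceptual sketch only (the helper name and the angle range are hypothetical, not from the repository): without the flag each region of a page may receive its own random rotation, whereas with the flag a single angle is drawn per page and shared by all of its regions.

import random

def draw_region_angles(num_regions, uniform_rotate, max_angle=5.0):
    # Hypothetical helper returning one rotation angle (in degrees) per region of a page.
    if uniform_rotate:
        angle = random.uniform(-max_angle, max_angle)  # one angle shared by the whole page
        return [angle] * num_regions
    return [random.uniform(-max_angle, max_angle) for _ in range(num_regions)]

print(draw_region_angles(4, uniform_rotate=True))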

End-to-end

The code for the end-to-end approach used in this work can be found in the end-to-end code repository.

Training of the models

The pytorch_models folder includes the code that implements the models used in the paper. The training and testing scripts are described below.

Training without data augmentation

./train.sh trains the following models without the custom data augmentation used in the paper:

  • Faster R-CNN with ResNet-50 backbone
  • RetinaNet with ResNet-50 backbone
  • Faster R-CNN with MobileNet v3 backbone
  • SSD with VGG16 backbone

Training with data augmentation

./train-aug.sh trains the following models with the custom data augmentation used in the paper:

  • Faster R-CNN with ResNet-50 backbone
  • RetinaNet with ResNet-50 backbone
  • Faster R-CNN with MobileNet v3 backbone
  • SSD with VGG16 backbone

Testing without data augmentation

./test.sh tests the following models without the custom data augmentation used in the paper:

  • Faster R-CNN with ResNet-50 backbone
  • RetinaNet with ResNet-50 backbone
  • Faster R-CNN with MobileNet v3 backbone
  • SSD with VGG16 backbone

Testing with data augmentation

./test_aug.sh tests the following models with the custom data augmentation used in the paper:

  • Faster R-CNN with ResNet-50 backbone
  • RetinaNet with ResNet-50 backbone
  • Faster R-CNN with MobileNet v3 backbone
  • SSD with VGG16 backbone
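
All four detectors correspond to standard torchvision detection architectures. As an illustrative sketch only (the exact model variants, pretrained weights and number of classes used in the paper may differ), they could be instantiated as follows:

from torchvision.models import detection

num_classes = 3  # hypothetical: background + staff + lyrics

detectors = {
    "faster_rcnn_resnet50": detection.fasterrcnn_resnet50_fpn(num_classes=num_classes),
    "retinanet_resnet50": detection.retinanet_resnet50_fpn(num_classes=num_classes),
    "faster_rcnn_mobilenet_v3": detection.fasterrcnn_mobilenet_v3_large_fpn(num_classes=num_classes),
    "ssd_vgg16": detection.ssd300_vgg16(num_classes=num_classes),
}

# Quick sanity check: parameter counts of each freshly built model.
for name, model in detectors.items():
    print(name, sum(p.numel() for p in model.parameters()))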
