Region-based Layout Analysis of Music Score Images

This repository corresponds to the paper: Region-based Layout Analysis of Music Score Images, submitted to the journal Expert Systems with Applications.

BibTeX reference

@article{castellanos2022musicLA,
  author    = {Francisco J. Castellanos and
               Carlos Garrido{-}Munoz and
               Antonio R{\'{\i}}os{-}Vila and
               Jorge Calvo{-}Zaragoza},
  title     = {Region-based Layout Analysis of Music Score Images},
  journal   = {CoRR},
  volume    = {abs/2201.04214},
  year      = {2022},
  eprinttype = {arXiv},
  eprint    = {2201.04214}
}

SAE

The SAE folder includes the code that implements the Selectional Auto-Encoder (SAE). The program accepts the following parameters:

--mode : Execution mode (train or test).
--gpu : Identifier of the GPU to use.
--cmode : Color mode (0 for grayscale, 1 for RGB).
--s-width : Width of the rescaled image.
--s-height : Height of the rescaled image.
--k-height : Kernel height.
--k-width : Kernel width.
--nfilt : Number of filters in the convolutional layers.
--batch : Batch size.
--norm : Type of image normalization.
--epochs : Maximum number of epochs.
--nbl : Number of blocks in the encoder and the decoder of the SAE model (see the architecture sketch after this list).
--img : Saves intermediate images to check the evolution of the training process.
--graph : Saves the model graph.
--post : Applies a post-processing filter to improve the recognition.
--th : IoU threshold used to compute the metrics.
--nimgs : Number of images considered from the training set.
--red : Reduction factor to vertically shrink the regions before training.
--labels : Name of the labels to be used for training. This code uses the data configuration provided by the MuReT tool.
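
To make the meaning of --nbl, --nfilt, --k-height and --k-width concrete, the following is a minimal, hypothetical sketch of an SAE-style encoder-decoder. It assumes a Keras implementation and only illustrates how the parameters shape the network; it is not the repository's exact architecture.

from tensorflow.keras import layers, models

def build_sae(height, width, channels, nbl=3, nfilt=128, k_h=3, k_w=3):
    inputs = layers.Input(shape=(height, width, channels))
    x = inputs
    # Encoder: nbl blocks of convolution + downsampling.
    for _ in range(nbl):
        x = layers.Conv2D(nfilt, (k_h, k_w), activation="relu", padding="same")(x)
        x = layers.MaxPooling2D(pool_size=(2, 2))(x)
    # Decoder: nbl blocks of convolution + upsampling.
    for _ in range(nbl):
        x = layers.Conv2D(nfilt, (k_h, k_w), activation="relu", padding="same")(x)
        x = layers.UpSampling2D(size=(2, 2))(x)
    # Per-pixel sigmoid output with the probability of each pixel belonging to a region.
    outputs = layers.Conv2D(1, (k_h, k_w), activation="sigmoid", padding="same")(x)
    return models.Model(inputs, outputs)

model = build_sae(512, 512, 3)  # e.g. --s-height 512 --s-width 512 --cmode 1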

Example of use:

python -u main.py ${model}
      --db-train dataset/train.txt
      --db-val dataset/val.txt
      --db-test dataset/test.txt
      --mode train
      --gpu 0
      --cmode 1
      --s-width 512
      --s-height 512
      --k-height 3
      --k-width 3
      --nfilt 128
      --batch 16
      --norm inv255
      --epochs 300
      --nbl 3
      --nimgs 32
      --labels staff
      --labels lyrics
      --th 0.55
      --red 0.2
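
The --th value (0.55 in the example) is the IoU threshold used when matching predicted regions against ground-truth regions for the metrics. As a generic reminder of what this threshold operates on, here is a minimal sketch of the IoU between two axis-aligned bounding boxes; it is not taken from the repository's code.

def iou(box_a, box_b):
    # Boxes as (x_min, y_min, x_max, y_max).
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

With --th 0.55, a predicted region would count as correct when iou(prediction, ground_truth) >= 0.55.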

Data augmentation

The data_aug folder contains the code for generating the augmented images used in the paper. An example of use is provided in the script data_aug/generate_daug_all.sh.

Parameters of this code:

-type : Strategy for generating data. The one used in the paper is "random-auto".
-n : Number of new semi-synthetic images.
-txt_train : Path to the folder that contains the JSON files with the ground-truth data, generated with the MuReT tool.
-pages : Number of pages considered as the real available pages.
--uniform_rotate : Keeps a uniform rotation for all regions within the same page (see the conceptual sketch after the example below).

Another example of use:

python3 -u ./main.py \
      -type ${type} \
      -n 100 \
      -txt_train dataset/json_files.json \
      -pages 10 \
      --uniform_rotate
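
To illustrate the effect of --uniform_rotate, the following is a conceptual sketch only (the helper name and the angle range are hypothetical, not from the repository): without the flag each region of a page may receive its own random rotation, whereas with the flag a single angle is drawn per page and shared by all of its regions.

import random

def draw_region_angles(num_regions, uniform_rotate, max_angle=5.0):
    # Hypothetical helper returning one rotation angle (in degrees) per region of a page.
    if uniform_rotate:
        angle = random.uniform(-max_angle, max_angle)  # one angle shared by the whole page
        return [angle] * num_regions
    return [random.uniform(-max_angle, max_angle) for _ in range(num_regions)]

print(draw_region_angles(4, uniform_rotate=True))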

End-to-end

The code for the end-to-end approach used in this work can be found in the end-to-end code repository.

Training of the models

The pytorch_models folder includes the code that implements the models used in the paper. The training and testing scripts are described below.

Training without data augmentation

./train.sh trains the following models without the custom data augmentation used in the paper:

  • Faster R-CNN with ResNet-50 backbone
  • RetinaNet with ResNet-50 backbone
  • Faster R-CNN with MobileNet v3 backbone
  • SSD with VGG16 backbone

Training with data augmentation

./train-aug.sh trains the following models with the custom data augmentation used in the paper:

  • Faster R-CNN with ResNet-50 backbone
  • RetinaNet with ResNet-50 backbone
  • Faster R-CNN with MobileNet v3 backbone
  • SSD with VGG16 backbone

Testing without data augmentation

./test.sh tests the following models without the custom data augmentation used in the paper:

  • Faster R-CNN with ResNet-50 backbone
  • RetinaNet with ResNet-50 backbone
  • Faster R-CNN with MobileNet v3 backbone
  • SSD with VGG16 backbone

Testing with data augmentation

./test_aug.sh tests the following models with the custom data augmentation used in the paper:

  • Faster R-CNN with ResNet-50 backbone
  • RetinaNet with ResNet-50 backbone
  • Faster R-CNN with MobileNet v3 backbone
  • SSD with VGG16 backbone
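
All four detectors correspond to standard torchvision detection architectures. As an illustrative sketch only (the exact model variants, pretrained weights and number of classes used in the paper may differ), they could be instantiated as follows:

from torchvision.models import detection

num_classes = 3  # hypothetical: background + staff + lyrics

detectors = {
    "faster_rcnn_resnet50": detection.fasterrcnn_resnet50_fpn(num_classes=num_classes),
    "retinanet_resnet50": detection.retinanet_resnet50_fpn(num_classes=num_classes),
    "faster_rcnn_mobilenet_v3": detection.fasterrcnn_mobilenet_v3_large_fpn(num_classes=num_classes),
    "ssd_vgg16": detection.ssd300_vgg16(num_classes=num_classes),
}

# Quick sanity check: parameter counts of each freshly built model.
for name, model in detectors.items():
    print(name, sum(p.numel() for p in model.parameters()))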
