songbeamer-duplications

This is a small utility that helps you organize your SongBeamer song archive. It detects similar or nearly identical songs by training a doc2vec model, and it provides a graphical interface for removing duplicate songs from your archive. Please note that the GUI isn't designed to be beautiful and offers only bare-bones functionality.

quick start

install python on your system 🐍
(optional) create a virtual environment

python -m venv venv

install all requirements

pip install -r requirements.txt

train the doc2vec model

$ python train.py --help
usage: Train doc2vec model from songbeamer files [-h] [--output_file OUTPUT_FILE] [--vector_size VECTOR_SIZE] [--epochs EPOCHS]
                                                 [--pick_probability PICK_PROBABILITY]
                                                 training_dir

positional arguments:
  training_dir          directory where the training data lives

options:
  -h, --help            show this help message and exit
  --output_file OUTPUT_FILE
                        directory where the trained model lives
  --vector_size VECTOR_SIZE
  --epochs EPOCHS
  --pick_probability PICK_PROBABILITY
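
As a rough sketch, the help output above is consistent with an argparse parser along the following lines. This is not the actual content of train.py: the argument types are assumptions, and no default values are set because the help text does not show any.

```python
import argparse


def build_parser():
    # The usage line shows the descriptive text in the program-name slot,
    # so it is passed as `prog` here to reproduce that output.
    parser = argparse.ArgumentParser(prog="Train doc2vec model from songbeamer files")
    parser.add_argument("training_dir", help="directory where the training data lives")
    parser.add_argument("--output_file", help="directory where the trained model lives")
    # Assumed types: the help output does not state them.
    parser.add_argument("--vector_size", type=int)
    parser.add_argument("--epochs", type=int)
    parser.add_argument("--pick_probability", type=float)
    return parser


# Example: parse a command line equivalent to
#   python train.py songs --epochs 65 --vector_size 300
args = build_parser().parse_args(["songs", "--epochs", "65", "--vector_size", "300"])
print(args.training_dir, args.epochs, args.vector_size)
```

Options that are not given on the command line end up as None in this sketch; the real script may well define its own defaults.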

find similar songs

$ python find.py --help
usage: Find similar songs from in a directory [-h] [--model_file MODEL_FILE] [--output_file OUTPUT_FILE] [--min MIN] dir

positional arguments:
  dir                   directory where the songs lives

options:
  -h, --help            show this help message and exit
  --model_file MODEL_FILE
                        file where the model is stored
  --output_file OUTPUT_FILE
                        file where all duplicate songs are stored in json format
  --min MIN             minimum similarity to be found in percent
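
To illustrate what a minimum similarity in percent could mean, here is a minimal sketch that compares song vectors pairwise and keeps the pairs at or above a threshold. It assumes cosine similarity scaled to 0-100; the function names, the file names, and the JSON layout are hypothetical and may differ from what find.py actually does.

```python
import json
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(vectors, min_percent):
    """Return all pairs of songs whose similarity is at least min_percent."""
    pairs = []
    for (path_a, vec_a), (path_b, vec_b) in combinations(sorted(vectors.items()), 2):
        percent = cosine_similarity(vec_a, vec_b) * 100
        if percent >= min_percent:
            pairs.append({"a": path_a, "b": path_b, "similarity": round(percent, 1)})
    return pairs


# Toy vectors standing in for the document vectors a trained model would infer.
vectors = {
    "/songs/amazing_grace.sng": [1.0, 0.0, 1.0],
    "/songs/amazing_grace_copy.sng": [1.0, 0.0, 1.0],
    "/songs/other.sng": [0.0, 1.0, 0.0],
}
result = find_similar(vectors, min_percent=90)
print(json.dumps(result, indent=2))
```

With these toy vectors only the two identical songs are reported; the third song is orthogonal to both and falls below the threshold.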

start the gui

$ python gui.py --help
usage: GUI for better accessibility [-h] [--data_file DATA_FILE]

options:
  -h, --help            show this help message and exit
  --data_file DATA_FILE
                        directory where the trained model lives

tips and tricks

No songs are actually deleted! Instead, they are saved in a backup directory located in the working directory from which you started the program.
The best model was obtained with 60-70 epochs, a vector size of 300, and a pick probability of 3.5% for a big archive.
Don't move the song archive folder after finding the duplicate songs: the results store full paths instead of relative paths, which keeps the program logic much simpler.
be happy 🥳
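
The "move to a backup directory instead of deleting" behaviour described in the first tip can be sketched as follows. This is only an illustration under assumptions: the helper name move_to_backup, the directory name "backup", and the renaming scheme for name clashes are hypothetical and not taken from the project's code.

```python
import shutil
import tempfile
from pathlib import Path


def move_to_backup(song_path, backup_dir):
    """Move a song file into backup_dir instead of deleting it; return the new path."""
    song = Path(song_path).resolve()  # work with the full path, as the tool does
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    target = backup / song.name
    # Avoid overwriting an earlier backup that has the same file name.
    counter = 1
    while target.exists():
        target = backup / f"{song.stem}_{counter}{song.suffix}"
        counter += 1
    shutil.move(str(song), str(target))
    return target


# Demo in a temporary directory so that no real archive is touched.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "archive"
    archive.mkdir()
    song = archive / "example.sng"
    song.write_text("demo lyrics", encoding="utf-8")

    moved = move_to_backup(song, Path(tmp) / "backup")
    original_still_there = song.exists()
    backup_content = moved.read_text(encoding="utf-8")

print(moved.name, original_still_there)
```

Because the file is moved rather than removed, a song discarded by mistake can be restored by copying it back from the backup directory.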

credits

gensim ❤️

jolsfd/songbeamer-duplications

Folders and files

Latest commit

History

Repository files navigation

songbeamer-duplications

quick start

tipps and tricks

credits

About

Topics

Resources

Stars

Watchers

Forks

Languages