metabrainz/caa-backup

Cover Art Archive Backup

This code was mostly generated by Gemini AI. Good job on such a simple and tedious task, buddy!

This project exists to back up the Cover Art Archive's original-sized cover art images.

Getting Started

Create a virtual environment and install the Python 3 prerequisites:

python -m venv .ve
source .ve/bin/activate
pip install -r requirements.txt

Now copy dot-env-sample to .env:

cp dot-env-sample .env

Then edit .env according to your needs:

  • PG_CONN_STRING -- the PostgreSQL connection string for accessing a MusicBrainz database
  • DB_PATH="caa_backup.db" -- where to store the local database file that tracks progress
  • BACKUP_DIR="caa-backup" -- the cache directory in which to store the downloaded files
  • DOWNLOAD_THREADS=12 -- the number of threads to use for simultaneous downloads
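
As an illustration of how these four settings might be consumed, here is a minimal sketch that reads them from the process environment. It is not taken from the project's code: the `Settings` class and `load_settings` function are hypothetical names. The fallback values simply mirror the sample values shown above, and the sketch assumes the variables from .env have already been loaded into the environment.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the four .env values."""
    pg_conn_string: str
    db_path: str
    backup_dir: str
    download_threads: int


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    # PG_CONN_STRING has no sample value, so treat it as required.
    if "PG_CONN_STRING" not in env:
        raise KeyError("PG_CONN_STRING must be set in .env")
    # Fallbacks below mirror the sample values; they are assumptions.
    threads = int(env.get("DOWNLOAD_THREADS", "12"))
    if threads < 1:
        raise ValueError("DOWNLOAD_THREADS must be at least 1")
    return Settings(
        pg_conn_string=env["PG_CONN_STRING"],
        db_path=env.get("DB_PATH", "caa_backup.db"),
        backup_dir=env.get("BACKUP_DIR", "caa-backup"),
        download_threads=threads,
    )
```

Passing a plain dict instead of relying on `os.environ` makes the function easy to try out without touching the real environment.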

Usage with manage.py (Recommended)

The easiest way to use this system is through the manage.py script, which provides a unified interface:

# Activate your virtual environment first
source .ve/bin/activate

# View all available commands
python manage.py --help

# Check system status and configuration
python manage.py status

# Import data from PostgreSQL (first run)
python manage.py import-data

# Download cover art images (this will take DAYS or WEEKS!)
python manage.py download

# Verify local cache against database
python manage.py verify

# Start standalone monitoring server
python manage.py monitor --port 8080

Command Options

  • import-data: Import from PostgreSQL to SQLite

    • --batch-size INTEGER: Records per batch (default: 1000)
    • --force: Overwrite existing database
    • --incremental: Import only new records since last import
  • download: Download cover art images

    • --threads INTEGER: Download threads (default: 8)
    • --batch-size INTEGER: Records per batch (default: 1000)
    • --monitor-port INTEGER: Monitoring port (default: 8080)
  • verify: Verify cache against database

  • monitor: Standalone monitoring server

    • --port INTEGER: Server port (default: 8080)
    • --host TEXT: Server host (default: localhost)
  • status: Display system status and statistics
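
To make the option list above concrete, the following sketch rebuilds the same command-line surface with the standard library's `argparse`. This is an assumption for illustration only: the real manage.py may well use a different library, and `build_parser` is a hypothetical name. Only the commands, flags, and defaults listed above are reproduced.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented commands and defaults."""
    parser = argparse.ArgumentParser(prog="manage.py")
    sub = parser.add_subparsers(dest="command", required=True)

    imp = sub.add_parser("import-data", help="Import from PostgreSQL to SQLite")
    imp.add_argument("--batch-size", type=int, default=1000)
    imp.add_argument("--force", action="store_true")
    imp.add_argument("--incremental", action="store_true")

    dl = sub.add_parser("download", help="Download cover art images")
    dl.add_argument("--threads", type=int, default=8)
    dl.add_argument("--batch-size", type=int, default=1000)
    dl.add_argument("--monitor-port", type=int, default=8080)

    sub.add_parser("verify", help="Verify cache against database")

    mon = sub.add_parser("monitor", help="Standalone monitoring server")
    mon.add_argument("--port", type=int, default=8080)
    mon.add_argument("--host", default="localhost")

    sub.add_parser("status", help="Display system status and statistics")
    return parser
```

For example, `build_parser().parse_args(["download", "--threads", "4"])` yields a namespace with `threads` set to 4 and the other download options left at their documented defaults.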

First run

  1. Import the database: python manage.py import-data
  2. Download images: python manage.py download

Subsequent runs

To keep the backup up to date:

  1. Update with new records: python manage.py import-data --incremental
  2. Verify existing files: python manage.py verify
  3. Download new files: python manage.py download

Alternatively, for a complete refresh:

  1. Re-import all data: python manage.py import-data --force
  2. Verify existing files: python manage.py verify
  3. Download new files: python manage.py download
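
If you want to run either sequence unattended (from cron, for instance), the three steps can be chained in a small wrapper. The sketch below is one possible way to do that, not part of the project. `update_commands` and `run_update` are hypothetical names, and the sketch assumes it is run from the repository root with the virtual environment active.

```python
import subprocess
import sys


def update_commands(full_refresh=False):
    """Return the three manage.py invocations, in order."""
    # --force re-imports everything; --incremental only fetches new records.
    import_flag = "--force" if full_refresh else "--incremental"
    base = [sys.executable, "manage.py"]
    return [
        base + ["import-data", import_flag],
        base + ["verify"],
        base + ["download"],
    ]


def run_update(full_refresh=False):
    """Run each step in turn, stopping at the first failure."""
    for cmd in update_commands(full_refresh):
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_update(full_refresh="--full" in sys.argv[1:])
```

Stopping at the first failure matters here: downloading after a failed import would work from a stale or incomplete database.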

Manual Usage (Individual Scripts)

You can still run the individual scripts directly:

Run caa_importer.py to download the cover_art_archive.cover_art table into SQLite.

Then, run caa_downloader.py to download the cover art images. This is going to take DAYS, if not WEEKS!

To keep the backup up to date, run caa_importer.py again to re-download all the CAA data, then run caa_verify.py to mark the files that are already on disk as downloaded. Finally, run caa_downloader.py again to download any new files that were added since the last run.
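
The verify step exists because a fresh import no longer knows which files are already on disk. The sketch below shows the general idea under loudly stated assumptions: the table name `cover_art`, its `id`, `filename`, and `downloaded` columns, and the flat file layout under the backup directory are all hypothetical and are not taken from caa_verify.py.

```python
import os
import sqlite3


def mark_downloaded(conn, backup_dir):
    """Flag rows whose file already exists under backup_dir.

    Assumes a hypothetical table:
        cover_art(id INTEGER PRIMARY KEY, filename TEXT, downloaded INTEGER)
    Returns the number of rows flagged.
    """
    rows = conn.execute("SELECT id, filename FROM cover_art").fetchall()
    present = [
        (row_id,)
        for row_id, filename in rows
        if os.path.isfile(os.path.join(backup_dir, filename))
    ]
    # The connection context manager commits on success, rolls back on error.
    with conn:
        conn.executemany(
            "UPDATE cover_art SET downloaded = 1 WHERE id = ?", present
        )
    return len(present)
```

After such a pass, a downloader only needs to fetch the rows still marked as not downloaded, which is what makes re-running the download step cheap.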
