Updated GitHub Actions workflows #658

Open · wants to merge 2 commits into master
6 changes: 3 additions & 3 deletions .github/workflows/ci.yml
@@ -47,7 +47,7 @@ jobs:
          envfile: ".github/environment-ci.yml"

    steps:
-     - uses: actions/checkout@v3
+     - uses: actions/checkout@v4
        with:
          submodules: true

@@ -65,7 +65,7 @@
        esac

      - name: Cache conda
-       uses: actions/cache@v3
+       uses: actions/cache@v4
        env:
          CACHE_NUMBER: 2
        with:

@@ -77,7 +77,7 @@
            ${{ runner.os }}-conda-

      - name: Setup conda
-       uses: conda-incubator/setup-miniconda@v2
+       uses: conda-incubator/setup-miniconda@v3
        with:
          miniforge-variant: Miniforge3
          miniforge-version: latest
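For context, these bumps track the current major releases: actions/checkout@v4 and actions/cache@v4 run on Node 20 (the v3 releases used Node 16, which GitHub has deprecated on its runners), and conda-incubator/setup-miniconda@v3 likewise moves to the newer runtime. A minimal job on the updated versions might look like the following sketch; the workflow name, trigger, and cache key are illustrative, while the paths mirror the diff above:

```yaml
# Sketch of a CI job on the bumped action versions (name/trigger/key are
# illustrative, not taken from this repository).
name: ci-sketch
on: [push]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4          # was v3
        with:
          submodules: true
      - name: Cache conda
        uses: actions/cache@v4             # was v3
        with:
          path: ~/conda_pkgs_dir
          key: ${{ runner.os }}-conda-${{ hashFiles('.github/environment-ci.yml') }}
      - name: Setup conda
        uses: conda-incubator/setup-miniconda@v3   # was v2
        with:
          miniforge-variant: Miniforge3
          miniforge-version: latest
```

Pinning to the new majors also silences the deprecation warnings that runners emit for Node 16-based actions.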
2 changes: 1 addition & 1 deletion .github/workflows/formatting.yml
@@ -12,7 +12,7 @@ jobs:
  black:
    runs-on: ubuntu-latest
    steps:
-     - uses: actions/checkout@v3
+     - uses: actions/checkout@v4
      - name: black on mirdata
        uses: psf/black@stable
        with:
6 changes: 3 additions & 3 deletions .github/workflows/lint-python.yml
@@ -21,18 +21,18 @@ jobs:
        channel-priority: "flexible"
        envfile: ".github/environment-lint.yml"
    steps:
-     - uses: actions/checkout@v3
+     - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Cache conda
-       uses: actions/cache@v3
+       uses: actions/cache@v4
        env:
          CACHE_NUMBER: 0
        with:
          path: ~/conda_pkgs_dir
          key: ${{ runner.os }}-${{ matrix.python-version }}-conda-${{ env.CACHE_NUMBER }}-${{ hashFiles( matrix.envfile ) }}
      - name: Install conda environmnent
-       uses: conda-incubator/setup-miniconda@v2
+       uses: conda-incubator/setup-miniconda@v3
        with:
          auto-update-conda: true
          python-version: ${{ matrix.python-version }}
3 changes: 1 addition & 2 deletions mirdata/annotations.py
@@ -1,5 +1,4 @@
-"""mirdata annotation data types
-"""
+"""mirdata annotation data types"""

import logging
import re
3 changes: 1 addition & 2 deletions mirdata/core.py
@@ -1,5 +1,4 @@
-"""Core mirdata classes
-"""
+"""Core mirdata classes"""

import json
import os
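The two docstring edits above follow the PEP 257 guideline that a docstring which fits on one line should keep its closing quotes on that same line. A minimal before/after sketch, using the actual docstring from this diff:

```python
# Before: a short docstring spread over two physical lines.
"""mirdata annotation data types
"""

# After: the one-line form preferred by PEP 257
# (closing quotes on the same line as the opening quotes).
"""mirdata annotation data types"""
```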
8 changes: 4 additions & 4 deletions mirdata/datasets/acousticbrainz_genre.py
(whitespace-only change: the removed and added lines differ only in trailing spaces; text shown once)

@@ -16,10 +16,10 @@
We provide four datasets containing genre and subgenre annotations extracted from four different online metadata sources:

- AllMusic and Discogs are based on editorial metadata databases maintained by music experts and enthusiasts. These sources
  contain explicit genre/subgenre annotations of music releases (albums) following a predefined genre namespace and taxonomy.
  We propagated release-level annotations to recordings (tracks) in AcousticBrainz to build the datasets.
- Lastfm and Tagtraum are based on collaborative music tagging platforms with large amounts of genre labels provided by their
  users for music recordings (tracks). We have automatically inferred a genre/subgenre taxonomy and annotations from these labels.

For details on format and contents, please refer to the data webpage.

@@ -34,7 +34,7 @@
The AcousticBrainz Genre Dataset: Multi-Source, Multi-Level, Multi-Label, and Large-Scale.
20th International Society for Music Information Retrieval Conference (ISMIR 2019).

This work is partially supported by the European Union’s Horizon 2020 research and innovation programme under
grant agreement No 688382 AudioCommons.
4 changes: 2 additions & 2 deletions mirdata/datasets/ballroom.py
(whitespace-only change: trailing spaces stripped)

@@ -24,11 +24,11 @@
**Acknowledgments and References:**

This dataset was created with the collaboration of experts in ballroom dance music. We extend our gratitude to those who contributed their knowledge and expertise to this project. For detailed information on the dataset and its creation, please refer to the associated research papers and documentation.

[1] Gouyon F., A. Klapuri, S. Dixon, M. Alonso, G. Tzanetakis, C. Uhle, and P. Cano. An experimental comparison of audio tempo induction algorithms. Transactions on Audio, Speech and Language Processing 14(5), pp.1832-1844, 2006.

[2] Böck, S., and M. Schedl. Enhanced beat tracking with context-aware neural networks. In Proceedings of the International Conference on Digital Audio Effects (DAFX), 2010.

[3] Dixon, S., F. Gouyon & G. Widmer. Towards Characterisation of Music via Rhythmic Patterns. In Proceedings of the 5th International Society for Music Information Retrieval Conference (ISMIR). 2004.
"""
8 changes: 4 additions & 4 deletions mirdata/datasets/candombe.py
(whitespace-only change: the removed and added lines differ only in trailing spaces; text shown once)

@@ -3,14 +3,14 @@
.. admonition:: Dataset Info
    :class: dropdown

This is a dataset of Candombe recordings with annotated beats and downbeats, totaling over 2 hours of audio.
It comprises 35 complete performances by renowned players, in groups of three to five drums.
Recording sessions were conducted in studio, in the context of musicological research over the past two decades.
A total of 26 tambor players took part, belonging to different generations and representing all the important traditional Candombe styles.
The audio files are stereo with a sampling rate of 44.1 kHz and 16-bit precision.
The location of beats and downbeats was annotated by an expert, adding to more than 4700 downbeats.

The audio is provided as .flac files and the annotations as .csv files.
The values in the first column of the csv file are the time instants of the beats.
The numbers on the second column indicate both the bar number and the beat number within the bar.
For instance, 1.1, 1.2, 1.3 and 1.4 are the four beats of the first bar. Hence, each label ending with .1 indicates a downbeat.
6 changes: 3 additions & 3 deletions mirdata/datasets/cipi.py
(whitespace-only change: the removed and added lines differ only in trailing spaces; text shown once)

@@ -5,17 +5,17 @@
The "Can I Play It?" (CIPI) dataset is a specialized collection of 652 classical piano scores, provided in a
machine-readable MusicXML format and accompanied by integer-based difficulty levels ranging from 1 to 9, as
verified by expert pianists. Then, it provides embeddings for fingering and expresiveness of the piece. Each
recording has multiple scores corresponding to it. This dataset focuses exclusively on classical piano music,
offering a rich resource for music researchers, educators, and students. Developed by the Music Technology Group
in Barcelona, by P. Ramoneda et al.

The CIPI dataset facilitates various applications such as the study of musical complexity, the selection of
appropriately leveled pieces for students, and general research in music education. The dataset, alongside
embeddings of multiple dimensions of difficulty, has been made publicly available to encourage ongoing innovation
and collaboration within the music education and research communities.

The dataset has been published alongside a paper in Expert Systems with Applications Journal.

The dataset is shared under a Creative Commons Attribution Non Commercial Share Alike 4.0 International License, but
need to be requested. Please do request the dataset here: https://zenodo.org/records/8037327. The dataset can only
6 changes: 3 additions & 3 deletions mirdata/datasets/compmusic_carnatic_rhythm.py
(whitespace-only change: the removed and added lines differ only in trailing spaces; text shown once)

@@ -11,7 +11,7 @@
The dataset contains the following data:

**AUDIO:** The pieces are chosen from the CompMusic Carnatic music collection. The pieces were chosen in four popular taalas of
Carnatic music, which encompasses a majority of Carnatic music. The pieces were chosen include a mix of vocal and instrumental recordings,
new and old recordings, and to span a wide variety of forms. All pieces have a percussion accompaniment, predominantly Mridangam. The
excerpts are full length pieces or a part of the full length pieces. There are also several different pieces by the same artist (or release

@@ -27,9 +27,9 @@
**METADATA:** For each excerpt, the taala of the piece, edupu (offset of the start of the piece, relative to the sama, measured in aksharas)
of the composition, and the kalai (the cycle length scaling factor) are recorded. Each excerpt can be uniquely identified and located with the
MBID of the recording, and the relative start and end times of the excerpt within the whole recording. A separate 5 digit taala based unique ID
is also provided for each excerpt as a double check. The artist, release, the lead instrument, and the raaga of the piece are additional
editorial metadata obtained from the release. A flag indicates if the excerpt is a full piece or only a part of a full piece. There are optional
comments on audio quality and annotation specifics.

Possible uses of the dataset: Possible tasks where the dataset can be used include taala, sama and beat tracking, tempo estimation and tracking,
taala recognition, rhythm based segmentation of musical audio, structural segmentation, audio to score/lyrics alignment, and rhythmic pattern
2 changes: 1 addition & 1 deletion mirdata/datasets/compmusic_carnatic_varnam.py
(whitespace-only change: the removed and added lines differ only in trailing spaces; text shown once)

@@ -28,7 +28,7 @@
**Sections**
The notation is given a single time per section, however, to align the svaras with the tala annotations, structure
information is given. The structure is given in yaml format, specifying the order of the sections, and how many svaras
are sung per each tala tick. Broadly, there are just two only cases, 2 svaras per tick, and 4 svaras per tick.
The structure information has been added in the 1.1 version of the dataset.

**Possible uses of the dataset**
8 changes: 4 additions & 4 deletions mirdata/datasets/compmusic_hindustani_rhythm.py
(whitespace-only change: the removed and added lines differ only in trailing spaces; text shown once)

@@ -3,8 +3,8 @@
.. admonition:: Dataset Info
    :class: dropdown

CompMusic Hindustani Rhythm Dataset is a rhythm annotated test corpus for automatic rhythm analysis tasks in Hindustani Music.
The collection consists of audio excerpts from the CompMusic Hindustani research corpus, manually annotated time aligned markers
indicating the progression through the taal cycle, and the associated taal related metadata. A brief description of the dataset
is provided below.

@@ -19,9 +19,9 @@
The pieces are stereo, 160 kbps, mp3 files sampled at 44.1 kHz. The audio is also available as wav files for experiments.

**SAM, VIBHAAG AND THE MAATRAS:** The primary annotations are audio synchronized time-stamps indicating the different metrical positions in the taal cycle.
The sam and matras of the cycle are annotated. The annotations were created using Sonic Visualizer by tapping to music and manually correcting the taps.
Each annotation has a time-stamp and an associated numeric label that indicates the position of the beat marker in the taala cycle. The annotations and the
associated metadata have been verified for correctness and completeness by a professional Hindustani musician and musicologist. The long thick lines show
vibhaag boundaries. The numerals indicate the matra number in cycle. In each case, the sam (the start of the cycle, analogous to the downbeat) are indicated
using the numeral 1.
22 changes: 11 additions & 11 deletions mirdata/datasets/compmusic_indian_tonic.py
(whitespace-only change: the removed and added lines differ only in trailing spaces; text shown once)

@@ -5,17 +5,17 @@
This loader includes a combination of six different datasets for the task of Indian Art Music tonic identification.

These datasets comprise audio excerpts and manually done annotations of the tonic pitch of the lead artist for each audio excerpt.
Each excerpt is accompanied by its associated editorial metadata. These datasets can be used to develop and evaluate computational
approaches for automatic tonic identification in Indian art music. These datasets have been used in several articles mentioned below.
A majority of These datasets come from the CompMusic corpora of Indian art music, for which each recording is associated with a MBID.
Through the MBID other information can be obtained using the Dunya API.

These six datasets are used for for the task of tonic identification for Indian Art Music, and can be used for a comparative evaluation.
To the best of our knowledge these are the largest datasets available for tonic identification for Indian art music. These datases vary
in terms of the audio quality, recording period (decade), the number of recordings for Carnatic, Hindustani, male and female singers and
instrumental and vocal excerpts.

All the datasets (annotations) are version controlled. The audio files corresponding to these datsets are made available on request
for only research purposes. See DOWNLOAD_INFO of this loader.

@@ -25,7 +25,7 @@
.. code-block::

    'ID': {
        'artist': <name of the lead artist if available>,
        'filepath': <relative path to the audio file>,
        'gender': <gender of the lead singer if available>,
        'mbid': <musicbrainz id when available>,

@@ -41,10 +41,10 @@
these features may be easily computed following the instructions in the related paper. See BIBTEX.

There are a total of 2161 audio excerpts, and while the CM collection includes aproximately 50% Carnatic and 50% Hindustani recordings, IITM and
IISc collections are 100% Carnatic music. The excerpts vary a lot in duration. See [this webpage](https://compmusic.upf.edu/iam-tonic-dataset)
for a detailed overview of the datasets.

If you have any questions or comments about the dataset, please feel free to email: [sankalp (dot) gulati (at) gmail (dot) com], or
[sankalp (dot) gulati (at) upf (dot) edu].

"""
30 changes: 15 additions & 15 deletions mirdata/datasets/compmusic_raga.py
(whitespace-only change: the removed and added lines differ only in trailing spaces; text shown once)

@@ -3,26 +3,26 @@
.. admonition:: Dataset Info
    :class: dropdown

Rāga datasets from CompMusicomprise two sizable datasets, one for each music tradition,
Carnatic and Hindustani. These datasets comprise full length audio recordings and their
associated rāga labels. These two datasets can be used to develop and evaluate approaches
for performing automatic rāga recognition in Indian art music.

These datasets are derived from the CompMusic corpora of Indian Art Music. Therefore, the
dataset has been compiled at the Music Technology Group, by a group of researchers working
on the computational analysis of Carnatic and Hindustani music within the framework of the
ERC-funded CompMusic project.

Each recording is associated with a MBID. With the MBID other information can be obtained
using the Dunya API or pycompmusic.

The Carnatic subset comprises 124 hours of audio recordings and editorial metadata that
includes carefully curated and verified rāga labels. It contains 480 recordings belonging
to 40 rāgas with 12 recordings per rāga.

The Hindustani subset comprises 116 hours of audio recordings and editorial metadata that
includes carefully curated and verified rāga labels. It contains 300 recordings belonging
to 30 rāgas with 10 recordings per rāga.

The dataset also includes features per each file:
* Tonic: float indicating the recording tonic

@@ -32,9 +32,9 @@
* Nyas segments: KNN-extracted segments of Nyas (start and end times provided)
* Tani segments: KNN-extracted segments of Tanis (start and end times provided)

The dataset includes both txt files and json files that contain information about each audio
recording in terms of its mbid, the path of the audio/feature files and the associated rāga
identifier. Each rāga is assigned a unique identifier by Dunya, which is similar to the mbid
in terms of purpose. A mapping of the rāga id to its transliterated name is also provided.

For more information about the dataset please refer to: https://compmusic.upf.edu/node/328
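Beyond the workflow version bumps and the two docstring fixes, the remaining changes in this PR are trailing-whitespace cleanups inside dataset docstrings. A small script along these lines would reproduce that kind of cleanup; it is a sketch rather than part of the PR, and treating mirdata/ as the target tree is an assumption based on the paths in this diff:

```python
"""Strip trailing whitespace from Python sources (illustrative sketch)."""

import pathlib

for path in pathlib.Path("mirdata").rglob("*.py"):  # assumed target tree
    text = path.read_text(encoding="utf-8")
    # Trim trailing spaces/tabs per line, preserving a final newline if present.
    cleaned = "\n".join(line.rstrip() for line in text.splitlines())
    if text.endswith("\n"):
        cleaned += "\n"
    if cleaned != text:
        path.write_text(cleaned, encoding="utf-8")
```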