Not every boundary has a lowercase annotation #13

bmcfee · 2016-06-21T18:25:36Z

Letter labels: The entire piece must be fully labelled with both uppercase and lowercase letter labels. That is, no time-span should be unlabelled at any hierarchical level. The only exceptions are the special labels described below.

a) Lowercase labels: Every boundary must be provided with a lowercase letter label.
b) Uppercase labels: While the entire piece must be fully labelled with uppercase letter labels, not each section needs to be labelled individually: an uppercase label is assumed to persist until the next uppercase label.
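A minimal sketch of how rules a) and b) translate into a parser, assuming each line of a SALAMI text file looks like `time<TAB>label, label, ...`, that section labels are single letters optionally followed by primes (A, A', a''), and that Silence and End are the special labels. The function names and the SPECIAL set are hypothetical, not part of any SALAMI tooling:

```python
import re

# Assumed token shapes: one letter, optionally followed by primes (A, A', a'').
UPPER = re.compile(r"[A-Z]'*")
LOWER = re.compile(r"[a-z]'*")
# Hypothetical set of special labels that interrupt any running section.
SPECIAL = {"silence", "end"}


def parse_line(line):
    """Split a 'time<TAB>label, label, ...' line into (time, tokens)."""
    time_str, _, rest = line.partition("\t")
    tokens = [t.strip() for t in rest.split(",") if t.strip()]
    return float(time_str), tokens


def fill_uppercase(rows):
    """Return (time, upper, lower) per row, carrying the last uppercase label forward (rule b)."""
    current_upper = None
    out = []
    for time, tokens in rows:
        if any(t.lower() in SPECIAL for t in tokens):
            current_upper = None  # a special label closes the running section
        upper = next((t for t in tokens if UPPER.fullmatch(t)), None)
        lower = next((t for t in tokens if LOWER.fullmatch(t)), None)
        if upper is not None:
            current_upper = upper
        out.append((time, current_upper, lower))
    return out


def missing_lowercase(filled):
    """Times of labelled (non-special) rows that lack a lowercase label (rule a)."""
    return [t for t, upper, lower in filled if upper is not None and lower is None]
```

For example, the row `20.5<TAB>B, Verse` would be reported by missing_lowercase, since it opens a new uppercase section without a lowercase label.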

These points seem to imply that every uppercase boundary should coincide with a lowercase boundary. That doesn't seem to hold in the data, though, e.g., 100/textfile2.txt. I ran a script to find this kind of anomaly and counted roughly 451 files with significant (>3 s) disagreements between an uppercase boundary and the closest lowercase boundary.
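The script itself isn't included here, but the check amounts to something like the following minimal sketch. It assumes each layer is given as a list of boundary times in seconds; misaligned_boundaries is a hypothetical name, and the 3.0 s default mirrors the threshold above:

```python
import bisect


def misaligned_boundaries(upper_times, lower_times, tol=3.0):
    """Return uppercase boundary times whose nearest lowercase boundary is more than `tol` s away."""
    lower = sorted(lower_times)
    bad = []
    for t in upper_times:
        i = bisect.bisect_left(lower, t)
        # The nearest lowercase boundary is either lower[i - 1] or lower[i].
        candidates = lower[max(i - 1, 0):i + 1]
        nearest = min((abs(t - s) for s in candidates), default=float("inf"))
        if nearest > tol:
            bad.append(t)
    return bad
```

Binary search keeps the per-file cost at O(n log n) rather than comparing every pair of boundaries; a file would be counted as anomalous whenever the returned list is non-empty.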

How should these be interpreted?


jblsmith · 2016-06-26T05:57:24Z

Yep, that's a data error! There should be a lowercase label on every line, basically. I'm not sure what the other 451 instances look like, but in this case I would assume a unique lowercase label for that segment. Alternatively, the "G" might be a mistyped "g". One would have to listen to the track to decide.
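If the default fix is a unique lowercase label for each unlabelled segment, a minimal sketch could look like this. It assumes the lowercase layer is a list with None wherever a segment lacks a label; fill_missing_lowercase is a hypothetical helper. Cases like a possibly mistyped "G" still need a listen; this only automates the fallback:

```python
import string


def fill_missing_lowercase(lowers):
    """Replace None entries with lowercase letters not already used in the piece.

    Primed variants (a', a'') count as uses of their base letter.
    """
    used = {label.rstrip("'") for label in lowers if label is not None}
    fresh = iter([c for c in string.ascii_lowercase if c not in used])
    filled = []
    for label in lowers:
        if label is None:
            label = next(fresh)  # raises StopIteration if all 26 letters are taken
        filled.append(label)
    return filled
```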

If you have the list of 451 files (and can automatically point to the lines representing the discrepancies), I'd be happy to take a look at more!

bmcfee · 2016-06-27T19:02:25Z

Here's a notebook (https://gist.github.com/a4bde985242bb910d9db41b8bf550b72) that detects misaligned upper/lowercase segments within a tolerance threshold, along with the list of offending inputs and annotations. These were computed on the latest pull from this repo.

jblsmith · 2016-06-28T07:26:31Z

Thanks! Very nifty. Looking at this, I realize something I forgot in my own parser: among uppercase labels, "Z" was actually a reserved label that meant "non-music":

- most often, pre-music applause in a rough live recording, hence its frequency in the Live Music Archive portion (SALAMI IDs ~1000–1500);
- or, sometimes, post-song dialogue, as in a soundtrack.

So it should be treated a bit like a "silence" or "end" tag, in that it applies to all parsed annotation layers (upper/lower/func/instr). At a glance, that might resolve 90% of the files; the true errors seem concentrated in a smaller subset.
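Separating the Z cases from the true errors could be sketched like this, assuming the misalignment report is a dict from filename to a list of (time, uppercase_label) pairs. That structure and the function name are hypothetical, not the notebook's actual output format:

```python
def true_error_candidates(report):
    """Drop misaligned boundaries labelled Z (non-music); keep files with anything left."""
    remaining = {}
    for filename, boundaries in report.items():
        # Z marks non-music (e.g. applause or dialogue), so a missing lowercase
        # boundary there is expected rather than an annotation error.
        rest = [(t, label) for t, label in boundaries if label.rstrip("'") != "Z"]
        if rest:
            remaining[filename] = rest
    return remaining
```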


bmcfee · 2016-06-28T13:23:57Z

Ok.. does it make sense to insert a z boundary for every Z boundary then? Otherwise, a significant amount of information is lost when only looking at the lowercase annotations.

Otherwise, what do you want to do about the "true errors"?

bmcfee · 2016-07-28T18:05:05Z

Pinging back on this: is it kosher to just propagate Z boundaries from the uppercase to the lowercase annotations, and treat the remaining missing boundaries as errors to be corrected?
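A minimal sketch of that propagation, assuming each layer is a sorted list of (time, label) boundaries in seconds; propagate_z and the 0.5 s default tolerance are hypothetical choices:

```python
def propagate_z(upper, lower, tol=0.5):
    """Copy uppercase Z boundaries into the lowercase layer as 'z'.

    A 'z' is added only where no lowercase boundary lies within `tol` seconds;
    existing lowercase boundaries are left untouched.
    """
    lower_times = [t for t, _ in lower]
    added = [
        (t, "z")
        for t, label in upper
        if label == "Z" and all(abs(t - s) > tol for s in lower_times)
    ]
    return sorted(lower + added)
```

Whatever is still misaligned after this step would be the remaining set to correct by hand.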

jblsmith · 2016-07-29T05:34:53Z

Yes, I think that makes sense!


bmcfee · 2016-07-29T17:38:48Z

> Yes, I think that makes sense!

Great. I can do this in a PR and send it back upstream to you, if you like. I'd like to avoid redundant work, though: it seems there's already a list of errors in the new_parser branch. Will that be fixed/merged in the near future, or should I work independently on the upper/lowercase cleanup?

(Note: I'm working toward an 8/26 deadline for this, so the sooner I can get this settled, the better. Not to impose my own constraints on you or anything 😁)

bmcfee mentioned this issue on Oct 6, 2016 in #15, "cleaning up hierarchical consistency issues" (open, 2 tasks).
