Not every boundary has a lowercase annotation #13
Yep, that's a data error! There should be a lowercase label on every line, basically. I'm not sure what the other 451 instances look like, but in this case I would assume a unique lowercase label for that segment. Alternatively, the "G" might be a mistyped "g"; one would have to listen to the track to decide. If you have the list of 451 files (and can automatically point to the lines representing the discrepancies), I'd be happy to take a look at more!
Here's a notebook to detect misaligned upper/lowercase segment boundaries within a tolerance threshold, along with the list of offending inputs and annotations. These were computed on the latest pull from this repo.
Thanks! Very nifty. Looking at this, I realize something I forgot in my own
So it's to be treated a bit like a "silence" or "end" tag, in that it is …
(In reply to Brian McFee, 28 June 2016 at 04:02.)
Ok. Does it make sense to insert a …? Otherwise, what do you want to do about the "true errors"?
Pinging back on this: is it kosher to just propagate uppers->lowers for Z segments and treat the remaining missing boundaries as errors to be corrected?
Yes, I think that makes sense!
(In reply to Brian McFee, 29 July 2016 at 03:05.)
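A minimal sketch of the approach agreed on here, assuming SALAMI-style annotation lines of the form `<time>\t<label>, <label>, ...`, where the large-scale label is a single uppercase letter and the small-scale label a single lowercase letter (optionally followed by primes). Following the description of Z above as a silence/end-like tag, a missing lowercase label on a Z line is filled in, and every other uppercase-only line is reported as an error. The names `parse_line` and `propagate_z`, and the choice of `"z"` as the propagated lowercase label, are hypothetical; this is an illustration, not the actual patch.

```python
import re

# Assumed SALAMI-style line: "<time>\t<label>, <label>, ...".
# Large-scale label: one uppercase letter plus optional primes ("A", "B'").
# Small-scale label: one lowercase letter plus optional primes ("a", "b''").
# Other tokens (e.g. "Intro", "Silence") are ignored by this sketch.
UPPER = re.compile(r"^[A-Z]'*$")
LOWER = re.compile(r"^[a-z]'*$")


def parse_line(line):
    """Split one annotation line into (time, upper, lower); missing labels are None."""
    time_str, _, rest = line.partition("\t")
    tokens = [tok.strip() for tok in rest.split(",")]
    upper = next((tok for tok in tokens if UPPER.match(tok)), None)
    lower = next((tok for tok in tokens if LOWER.match(tok)), None)
    return float(time_str), upper, lower


def propagate_z(rows, z_lower="z"):
    """Fill in a lowercase label for 'Z' lines that lack one.

    Returns (patched_rows, error_times): any other line that has an uppercase
    label but no lowercase label is reported as an error to fix by hand.
    """
    patched, error_times = [], []
    for time, upper, lower in rows:
        if lower is None and upper == "Z":
            lower = z_lower  # hypothetical choice of propagated label
        elif lower is None and upper is not None:
            error_times.append(time)
        patched.append((time, upper, lower))
    return patched, error_times


if __name__ == "__main__":
    lines = ["0.0\tSilence", "0.46\tA, a, Intro", "12.5\tb", "30.0\tG", "41.2\tZ"]
    patched, errors = propagate_z([parse_line(l) for l in lines])
    print(errors)  # [30.0]: the "G" line still needs a lowercase label
```

Lines that carry only a lowercase label (a new small-scale segment inside the same large-scale one) are left alone, since they are not missing anything under this rule.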
Great. I can do this in a PR and send it back upstream to you, if you like. I'd like to avoid redundant work, though: it seems like there's already a list of errors in place in the … (Note: I'm working toward an 8/26 deadline for this, so the sooner I can get this set, the better. Not to impose my own constraints on you or anything 😁)
As per the label rules, point (a):
These points seem to imply that every uppercase boundary should coincide with a lowercase boundary. This doesn't seem to be true in the data, though; see, e.g., 100/textfile2.txt. I ran a script to find this kind of anomaly and tallied about 451 files with significant (>3 s) disagreements between uppercase boundaries and the closest lowercase boundaries.
How should these be interpreted?
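A minimal sketch of the kind of check described above, assuming the uppercase and lowercase boundary times (in seconds) have already been extracted from a file such as 100/textfile2.txt. The function name `misaligned_upper_boundaries` and the 3 s default are illustrative; this is not the script that produced the 451-file tally.

```python
import bisect


def misaligned_upper_boundaries(upper_times, lower_times, tolerance=3.0):
    """Return the uppercase boundary times whose closest lowercase boundary
    is more than `tolerance` seconds away (all times in seconds)."""
    lower_sorted = sorted(lower_times)
    flagged = []
    for t in upper_times:
        i = bisect.bisect_left(lower_sorted, t)
        # Only the lowercase boundaries just before and just after t can be closest.
        neighbours = lower_sorted[max(i - 1, 0):i + 1]
        gap = min((abs(t - x) for x in neighbours), default=float("inf"))
        if gap > tolerance:
            flagged.append(t)
    return flagged


if __name__ == "__main__":
    upper = [0.0, 20.0, 45.0]
    lower = [0.0, 10.0, 21.5, 30.0]
    print(misaligned_upper_boundaries(upper, lower))  # [45.0]
```

Counting the files for which this returns a non-empty list gives a per-file tally; because the threshold is a parameter, that count depends on the tolerance chosen.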