-
-
Notifications
You must be signed in to change notification settings - Fork 421
PICARD-2655: Improve matching for VA releases #1918
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -46,6 +46,7 @@ | |
from PyQt5.QtCore import QObject | ||
|
||
from picard.config import get_config | ||
from picard.const import VARIOUS_ARTISTS_ID | ||
from picard.mbjson import ( | ||
artist_credit_from_node, | ||
get_score, | ||
|
@@ -253,8 +254,22 @@ def compare_to_release_parts(self, release, weights): | |
|
||
if "albumartist" in self and "albumartist" in weights: | ||
a = self["albumartist"] | ||
b = artist_credit_from_node(release['artist-credit'])[0] | ||
parts.append((similarity2(a, b), weights["albumartist"])) | ||
release_artists = release['artist-credit'] | ||
b = artist_credit_from_node(release_artists)[0] | ||
artist_weight = weights["albumartist"] | ||
if a.lower() in {'various', 'various artists'}: | ||
# if artist in tag is 'various' or 'various artists', | ||
# it is very likely we look for a VA compilation | ||
# so increase the artist's weight | ||
artist_weight *= 2 | ||
if release_artists[0]['artist']['id'] == VARIOUS_ARTISTS_ID: | ||
# it is fairly common VA release are tagged with label's name | ||
# so if a release is credited to VA, assume it is similar | ||
# to artist in tag | ||
sim = 1.0 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am entirely unsure that this logic is at all correct for various reasons (though it may equally be that I don't understand how this is supposed to work) but:
Do we need some unit tests with various albums and various potential releases to test that we get the best release selected? Edited: to add note about accented characters. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd love have tests for that... thanks for volunteering ;) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am happy to volunteer to write tests for this providing that you are prepared to wait until I have the time to spend on open source and for this to get to the top of my queue of open source stuff I have planned. 😉 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree with Sophist-UK that we likely should combine the tests of tags indicating VA and the release being VA. The goal of the patch is to improve the matching of files from a VA release to VA. And one of the problems here is that pure matching of album artist tag with the MB artist name ("Various Artists") often does not yield good results. The code will indeed improve the situation. But at the same time it will make matching a VA release from MB to any file more likely, as all VA releases will always have full albumartist similarity.That might increase the problem that files get tagged too much against compilation, something many users already struggle with. Combining the check of existing album artist tags indicating VA and the release being VA would avoid this. For the localization I would suggest to check for This will not solve the case where the album artist is set to the label name or something completely different, though. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A. I am really not sure that we should use B. There are probably four use cases to consider, two where the album artist is set to the label name:
Of course the principle cause of all three use cases is bad data - so the primary fix should be to have an MB report to identify use cases 1, 3 and 4 and have editors fix the data. (Or indeed, have some code for case 3 to auto-create (voteable) edits to fix the album artist MBID.) However, as per this PR Picard should still try to handle bad data because it will never be perfect - and the code should probably deal with cases 1 and 3 and perhaps also 4 (since it is only tweaking the fuzzy match and not making hard changes to the data). C. That said, we should also consider whether we want Picard to change the data if this release is chosen. If we genuinely believe that use cases 1, 2 and 4 are detectable and we are tweaking the fuzzy search, should we also tweak the MB data to "improve" it? (This is much more contentious, and I put this up for discussion because I have not yet come to an ethical/practical personal conclusion.) If we do this, we would have to do it elsewhere in the code because once the Album has an Album MBID, fuzzy search code is not executed. D. I think that fuzzy match fudges like this to attempt to correct bad MBID data should have an |
||
else: | ||
sim = similarity2(a, b) | ||
parts.append((sim, artist_weight)) | ||
|
||
if "totaltracks" in weights: | ||
try: | ||
|
Uh oh!
There was an error while loading. Please reload this page.