Enhancement? New metric for source separation, measuring bleed and fullness separately in separated audio #79
Comments
@jarredou I'm curious about this. So basically: instead of computing an L1 mel spectral distance, you separate it into two components, bleed and fullness?
I see you do MSS work. I noted in the BS-Roformer paper that the authors wrote: "our model outputs gained more preference from musicians and educators than from music producers in the listening test of SDX23". To my ears, BS-Roformers seem to have less bleed but also less fullness. I'd be curious if you have any numbers to share. (cc @ZFTurbo)
@turian Yeah, that's the simple idea behind the two metrics. About the BS-Roformer quote: it's from the final paper of the SDX/MDX23 contest, https://arxiv.org/pdf/2308.06979. We don't have numbers comparing different neural network models. For now, the metrics have only been used to evaluate different fine-tuned versions made on top of Kimberley's Melband-Roformer model; the results are accessible here: https://docs.google.com/spreadsheets/d/1pPEJpu4tZjTkjPh_F5YjtIyHq8v0SxLnBydfUBUNlbI/edit and the evaluation was done on the mvsep.com multisong eval dataset. ZFTurbo added the torch version of the metric to his training script a few days ago.
Little update: the metric was used as a loss to emphasize fullness on a vocals model, and it does a great job at that task, especially at extracting reverb more fully (there is also more clarity in the vocal consonants in the high-frequency range). All these experiments are done inside the Audio Separation Discord community (invite: https://discord.gg/ndC4UmPZwZ).
Hi,
I've found a simple way to objectively measure bleed and fullness in the context of music source separation that I think could be useful, as I haven't seen any existing objective metric that does this, even though it's a common question from users.
Here is code as a metric:
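A minimal sketch of such a metric in PyTorch, assuming the split is done on log-mel spectrograms: the positive part of the (estimate − target) difference is counted as bleed, the negative part as missing content (lack of fullness). The function name, the torchaudio-based mel computation, and all defaults below are illustrative assumptions:

```python
import torch
import torchaudio


def bleed_fullness(estimate: torch.Tensor, target: torch.Tensor,
                   sample_rate: int = 44100, n_fft: int = 2048,
                   hop_length: int = 512, n_mels: int = 128,
                   eps: float = 1e-8):
    """Return (bleed, missing) in dB for waveforms of shape (..., samples)."""
    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=sample_rate, n_fft=n_fft,
        hop_length=hop_length, n_mels=n_mels).to(estimate.device)

    # Log-mel spectrograms of the separated stem and the ground-truth stem.
    est_db = 10.0 * torch.log10(mel(estimate) + eps)
    tgt_db = 10.0 * torch.log10(mel(target) + eps)

    diff = est_db - tgt_db
    bleed = torch.clamp(diff, min=0.0).mean()     # energy present in the estimate but not in the target
    missing = torch.clamp(-diff, min=0.0).mean()  # energy missing from the estimate (lack of fullness)
    return bleed, missing
```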
I guess it can be adapted as a loss, but I'm not a dev/scientist and I lack the knowledge to make it bulletproof; whether it's worth it, you would know better than me.
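One rough way such an adaptation could look (a sketch only, reusing the `bleed_fullness` sketch above; `fullness_weight` and the weighting scheme are assumptions, not the loss used in the experiments mentioned in the comments above) is to weight the missing-content term more heavily so training is biased toward fuller output:

```python
def fullness_emphasized_loss(estimate, target, fullness_weight=2.0):
    # `fullness_weight` is a made-up knob: values > 1 push the model toward
    # fuller output (less missing content), at the possible cost of more bleed.
    bleed, missing = bleed_fullness(estimate, target)
    return bleed + fullness_weight * missing
```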
The same concept can be used to draw spectrograms, with, for example, bleed/positive values in red, missing content/negative values in blue, and perfect separation (= 0) in white:
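A sketch of that visualization using matplotlib's diverging `bwr` colormap (plotting details are assumptions; `diff_db` is the signed log-mel difference, i.e. the `est_db - tgt_db` term from the metric sketch above, before the positive/negative split):

```python
import matplotlib.pyplot as plt


def plot_bleed_fullness(diff_db):
    """diff_db: signed log-mel difference (estimate - target), shape (n_mels, frames)."""
    vmax = float(diff_db.abs().max())
    # Diverging colormap: positive (bleed) -> red, negative (missing) -> blue, zero -> white.
    plt.imshow(diff_db.cpu().numpy(), origin="lower", aspect="auto",
               cmap="bwr", vmin=-vmax, vmax=vmax)
    plt.colorbar(label="dB difference (estimate - target)")
    plt.xlabel("frames")
    plt.ylabel("mel bins")
    plt.show()
```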