Enhancement ? New metric for source separation, measuring separately bleed and fullness in separated audio #79

jarredou · 2024-10-28T22:54:22Z

Hi,

I've found a simple way to objectively measure bleed and fullness in music source separation. I think it could be useful: I haven't seen any existing objective metric that does this, and it's a common question from users.

Here is the code for the metric:

import librosa
import numpy as np


def bleed_full(ref, est, sr=44100):
    """Return (bleedness, fullness); 100 is the best score for each.

    ref and est must have the same length so their spectrograms line up.
    """
    # STFT and mel parameters
    n_fft = 4096
    hop_length = 1024
    n_mels = 512

    # Compute magnitude STFTs
    D1 = np.abs(librosa.stft(ref, n_fft=n_fft, hop_length=hop_length))
    D2 = np.abs(librosa.stft(est, n_fft=n_fft, hop_length=hop_length))

    # Convert to mel spectrograms
    mel_basis = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    S1_mel = np.dot(mel_basis, D1)
    S2_mel = np.dot(mel_basis, D2)

    # Convert to decibels
    S1_db = librosa.amplitude_to_db(S1_mel)
    S2_db = librosa.amplitude_to_db(S2_mel)

    # Difference in dB: positive = added content (bleed), negative = missing content
    diff = S2_db - S1_db

    # Separate positive and negative differences
    positive_diff = diff[diff > 0]
    negative_diff = diff[diff < 0]

    # Average each side on its own (0 if that side is empty)
    average_positive = np.mean(positive_diff) if len(positive_diff) > 0 else 0
    average_negative = np.mean(negative_diff) if len(negative_diff) > 0 else 0

    # Scale with 100 as best score
    bleedness = 100 / (average_positive + 1)
    fullness = 100 / (-average_negative + 1)

    return bleedness, fullness
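
To make the scaling concrete, here is a small worked example of the last step only, using a hand-written dB difference matrix instead of real audio. The helper name `scores_from_diff` and the numbers are made up for illustration; the helper only mirrors the averaging and scaling lines of the function above.

```python
import numpy as np


def scores_from_diff(diff):
    """Hypothetical helper: apply the metric's averaging and scaling to a dB difference array."""
    positive = diff[diff > 0]
    negative = diff[diff < 0]
    # Average each side on its own; an empty side counts as 0 dB
    avg_pos = positive.mean() if positive.size > 0 else 0.0
    avg_neg = negative.mean() if negative.size > 0 else 0.0
    bleedness = 100 / (avg_pos + 1)
    fullness = 100 / (-avg_neg + 1)
    return bleedness, fullness


# Made-up differences (estimate minus reference, in dB)
diff = np.array([[3.0, -1.0],
                 [0.0, -3.0]])
bleedness, fullness = scores_from_diff(diff)
print(bleedness, fullness)
```

Here the only positive value is 3 dB, so bleedness is 100 / (3 + 1) = 25. The negative values average -2 dB, so fullness is 100 / (2 + 1), about 33.3. An estimate identical to the reference gives 100 for both.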

I guess it could be adapted into loss functions, but I'm not a developer or scientist and I lack the knowledge to make it bulletproof. You would know better than me whether it's worth it.

The same concept can be used to draw difference spectrograms, for example with bleed (positive values) in red, missing content (negative values) in blue, and perfect separation (0) in white.
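
One possible way to build such a red/white/blue picture is sketched below with NumPy only. The function name `diff_to_rgb` and the `max_db` clipping range are assumptions for illustration, not part of the original code; the input is a dB difference array like `diff` in the metric.

```python
import numpy as np


def diff_to_rgb(diff_db, max_db=20.0):
    """Map a dB difference array to RGB: red = bleed, blue = missing, white = match.

    max_db is an assumed clipping range, not a value from the original post.
    """
    # Normalise to [-1, 1]; values beyond +/- max_db saturate
    x = np.clip(diff_db / max_db, -1.0, 1.0)
    pos = np.clip(x, 0.0, 1.0)   # strength of added content
    neg = np.clip(-x, 0.0, 1.0)  # strength of missing content
    # Start from white and remove the other channels as the difference grows
    red = 1.0 - neg
    green = 1.0 - pos - neg
    blue = 1.0 - pos
    return np.stack([red, green, blue], axis=-1)


# Three made-up bins: full bleed, perfect match, fully missing
rgb = diff_to_rgb(np.array([[20.0, 0.0, -20.0]]))
print(rgb.shape)
```

The returned array has one RGB triple per spectrogram bin with values in [0, 1], so it can be passed to an image display routine such as matplotlib's `imshow`.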

turian · 2024-11-09T17:29:31Z

@jarredou I'm curious about this. So basically:

Instead of computing a single L1 mel spectral distance, you split it into two components:

Bleed = anything ADDED to the target spectrogram
-Fullness = anything REMOVED from the target spectrogram
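
A quick numerical check of that reading, with made-up dB differences: the absolute (L1) difference splits exactly into an added part and a removed part. This only illustrates the idea; the metric in the first post averages each part over its own entries and rescales it, so it is not a plain sum.

```python
import numpy as np

# Made-up dB differences (estimate minus reference)
diff = np.array([2.0, -4.0, 0.0, 1.0, -1.0])

added = np.maximum(diff, 0.0)     # content ADDED to the target (bleed)
removed = np.maximum(-diff, 0.0)  # content REMOVED from the target (lost fullness)

# The L1 distance is the sum of both parts
l1_total = np.abs(diff).sum()
print(added.sum(), removed.sum(), l1_total)
```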

I see you do MSS work. I noted in the BS-Roformer paper that the authors wrote: "our model outputs gained more preference from musicians and educators than from music producers in the listening test of SDX23". To my ears, BS-Roformers seem to have less bleed but also less fullness. I'd be curious if you have any numbers to share. (cc @ZFTurbo)

jarredou · 2024-11-18T23:14:30Z

@turian Yeah, that's the simple idea behind the 2 metrics.

About the BS-Roformer quote, it's from the final paper of the SDX/MDX23 contest: https://arxiv.org/pdf/2308.06979

We don't have numbers comparing different neural network models. So far, the metric has only been used to evaluate different fine-tuned versions built on top of Kimberley's Melband-Roformer model. The results are available here: https://docs.google.com/spreadsheets/d/1pPEJpu4tZjTkjPh_F5YjtIyHq8v0SxLnBydfUBUNlbI/edit. They were computed on the mvsep.com multisong eval dataset.

ZFTurbo added a torch version of the metric to his training script a few days ago.

jarredou · 2024-11-24T00:01:00Z

Little update: the metric was used as a loss to emphasize fullness on a vocals model, and it does a great job at that task, especially at extracting the reverb more fully (there is also more clarity in the vocal consonants in the high-frequency range):
1st pic is Kim's original model, 2nd one is the fine-tuned version emphasizing vocal fullness (at the cost of a slightly noisier separation):

(All these experiments are done inside the Audio Separation Discord community; invite: https://discord.gg/ndC4UmPZwZ)

jarredou changed the title ~~New metric for source separation, measuring separately bleed and fullness in separated audio~~ Enhancement ? New metric for source separation, measuring separately bleed and fullness in separated audio Oct 29, 2024
