Add MCC #97

Expertium · 2024-06-24T08:03:35Z

I'll copy what I said in Discord

Another request: calculate the Matthews correlation coefficient (MCC): https://en.wikipedia.org/wiki/Phi_coefficient

Select 0.5 as the threshold, and convert all predicted probabilities to binary values

# convert each predicted probability to 0 or 1, using 0.5 as the threshold
y_pred = [1 if p > 0.5 else 0 for p in y_pred]

Calculate the confusion matrix, and obtain the four values from it

from sklearn.metrics import confusion_matrix
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
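
For anyone without scikit-learn at hand, the thresholding and counting steps above can be sketched in plain Python. This is only an illustration: `binarize` and `confusion_counts` are hypothetical helper names, not part of scikit-learn or of this repository, and the sample data is made up.

```python
def binarize(probabilities, threshold=0.5):
    """Map each predicted probability to 1 if it exceeds the threshold, else 0."""
    return [1 if p > threshold else 0 for p in probabilities]


def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp), the same order as confusion_matrix(...).ravel()."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:  # truth == 1 and pred == 0
            fn += 1
    return tn, fp, fn, tp


# Made-up example data (assumption, for illustration only)
y_true = [1, 0, 1, 1, 0, 1]
y_prob = [0.9, 0.6, 0.4, 0.8, 0.2, 0.5]

y_pred = binarize(y_prob)
tn, fp, fn, tp = confusion_counts(y_true, y_pred)
```

Note that a probability of exactly 0.5 maps to 0 here, because the comparison is a strict `>`.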

Calculate MCC:

import math

def mcc(tn, fp, fn, tp):
    # cast to Python ints: the values from confusion_matrix are fixed-width NumPy integers,
    # and their product can overflow
    tn, fp, fn, tp = int(tn), int(fp), int(fn), int(tp)
    sums = [tp + fp, tp + fn, tn + fp, tn + fn]
    # number of denominator terms that are zero
    n_zero = sum(s == 0 for s in sums)

    if n_zero == 0:
        # I tried using np.prod, but it outputs negative values sometimes, probably due to an overflow
        x = sums[0] * sums[1] * sums[2] * sums[3]
        # I also have to use math.sqrt, because np.sqrt doesn't work for very large numbers
        return ((tp * tn) - (fp * fn)) / math.sqrt(x)
    # if one of the four sums in the denominator is zero, return 0
    elif n_zero == 1:
        return 0
    # if two of the four sums are zero, return 1 or -1, depending on TP, TN, FP and FN
    # https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-019-6413-7
    elif n_zero == 2:
        if tp != 0 or tn != 0:
            return 1
        elif fp != 0 or fn != 0:
            return -1
    # if more than two sums are zero, return None
    elif n_zero > 2:
        return None
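
For reference, the `n_zero == 0` branch above evaluates the standard MCC formula. The worked example uses made-up counts (TP = 6, TN = 3, FP = 1, FN = 2) purely to show the arithmetic:

```latex
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}

% Worked example with made-up counts: TP = 6, TN = 3, FP = 1, FN = 2
\mathrm{MCC} = \frac{6 \cdot 3 - 1 \cdot 2}{\sqrt{7 \cdot 8 \cdot 4 \cdot 5}} = \frac{16}{\sqrt{1120}} \approx 0.478
```

The other branches only cover the cases where the denominator is zero and the formula is undefined.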

I linked 2 articles about MCC (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9938573 and https://ieeexplore.ieee.org/abstract/document/9440903). It has an advantage over AUC: it takes into account all four numbers (true positives, true negatives, false positives, false negatives), whereas AUC only takes into account two rates, the true positive rate and the false positive rate.

So with AUC and MCC added, we will have 2 calibration metrics and 2 classification metrics, which is more than enough.

L-M-Sherlock closed this as not planned on Jul 4, 2024
