This blog post is a growing collection of different metrics, translated from math notation into code. I try to keep it as concise and efficient as possible.
Don't hesitate to contact me if you find mistakes or have suggestions.
Inspiration and reference: https://github.com/Jam3/math-as-code
import numpy as np
import matplotlib.pyplot as plt
import math
%matplotlib inline
x = np.linspace(-100,100,201)
y = x + 2
y_scatter = x + np.random.normal(0, 15, 201)
len(x), len(y), len(y_scatter)
(201, 201, 201)
plt.scatter(x,y)
The metrics chosen to evaluate one's machine learning algorithm vary wildly, depending on the use case (or the Kaggle competition). A few of them are shown below.
RMSE
Arguably one of the most frequently used measures of the difference between the values predicted by a model or an estimator and the values observed. The RMSE, also called RMSD, is calculated by taking the square root of the mean of the squared differences between y_pred and y_true.
$$\operatorname{RMSE} = \sqrt{\frac{\sum_{t=1}^{T}(\hat{y}_{t}-y_{t})^{2}}{T}}$$
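where $\hat{y}_{t}$ is the predicted value, $y_{t}$ is the observed value and $T$ is the number of observations.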
def rmse(y_pred, y): return math.sqrt(((y_pred-y)**2).mean())
SMAPE
Symmetric mean absolute percentage error (SMAPE or sMAPE) is an accuracy measure based on percentage (or relative) errors. It is usually defined as follows:
$$\text{SMAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\frac{\left|F_{t}-A_{t}\right|}{(|A_{t}|+|F_{t}|)/2}$$
where $A_{t}$ is the actual value and $F_{t}$ is the forecast value.
def SMAPE(y_pred, y): return (200*np.abs(y_pred-y)/(np.abs(y)+np.abs(y_pred))).mean()
Confusion Matrix
| Predicted class (rows) / Actual class (columns) | Cat | Dog | Rabbit |
|---|---|---|---|
| Cat | 5 | 2 | 0 |
| Dog | 3 | 3 | 2 |
| Rabbit | 0 | 1 | 11 |
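A matrix like the one above can be tallied from two label arrays. The following is a minimal sketch, assuming the classes are encoded as integers 0..k-1 (here 0 = Cat, 1 = Dog, 2 = Rabbit). The helper name `confusion_matrix`, the example labels and the layout (rows = predicted, columns = actual, matching the table) are my own choices for illustration.

```python
import numpy as np

def confusion_matrix(y_pred, y_true, n_classes):
    # rows = predicted class, columns = actual class (same layout as the table)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at is unbuffered, so repeated (pred, true) pairs accumulate
    np.add.at(cm, (y_pred, y_true), 1)
    return cm

# hypothetical labels: 0 = Cat, 1 = Dog, 2 = Rabbit
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2, 1])
cm = confusion_matrix(y_pred, y_true, 3)
```

The diagonal holds the correctly classified samples, so `np.trace(cm) / cm.sum()` gives the accuracy.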
Accuracy
The accuracy of a test is its ability to differentiate the cases correctly. To estimate the accuracy of a test, we calculate the proportion of true positives and true negatives among all evaluated cases.
def acc(y_pred, y): return np.mean(y_pred == y)
Precision
Precision is the fraction of relevant instances among the retrieved instances.
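In terms of counts, precision is TP / (TP + FP). Here is a minimal sketch for binary labels, assuming 1 marks the positive class and 0 the negative class. Returning 0.0 when nothing was predicted positive is a convention I picked for this example.

```python
import numpy as np

def precision(y_pred, y):
    y_pred, y = np.asarray(y_pred), np.asarray(y)
    tp = np.sum((y_pred == 1) & (y == 1))  # predicted positive, actually positive
    fp = np.sum((y_pred == 1) & (y == 0))  # predicted positive, actually negative
    # assumption: define precision as 0.0 when there are no positive predictions
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0
```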
Recall
Recall is the fraction of relevant instances that have been retrieved over the total amount of relevant instances.
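In terms of counts, recall is TP / (TP + FN). Here is a minimal sketch for binary labels, assuming 1 marks the positive class and 0 the negative class. Returning 0.0 when there are no actual positives is a convention I picked for this example.

```python
import numpy as np

def recall(y_pred, y):
    y_pred, y = np.asarray(y_pred), np.asarray(y)
    tp = np.sum((y_pred == 1) & (y == 1))  # actually positive and found
    fn = np.sum((y_pred == 0) & (y == 1))  # actually positive but missed
    # assumption: define recall as 0.0 when there are no actual positives
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0
```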
F1-Measure
The F1-Measure combines precision and recall by taking their harmonic mean:
$$F_{1} = 2\cdot\frac{\text{precision}\cdot\text{recall}}{\text{precision}+\text{recall}}$$
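A minimal sketch of the harmonic mean of precision and recall for binary labels, assuming 1 marks the positive class and 0 the negative class. The function name `f1` and the choice to return 0.0 when there are no true positives are assumptions made for this example.

```python
import numpy as np

def f1(y_pred, y):
    y_pred, y = np.asarray(y_pred), np.asarray(y)
    tp = np.sum((y_pred == 1) & (y == 1))
    fp = np.sum((y_pred == 1) & (y == 0))
    fn = np.sum((y_pred == 0) & (y == 1))
    if tp == 0:
        # assumption: without true positives, precision or recall is 0, so F1 is 0
        return 0.0
    p, r = tp / (tp + fp), tp / (tp + fn)
    return 2 * p * r / (p + r)  # harmonic mean of precision and recall
```

Because it is a harmonic mean, the score is pulled towards the smaller of the two values, so a model cannot hide poor recall behind high precision (or the other way around).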
In mathematics, statistics, and computer science, particularly in the fields of machine learning and inverse problems, regularization is a process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting (source).
L1 Regularization or Lasso Regression
$$\sum_{i=1}^{n}\left(y_{i}-\sum_{j=1}^{p}x_{ij}\beta_{j}\right)^{2}+\lambda\sum_{j=1}^{p}|\beta_{j}|$$
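where $\lambda\sum_{j=1}^{p}|\beta_{j}|$ is the L1 regularization term. Lasso Regression (Least Absolute Shrinkage and Selection Operator) adds the “absolute value of magnitude” of the coefficients as a penalty term to the loss function. If $\lambda = 0$, the penalty vanishes and we get back ordinary least squares, whereas a very large $\lambda$ drives the coefficients to zero and the model under-fits.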
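The L1-penalised objective can be written almost verbatim in NumPy. This is a minimal sketch, assuming `X` is an (n, p) design matrix, `y` a length-n target vector, `beta` a length-p coefficient vector and `lam` the regularization strength λ. The function name `lasso_loss` is hypothetical, and the function only evaluates the loss; it does not fit the coefficients.

```python
import numpy as np

def lasso_loss(X, y, beta, lam):
    residuals = y - X @ beta                 # y_i - sum_j x_ij * beta_j
    penalty = lam * np.sum(np.abs(beta))     # L1 term: lambda * sum_j |beta_j|
    return np.sum(residuals ** 2) + penalty

# made-up numbers, only to show the call
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0])
beta = np.array([2.0, -1.0])
loss = lasso_loss(X, y, beta, lam=0.5)
```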
L2 Regularization or Ridge Regression
$$\sum_{i=1}^{n}\left(y_{i}-\sum_{j=1}^{p}x_{ij}\beta_{j}\right)^{2}+\lambda\sum_{j=1}^{p}\beta_{j}^{2}$$
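where $\lambda\sum_{j=1}^{p}\beta_{j}^{2}$ is the L2 regularization term. Ridge Regression adds the “squared magnitude” of the coefficients as a penalty term to the loss function. Here, if $\lambda = 0$, we get back ordinary least squares, whereas a very large $\lambda$ puts too much weight on the penalty and leads to under-fitting.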
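The L2-penalised objective differs from the L1 version only in the penalty term. This is a minimal sketch under the same assumptions: `X` is an (n, p) design matrix, `y` a length-n target vector, `beta` a length-p coefficient vector and `lam` the regularization strength λ. The function name `ridge_loss` is hypothetical, and the function only evaluates the loss.

```python
import numpy as np

def ridge_loss(X, y, beta, lam):
    residuals = y - X @ beta              # y_i - sum_j x_ij * beta_j
    penalty = lam * np.sum(beta ** 2)     # L2 term: lambda * sum_j beta_j^2
    return np.sum(residuals ** 2) + penalty

# made-up numbers, only to show the call
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0])
beta = np.array([2.0, -1.0])
loss = ridge_loss(X, y, beta, lam=0.5)
```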
Having said that, how lambda is chosen matters. This technique works very well for avoiding over-fitting.
Binary Crossentropy
def binary_loss(y, p): return np.mean(-(y * np.log(p) + (1-y)*np.log(1-p)))
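Since `np.log(0)` evaluates to `-inf`, the one-liner breaks down when a predicted probability is exactly 0 or 1. A common workaround is to clip the predictions first. This is a sketch; the name `binary_loss_clipped`, the `eps` parameter and its default value are assumptions.

```python
import numpy as np

def binary_loss_clipped(y, p, eps=1e-12):
    # keep probabilities inside (0, 1) so both logarithms stay finite
    p = np.clip(p, eps, 1 - eps)
    return np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p)))

# predictions that are exactly 0 or 1 no longer produce inf or nan
y = np.array([1.0, 0.0])
p = np.array([1.0, 0.0])
loss = binary_loss_clipped(y, p)
```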
Softmax
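Softmax turns a vector of scores $z$ into probabilities $e^{z_i} / \sum_j e^{z_j}$. Here is a minimal sketch for a 1-D input. Subtracting the maximum before exponentiating is a standard stability trick that I added; it does not change the result.

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - np.max(z))  # shift by the max to avoid overflow in exp
    return e / e.sum()         # normalise so the outputs sum to 1

probs = softmax([1.0, 2.0, 3.0])
```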
Sigmoid
def sigmoid(x): return 1 / (1 + math.exp(-x))
plt.plot(x, [sigmoid(num) for num in y])
tanh
def tanh(x): return np.tanh(x)
plt.plot(x, tanh(y))
ReLU
def relu(x): return max(0,x)
plt.plot(x, [relu(num) for num in y])
Standard deviation
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_{i}-\mu)^{2}}, \quad \text{where} \quad \mu = \frac{1}{N}\sum_{i=1}^{N}x_{i}$$
Sample standard deviation
$$s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_{i}-\overline{x})^{2}}$$
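Both formulas translate directly into NumPy; the only difference is dividing by N or by N-1. This is a minimal sketch with made-up data. The function names `std` and `sample_std` are my own; NumPy's built-in equivalents are `np.std(x)` and `np.std(x, ddof=1)`.

```python
import numpy as np

def std(x):
    x = np.asarray(x, dtype=float)
    return np.sqrt(((x - x.mean()) ** 2).mean())               # divide by N

def sample_std(x):
    x = np.asarray(x, dtype=float)
    return np.sqrt(((x - x.mean()) ** 2).sum() / (len(x) - 1))  # divide by N-1

# made-up data: the mean is 5 and the squared deviations sum to 32
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
sigma = std(data)
s = sample_std(data)
```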
Probability concepts explained
- https://towardsdatascience.com/probability-concepts-explained-introduction-a7c0316de465
- https://towardsdatascience.com/probability-concepts-explained-maximum-likelihood-estimation-c7b4342fdbb1
Maximum likelihood estimation is a method that determines values for the parameters of a model. The parameter values are found such that they maximise the likelihood that the process described by the model produced the data that were actually observed.
We want to know which curve was most likely responsible for creating the data points that we observed. Maximum likelihood estimation is a method that will find the values of μ and σ that result in the curve that best fits the data.
Likelihood and the probability density are fundamentally asking different questions — one is asking about the data and the other is asking about the parameter values.
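For a normal distribution, the maximum likelihood estimates have a closed form: μ is the sample mean and σ is the standard deviation computed with 1/N. This is a minimal sketch, assuming the data are independent draws from a single Gaussian. The observations and the function name `gaussian_log_likelihood` are made up for this example.

```python
import numpy as np

def gaussian_log_likelihood(data, mu, sigma):
    # log of the product of normal densities; the data are fixed,
    # the parameters mu and sigma are what we vary
    n = len(data)
    return (-n * np.log(sigma * np.sqrt(2 * np.pi))
            - np.sum((data - mu) ** 2) / (2 * sigma ** 2))

# hypothetical observations
data = np.array([9.0, 9.5, 11.0, 10.5, 10.0])

# closed-form maximum likelihood estimates for a normal distribution
mu_hat = data.mean()
sigma_hat = data.std()  # ddof=0: the MLE divides by N, not N-1

best = gaussian_log_likelihood(data, mu_hat, sigma_hat)
```

Evaluating `gaussian_log_likelihood` at any other `mu` or `sigma` for the same data gives a lower value than `best`, which is what makes these estimates the maximum likelihood ones.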