precisionAndRecall
When working with a machine learning application where the ratio of positive to negative examples is highly skewed, traditional error metrics like accuracy can be misleading. Here's a consolidated explanation:
Example Scenario
Imagine you're training a binary classifier to detect a rare disease. The disease is present (y = 1) in only 0.5% of the population, and absent (y = 0) in the remaining 99.5%. If your classifier achieves 1% error, it might seem impressive. However, a naive algorithm that always predicts y = 0 (no disease) would achieve 99.5% accuracy, outperforming your classifier's 99% accuracy.
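To see the numbers concretely, here is a minimal Python sketch. It assumes a hypothetical test set of 1,000 examples at the stated 0.5% prevalence and shows that a classifier which never predicts the disease still scores 99.5% accuracy:

n_total = 1000
n_positive = 5                 # 0.5% of examples have the disease (y = 1)
n_negative = n_total - n_positive

# A "classifier" that always predicts y = 0 (no disease):
correct = n_negative           # it is right on every negative example
accuracy = correct / n_total
print(f"Always-predict-0 accuracy: {accuracy:.1%}")  # 99.5%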
Problem with Accuracy
In cases of skewed data, accuracy doesn't effectively measure performance. A classifier that always predicts the majority class (y = 0) can achieve high accuracy without being useful.
Confusion Matrix
A confusion matrix helps visualize the performance by splitting predictions into four categories (counted in the sketch after this list):
True Positives (TP): Correctly predicted disease cases.
True Negatives (TN): Correctly predicted non-disease cases.
False Positives (FP): Disease predicted where none is present.
False Negatives (FN): Disease cases the classifier missed.
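The four counts can be tallied directly from the labels. Here is a minimal Python sketch; the y_true and y_pred lists are hypothetical labels made up for illustration:

y_true = [1, 0, 1, 1, 0, 0, 0, 1]  # actual labels (1 = disease present)
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]  # classifier's predictions

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=2, TN=3, FP=1, FN=2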
Example Calculation
Suppose in a test set of 100 examples:
TP = 15
FP = 5
FN = 10
TN = 70
Precision and recall would be:
Precision = TP / (TP + FP) = 15 / (15 + 5) = 0.75 (75%)
Recall = TP / (TP + FN) = 15 / (15 + 10) = 0.60 (60%)
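A short Python sketch verifying the arithmetic of this worked example, using the counts given above (TP = 15, FP = 5, FN = 10):

tp, fp, fn = 15, 5, 10

precision = tp / (tp + fp)  # fraction of positive predictions that are correct
recall = tp / (tp + fn)     # fraction of actual positives that are found
print(f"Precision: {precision:.2f}")  # 0.75
print(f"Recall:    {recall:.2f}")     # 0.60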
Conclusion
Precision and recall provide a clearer picture of a classifier's performance on skewed data: precision measures how many of the predicted positive cases are truly positive, while recall measures how many of the actual positive cases the classifier finds. Together they help ensure the classifier is genuinely useful at identifying rare events, not just accurate on the majority class.