precisionAndRecall
When working with a machine learning application where the ratio of positive to negative examples is highly skewed, traditional error metrics like accuracy can be misleading. Here's a consolidated explanation:
Example Scenario
Imagine you're training a binary classifier to detect a rare disease. The disease is present (y = 1) in only 0.5% of the population, and absent (y = 0) in the remaining 99.5%. If your classifier achieves 1% error, it might seem impressive. However, a naive algorithm that always predicts y = 0 (no disease) would achieve 99.5% accuracy, outperforming your classifier's 99% accuracy.
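To see the numbers concretely, here is a minimal Python sketch. It assumes a hypothetical test set of 1,000 examples at the stated 0.5% prevalence and shows that a classifier which never predicts the disease still scores 99.5% accuracy:

n_total = 1000
n_positive = 5                 # 0.5% of examples have the disease (y = 1)
n_negative = n_total - n_positive

# A "classifier" that always predicts y = 0 (no disease):
correct = n_negative           # it is right on every negative example
accuracy = correct / n_total
print(f"Always-predict-0 accuracy: {accuracy:.1%}")  # 99.5%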
Problem with Accuracy
In cases of skewed data, accuracy doesn't effectively measure performance. A classifier that always predicts the majority class (y = 0) can achieve high accuracy without being useful.
Confusion Matrix
A confusion matrix helps visualize the performance by splitting predictions into four categories (counted in the sketch after this list):
True Positives (TP): Correctly predicted disease cases.
True Negatives (TN): Correctly predicted non-disease cases.
False Positives (FP): Disease predicted where none is present.
False Negatives (FN): Disease cases the classifier missed.
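The four counts can be tallied directly from the labels. Here is a minimal Python sketch; the y_true and y_pred lists are hypothetical labels made up for illustration:

y_true = [1, 0, 1, 1, 0, 0, 0, 1]  # actual labels (1 = disease present)
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]  # classifier's predictions

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=2, TN=3, FP=1, FN=2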
Example Calculation
Suppose in a test set of 100 examples:
TP = 15
FP = 5
FN = 10
TN = 70
Precision and recall would be:
Precision = TP / (TP + FP) = 15 / (15 + 5) = 0.75 (75%)
Recall = TP / (TP + FN) = 15 / (15 + 10) = 0.60 (60%)
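A short Python sketch verifying the arithmetic of this worked example, using the counts given above (TP = 15, FP = 5, FN = 10):

tp, fp, fn = 15, 5, 10

precision = tp / (tp + fp)  # fraction of positive predictions that are correct
recall = tp / (tp + fn)     # fraction of actual positives that are found
print(f"Precision: {precision:.2f}")  # 0.75
print(f"Recall:    {recall:.2f}")     # 0.60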
Conclusion
Precision and recall provide a clearer picture of a classifier's performance on skewed data: precision measures how many of the predicted positive cases are truly positive, while recall measures how many of the actual positive cases the classifier finds. Together they help ensure the classifier is genuinely useful at identifying rare events, not just accurate on the majority class.