Add precision #1326
base: master
Conversation
/**
 * Estimate the precision of the model
 */
fun estimatePrecision400(): Double
if we provide both measures, we need to expand the comments to clarify what is the difference between accuracy and precision here
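One way to expand the comments, sketched here on a hypothetical `ModelMetrics` interface (names follow the diff, the KDoc wording is only a suggestion), is to state both definitions explicitly so readers see how the two measures differ:

```kotlin
interface ModelMetrics {
    /**
     * Overall accuracy: the fraction of *all* predictions that matched
     * the actual status code, i.e. (TP + TN) / (TP + TN + FP + FN).
     */
    fun estimateAccuracy(): Double

    /**
     * Precision restricted to 400 responses: of the cases where the model
     * predicted a 400, the fraction that actually were 400s,
     * i.e. TP / (TP + FP). Unlike accuracy, it is not inflated by the
     * (typically large) majority of non-400 responses.
     */
    fun estimatePrecision400(): Double
}
```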
 * If a new input is created based on a rejected prediction, when we would not know if that was correct or not.
 */
fun updatePerformance(predictionWasCorrect: Boolean)
fun updatePerformance(predictedStatusCode: Int, actualStatusCode: Int? = null)
I assume here the semantic is predictionWasCorrect = (predictedStatusCode == actualStatusCode)? if so, how can actualStatusCode be nullable?
val tp = truePositive400Queue.count { it } // count only true entries
val fp = falsePositive400Queue.count { it } // count only true entries
return if ((tp + fp) > 0) tp.toDouble() / (tp + fp) else 0.0
}
we need some basic unit tests to validate these computations (eg in ModelAccuracyWithTimeWindowTest.kt)
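Such a test could look roughly like the following. The `Precision400Window` class here is only a self-contained stand-in (the PR's actual class and method names may differ): a bounded queue of prediction outcomes from which precision is derived as TP / (TP + FP).

```kotlin
import java.util.ArrayDeque

// Minimal sketch (names hypothetical): a bounded queue of outcomes for
// cases where the model predicted a 400; true = TP, false = FP.
class Precision400Window(private val capacity: Int) {
    private val outcomes = ArrayDeque<Boolean>()

    fun record(predictionWasCorrect: Boolean) {
        if (outcomes.size == capacity) outcomes.removeFirst() // evict oldest
        outcomes.addLast(predictionWasCorrect)
    }

    // Precision = TP / (TP + FP); the queue size is exactly TP + FP.
    fun estimatePrecision400(): Double {
        val tp = outcomes.count { it }
        return if (outcomes.isNotEmpty()) tp.toDouble() / outcomes.size else 0.0
    }
}
```

A test would then record a known mix of outcomes and assert the expected ratio, including the empty-window case and the eviction behavior once the buffer is full.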
truePositive400Queue.add(predictionWasCorrect)
} else {
falsePositive400Queue.add(!predictionWasCorrect)
}
is this correct? at each call of updatePerformance, should both data structures truePositive400Queue and falsePositive400Queue be updated? regardless, add unit tests to validate the precision calculations
Nice catch. I'll fix it.
ma.updatePerformance(true)
ma.updatePerformance(predictedStatusCode = 200, actualStatusCode = 200) //true
assertEquals(1.0, ma.estimateAccuracy(), delta)
i guess here you could add assertions on the estimated precision
@MohsenTaheriShalmani is this ready for me to review? or are you still making changes?
override fun estimateMetrics(endpoint: Endpoint): ModelEvaluation {
val m = models[endpoint] ?: return ModelEvaluation(
accuracy = 0.5,
precision400 = 0.5,
why 0.5 as default values and not 0.0?
override fun estimateOverallAccuracy(): Double {
override fun estimateOverallMetrics(): ModelEvaluation {
if (models.isEmpty()) {
return ModelEvaluation(
duplicated code. should have a default immutable instance, for example declared and returned directly from ModelEvaluation, eg something like ModelEvaluation.DEFAULT_NO_DATA
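A sketch of that suggestion (field names mirror the diff; the constant name DEFAULT_NO_DATA is the reviewer's proposal, not existing code): a single shared fallback instance in the companion object, so changing a default value is done in one place.

```kotlin
data class ModelEvaluation(
    val accuracy: Double,
    val precision400: Double,
    val recall400: Double,
    val f1Score400: Double,
    val mcc: Double
) {
    companion object {
        // Single shared fallback used wherever no data is available;
        // every call site refers to this one instance.
        val DEFAULT_NO_DATA = ModelEvaluation(
            accuracy = 0.5,
            precision400 = 0.5,
            recall400 = 0.0,
            f1Score400 = 0.0,
            mcc = 0.0
        )
    }
}
```

Call sites would then simply `return ModelEvaluation.DEFAULT_NO_DATA` instead of re-listing the five values.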
val n = models.size.toDouble()
val sum = models.values.sumOf { it.estimateOverallAccuracy() }
val total = models.values.map {
it?.estimateOverallMetrics() ?: ModelEvaluation(0.5, 0.5, 0.0, 0.0, 0.0)
duplicated code
precision400 = 0.5,
recall400 = 0.0,
f1Score400 = 0.0,
mcc = 0.0
duplicated code
val acc = modelMetrics.estimateAccuracy()
val prec = modelMetrics.estimatePrecision400()
val rec = modelMetrics.estimateRecall400()
val f1 = if (prec + rec == 0.0) 0.0 else 2 * (prec * rec) / (prec + rec)
is f1 derived from prec and rec? if so, why have it as an input, and not compute it directly inside ModelMetrics?
override fun estimateF1Score400(): Double {
val p = estimatePrecision400()
val r = estimateRecall400()
return if (p + r > 0.0) 2 * (p * r) / (p + r) else 0.0
duplicated equation
 * Unified metrics estimate packaged into a [ModelEvaluation].
 */
override fun estimateMetrics(): ModelEvaluation {
return ModelEvaluation(
this could be the default implementation directly in the interface ModelEvaluation.
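The two review points above (the duplicated F1 formula and the repeated estimateMetrics body) could both be addressed by default methods on the metrics interface. A self-contained sketch, with the interface and data class names assumed from the diff:

```kotlin
data class ModelEvaluation(
    val accuracy: Double,
    val precision400: Double,
    val recall400: Double,
    val f1Score400: Double,
    val mcc: Double
)

interface ModelMetrics {
    fun estimateAccuracy(): Double
    fun estimatePrecision400(): Double
    fun estimateRecall400(): Double
    fun estimateMCC400(): Double

    // F1 is fully derived from precision and recall, so the equation
    // lives in exactly one place; concrete classes need not override it.
    fun estimateF1Score400(): Double {
        val p = estimatePrecision400()
        val r = estimateRecall400()
        return if (p + r > 0.0) 2 * (p * r) / (p + r) else 0.0
    }

    // Default packaging of all metrics, shared by every implementation.
    fun estimateMetrics(): ModelEvaluation = ModelEvaluation(
        accuracy = estimateAccuracy(),
        precision400 = estimatePrecision400(),
        recall400 = estimateRecall400(),
        f1Score400 = estimateF1Score400(),
        mcc = estimateMCC400()
    )
}
```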
actualStatusCode == 400 && predictedStatusCode != 400 -> falseNegative400Queue.add(true)
actualStatusCode != 400 && predictedStatusCode == 400 -> falsePositive400Queue.add(true)
actualStatusCode != 400 && predictedStatusCode != 400 -> trueNegative400Queue.add(true)
}
i'm not sure these computations are correct, if we want to have a time window of size N. regardless of the number of queues, only half of them get new data at each call of this function, and only with value true. i think all queues should be updated (eg, consider the false cases as well)
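A sketch of the suggested fix (queue names follow the diff, the bounded-queue helper is hypothetical): every call appends a Boolean to all four queues, so each queue is a full record of the last N predictions rather than a sparse log of only the true hits.

```kotlin
import java.util.ArrayDeque

// On every call, *all* four queues receive one entry (true or false),
// so each one always reflects exactly the last `capacity` predictions.
class Confusion400Window(private val capacity: Int) {
    private val truePositive400Queue = ArrayDeque<Boolean>()
    private val falsePositive400Queue = ArrayDeque<Boolean>()
    private val falseNegative400Queue = ArrayDeque<Boolean>()
    private val trueNegative400Queue = ArrayDeque<Boolean>()

    // Bounded append: evict the oldest entry once the window is full.
    private fun ArrayDeque<Boolean>.push(v: Boolean) {
        if (size == capacity) removeFirst()
        addLast(v)
    }

    fun updatePerformance(predictedStatusCode: Int, actualStatusCode: Int) {
        val predicted400 = predictedStatusCode == 400
        val actual400 = actualStatusCode == 400
        truePositive400Queue.push(predicted400 && actual400)
        falsePositive400Queue.push(predicted400 && !actual400)
        falseNegative400Queue.push(!predicted400 && actual400)
        trueNegative400Queue.push(!predicted400 && !actual400)
    }

    fun estimatePrecision400(): Double {
        val tp = truePositive400Queue.count { it }
        val fp = falsePositive400Queue.count { it }
        return if (tp + fp > 0) tp.toDouble() / (tp + fp) else 0.0
    }
}
```

With this shape, eviction keeps all four counters consistent with the same sliding window, which the original one-queue-per-call version does not guarantee.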
override fun estimateAccuracy(endpoint: Endpoint): Double = 0.0
override fun estimateOverallAccuracy(): Double = 0.0
override fun estimateMetrics(endpoint: Endpoint): ModelEvaluation = ModelEvaluation(accuracy = 0.5, precision400 = 0.5, recall400 = 0.0, f1Score400 = 0.0, mcc = 0.0)
override fun estimateOverallMetrics(): ModelEvaluation = ModelEvaluation(accuracy = 0.5, precision400 = 0.5, recall400 = 0.0, f1Score400 = 0.0, mcc = 0.0)
duplicated code. this default instance should really be instantiated only once and referred to in all these different places. so that, if we want to change the default of one parameter, we do it only once, and not 20 times :)
fun estimateModelMetrics() {

val delta = 0.0001
val ma = ModelMetricsWithTimeWindow(10) // buffer large enough to hold all
besides this test, add back the previously deleted test, where val ma = ModelAccuracyWithTimeWindow(4). that was done on purpose to verify the time window. as a rule of thumb, do not delete or modify existing tests, add new ones (of course, if needed, changes/deletes can be discussed). as written in a previous comment, I do not think the time window implementation is correct. to verify it, you need a buffer that is smaller than the data you insert
}

return estimateOverallAccuracy()
return ModelEvaluation(
i don't understand this change. why remove the call to estimateOverallAccuracy()?
I think for an endpoint model, “overall” is the same as the endpoint’s metrics, so I just return its own metrics directly.
modelMetrics.estimatePrecision400(),
modelMetrics.estimateRecall400(),
modelMetrics.estimateMCC400()
)
same as in the other case. a model for a specific endpoint will have no difference between estimateMetrics(endpoint: Endpoint) and estimateOverallMetrics() apart from the endpoint validation in the former. so one function should call the other to avoid code duplication. also, what is the difference between:
return ModelEvaluation(
modelMetrics.estimateAccuracy(),
modelMetrics.estimatePrecision400(),
modelMetrics.estimateRecall400(),
modelMetrics.estimateMCC400()
)
and
return modelMetrics.estimateMetrics()
?
/** Unified metrics estimate packaged into a [ModelEvaluation] */
override fun estimateMetrics(): ModelEvaluation {
return ModelEvaluation(
why does this need to be overridden if the implementation is the same?
override fun estimateMetrics(): ModelEvaluation {
return ModelEvaluation(
accuracy = estimateAccuracy(),
precision400 = estimatePrecision400(),
see previous comment. if the implementation is the same, then no need to override
val ma = ModelMetricsWithTimeWindow(10) // buffer large enough to hold all

// 1) correct prediction: 200 vs. 200 -> TN
ma.updatePerformance(predictedStatusCode = 200, actualStatusCode = 200)
keep this test. but add a new one (can copy & paste some parts of this test) in which the amount of data is more than the buffer, eg you could have a buffer of size 2 and insert the first 5 elements in this test
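The requested test shape can be illustrated with a self-contained stand-in (the `AccuracyWindow` class and its methods are hypothetical, not the PR's code): with a buffer of size 2 and five insertions, the estimate must reflect only the two most recent outcomes, proving that eviction actually happens.

```kotlin
import java.util.ArrayDeque

// Hypothetical minimal accuracy window, just to illustrate the test:
// capacity-bounded queue of correct/incorrect outcomes.
class AccuracyWindow(private val capacity: Int) {
    private val outcomes = ArrayDeque<Boolean>()

    fun update(correct: Boolean) {
        if (outcomes.size == capacity) outcomes.removeFirst() // evict oldest
        outcomes.addLast(correct)
    }

    fun estimateAccuracy(): Double =
        if (outcomes.isEmpty()) 0.0
        else outcomes.count { it }.toDouble() / outcomes.size
}
```

For example, inserting true, true, true, false, false into a window of size 2 should yield an accuracy of 0.0, because only the last two (both incorrect) survive; an implementation without eviction would instead report 0.6 over all five.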
Since the main objective is to detect HTTP 400 responses, precision is a more meaningful metric than accuracy. Accuracy can be misleading if 400s are rare, while precision reflects how reliable the model’s 400 predictions are.
Additionally, it may be more sensible to use precision in model verification rather than expecting perfect performance.