
Conversation

MohsenTaheriShalmani
Contributor

Since the main objective is to detect HTTP 400 responses, precision is a more meaningful metric than accuracy. Accuracy can be misleading if 400s are rare, while precision reflects how reliable the model’s 400 predictions are.
Additionally, it may be more sensible to use precision as the criterion in model verification, rather than expecting perfect performance.
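To illustrate with hypothetical numbers (not measured in this PR): if only 5 out of 100 responses are 400s, a model that never predicts 400 still scores 95% accuracy while being useless for our goal.

// Hypothetical confusion counts: 100 calls, 5 actual 400s, model never predicts 400.
val tp = 0   // predicted 400, actual 400
val fp = 0   // predicted 400, actual not 400
val tn = 95  // predicted not 400, actual not 400
val fn = 5   // predicted not 400, actual 400

val accuracy = (tp + tn).toDouble() / (tp + tn + fp + fn)            // 0.95, looks great
val precision = if (tp + fp > 0) tp.toDouble() / (tp + fp) else 0.0  // 0.0, never finds a 400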

Base automatically changed from Packing-Probabilistic-AI-Models to master September 22, 2025 07:17
/**
 * Estimate the precision of the model
 */
fun estimatePrecision400(): Double
Collaborator

If we provide both measures, we need to expand the comments to clarify the difference between accuracy and precision here.
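For example, the KDoc could spell out both measures explicitly; this is just a suggested wording, not final text:

/**
 * Estimate the precision of the model on HTTP 400 predictions,
 * i.e., TP / (TP + FP): of all the times the model predicted a 400,
 * the fraction that actually were 400 responses.
 * This differs from accuracy, which is the fraction of all
 * predictions (for any status code) that were correct.
 */
fun estimatePrecision400(): Double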

* If a new input is created based on a rejected prediction, then we would not know whether it was correct or not.
*/
fun updatePerformance(predictionWasCorrect: Boolean)
fun updatePerformance(predictedStatusCode: Int, actualStatusCode: Int? = null)
Collaborator

I assume the semantics here are predictionWasCorrect = (predictedStatusCode == actualStatusCode)? If so, how can actualStatusCode be nullable?

val tp = truePositive400Queue.count { it } // count only true entries
val fp = falsePositive400Queue.count { it } // count only true entries
return if ((tp + fp) > 0) tp.toDouble() / (tp + fp) else 0.0
}
Collaborator

We need some basic unit tests to validate these computations (e.g., in ModelAccuracyWithTimeWindowTest.kt).
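Something along these lines, for example, assuming the updatePerformance(predictedStatusCode, actualStatusCode) overload introduced in this PR:

@Test
fun testEstimatePrecision400() {
    val delta = 0.0001
    val ma = ModelMetricsWithTimeWindow(10)

    // One true positive: predicted 400, actual 400.
    ma.updatePerformance(predictedStatusCode = 400, actualStatusCode = 400)
    // One false positive: predicted 400, actual 200.
    ma.updatePerformance(predictedStatusCode = 400, actualStatusCode = 200)

    // precision = TP / (TP + FP) = 1 / 2
    assertEquals(0.5, ma.estimatePrecision400(), delta)
}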

truePositive400Queue.add(predictionWasCorrect)
} else {
falsePositive400Queue.add(!predictionWasCorrect)
}
Collaborator

Is this correct? At each call of updatePerformance, should both data structures truePositive400Queue and falsePositive400Queue be updated? Regardless, add unit tests to validate the precision calculations.

Contributor Author

Nice catch. I'll fix it.


ma.updatePerformance(true)
ma.updatePerformance(predictedStatusCode = 200, actualStatusCode = 200) //true
assertEquals(1.0, ma.estimateAccuracy(), delta)
Collaborator

I guess here you could add assertions on the estimated precision.
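For instance, since no 400 is ever predicted in this test, precision should fall back to its no-data value (0.0 in the current implementation):

assertEquals(0.0, ma.estimatePrecision400(), delta)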

@arcuri82
Collaborator

@MohsenTaheriShalmani is this ready for me to review, or are you still making changes?
Btw, recall that, if you are waiting for my review and that is blocking you from working further, you can make new changes in a branch of the branch while I am reviewing the first one :)

override fun estimateMetrics(endpoint: Endpoint): ModelEvaluation {
    val m = models[endpoint] ?: return ModelEvaluation(
        accuracy = 0.5,
        precision400 = 0.5,
Collaborator

Why 0.5 as default values, and not 0.0?

override fun estimateOverallAccuracy(): Double {
override fun estimateOverallMetrics(): ModelEvaluation {
if (models.isEmpty()) {
return ModelEvaluation(
Collaborator

Duplicated code. We should have a default immutable instance, for example declared in and returned directly from ModelEvaluation, e.g., something like ModelEvaluation.DEFAULT_NO_DATA.
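A possible sketch, assuming ModelEvaluation is (or can be turned into) a data class with exactly these fields:

data class ModelEvaluation(
    val accuracy: Double,
    val precision400: Double,
    val recall400: Double,
    val f1Score400: Double,
    val mcc: Double
) {
    companion object {
        // Single shared instance to return whenever there is no data to
        // estimate from; changing a default here changes it everywhere.
        val DEFAULT_NO_DATA = ModelEvaluation(
            accuracy = 0.5,
            precision400 = 0.5,
            recall400 = 0.0,
            f1Score400 = 0.0,
            mcc = 0.0
        )
    }
}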

val n = models.size.toDouble()
val sum = models.values.sumOf { it.estimateOverallAccuracy() }
val total = models.values.map {
it?.estimateOverallMetrics() ?: ModelEvaluation(0.5, 0.5, 0.0, 0.0, 0.0)
Collaborator

Duplicated code.

precision400 = 0.5,
recall400 = 0.0,
f1Score400 = 0.0,
mcc = 0.0
Collaborator

Duplicated code.

val acc = modelMetrics.estimateAccuracy()
val prec = modelMetrics.estimatePrecision400()
val rec = modelMetrics.estimateRecall400()
val f1 = if (prec + rec == 0.0) 0.0 else 2 * (prec * rec) / (prec + rec)
Collaborator

Is f1 derived from prec and rec? If so, why have it as an input, rather than computing it directly inside ModelMetrics?

override fun estimateF1Score400(): Double {
val p = estimatePrecision400()
val r = estimateRecall400()
return if (p + r > 0.0) 2 * (p * r) / (p + r) else 0.0
Collaborator

Duplicated equation.

* Unified metrics estimate packaged into a [ModelEvaluation].
*/
override fun estimateMetrics(): ModelEvaluation {
return ModelEvaluation(
Collaborator

This could be the default implementation directly in the interface ModelEvaluation.
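A sketch of the idea, assuming the estimate functions are declared in an interface (here called ModelMetrics, which may not match the actual name): both F1 and the packaging into ModelEvaluation can be default methods, so each equation lives in exactly one place.

interface ModelMetrics {
    fun estimateAccuracy(): Double
    fun estimatePrecision400(): Double
    fun estimateRecall400(): Double
    fun estimateMCC400(): Double

    // F1 is fully derived from precision and recall, so compute it here once.
    fun estimateF1Score400(): Double {
        val p = estimatePrecision400()
        val r = estimateRecall400()
        return if (p + r > 0.0) 2 * (p * r) / (p + r) else 0.0
    }

    // Default packaging; implementations no longer need to override this.
    fun estimateMetrics() = ModelEvaluation(
        accuracy = estimateAccuracy(),
        precision400 = estimatePrecision400(),
        recall400 = estimateRecall400(),
        f1Score400 = estimateF1Score400(),
        mcc = estimateMCC400()
    )
}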

actualStatusCode == 400 && predictedStatusCode != 400 -> falseNegative400Queue.add(true)
actualStatusCode != 400 && predictedStatusCode == 400 -> falsePositive400Queue.add(true)
actualStatusCode != 400 && predictedStatusCode != 400 -> trueNegative400Queue.add(true)
}
Collaborator

I'm not sure these computations are correct, if we want to have a time window of size N.
Regardless of the number of queues, only half of them get new data at each call of this function, and only with value true.
I think all queues should be updated (e.g., consider the false cases as well); see the sketch below.
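A possible fix sketch, assuming each queue is a bounded buffer over the last N calls: push one entry into every queue on each call, so all four always describe the same time window.

fun updatePerformance(predictedStatusCode: Int, actualStatusCode: Int) {
    val predicted400 = predictedStatusCode == 400
    val actual400 = actualStatusCode == 400
    // Every queue gets exactly one new entry per call, true or false,
    // so the four queues stay aligned on the same window of size N.
    truePositive400Queue.add(predicted400 && actual400)
    falsePositive400Queue.add(predicted400 && !actual400)
    falseNegative400Queue.add(!predicted400 && actual400)
    trueNegative400Queue.add(!predicted400 && !actual400)
}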

override fun estimateAccuracy(endpoint: Endpoint): Double = 0.0
override fun estimateOverallAccuracy(): Double = 0.0
override fun estimateMetrics(endpoint: Endpoint): ModelEvaluation = ModelEvaluation(accuracy = 0.5, precision400 = 0.5, recall400 = 0.0, f1Score400 = 0.0, mcc = 0.0)
override fun estimateOverallMetrics(): ModelEvaluation = ModelEvaluation(accuracy = 0.5, precision400 = 0.5, recall400 = 0.0, f1Score400 = 0.0, mcc = 0.0)
Collaborator

Duplicated code. This default instance should really be instantiated only once and referenced in all these different places, so that, if we want to change the default of one parameter, we do it only once, and not 20 times :)
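With a shared constant (see the ModelEvaluation.DEFAULT_NO_DATA sketch above), the two overrides would collapse to:

override fun estimateMetrics(endpoint: Endpoint): ModelEvaluation = ModelEvaluation.DEFAULT_NO_DATA
override fun estimateOverallMetrics(): ModelEvaluation = ModelEvaluation.DEFAULT_NO_DATA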

fun estimateModelMetrics() {

val delta = 0.0001
val ma = ModelMetricsWithTimeWindow(10) // buffer large enough to hold all
Collaborator

Besides this test, add back the previously deleted test, where val ma = ModelAccuracyWithTimeWindow(4).
That was done on purpose to verify the time window.
As a rule of thumb, do not delete or modify existing tests; add new ones (of course, if needed, changes/deletions can be discussed).
As written in a previous comment, I do not think the time-window implementation is correct. To verify it, you need a buffer that is smaller than the data you insert; see the sketch below.
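Not necessarily the deleted test verbatim, but a sketch of what a window-verifying test could look like, under the assumption that the window keeps only the last N updates:

@Test
fun testAccuracyTimeWindow() {
    val delta = 0.0001
    // Window of 4, smaller than the 6 data points inserted below.
    val ma = ModelAccuracyWithTimeWindow(4)

    // Two wrong predictions, which should later be evicted.
    ma.updatePerformance(false)
    ma.updatePerformance(false)
    // Four correct predictions fill the whole window.
    repeat(4) { ma.updatePerformance(true) }

    // If eviction works, only the last 4 entries count.
    assertEquals(1.0, ma.estimateAccuracy(), delta)
}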

}

return estimateOverallAccuracy()
return ModelEvaluation(
Collaborator

I don't understand this change. Why remove the call to estimateOverallAccuracy()?

Contributor Author

I think for an endpoint model, “overall” is the same as the endpoint’s metrics, so I just return its own metrics directly.

modelMetrics.estimatePrecision400(),
modelMetrics.estimateRecall400(),
modelMetrics.estimateMCC400()
)
Collaborator

Same as in the other case: a model for a specific endpoint has no difference between estimateMetrics(endpoint: Endpoint) and estimateOverallMetrics(), apart from the endpoint validation in the former, so one function should call the other to avoid code duplication. Also, what is the difference between:

return ModelEvaluation(
            modelMetrics.estimateAccuracy(),
            modelMetrics.estimatePrecision400(),
            modelMetrics.estimateRecall400(),
            modelMetrics.estimateMCC400()
        )

and

return modelMetrics.estimateMetrics()

?


/** Unified metrics estimate packaged into a [ModelEvaluation] */
override fun estimateMetrics(): ModelEvaluation {
return ModelEvaluation(
Collaborator

Why does this need to be overridden if the implementation is the same?

override fun estimateMetrics(): ModelEvaluation {
return ModelEvaluation(
accuracy = estimateAccuracy(),
precision400 = estimatePrecision400(),
Collaborator

See previous comment: if the implementation is the same, then there is no need to override.

val ma = ModelMetricsWithTimeWindow(10) // buffer large enough to hold all

// 1) correct prediction: 200 vs. 200 -> TN
ma.updatePerformance(predictedStatusCode = 200, actualStatusCode = 200)
Collaborator

Keep this test, but add a new one (you can copy & paste some parts of this test) in which the amount of data is more than the buffer. E.g., you could have a buffer of size 2, and insert the first 5 elements in this test; see the sketch below.
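E.g., something like this, under the assumption that only the last 2 updates should count once the buffer is full:

@Test
fun testMetricsWithSmallTimeWindow() {
    val delta = 0.0001
    val ma = ModelMetricsWithTimeWindow(2)

    // 3 wrong predictions, all of which should eventually be evicted...
    repeat(3) {
        ma.updatePerformance(predictedStatusCode = 400, actualStatusCode = 200)
    }
    // ...by these 2 correct ones, which fill the whole window.
    ma.updatePerformance(predictedStatusCode = 200, actualStatusCode = 200)
    ma.updatePerformance(predictedStatusCode = 200, actualStatusCode = 200)

    assertEquals(1.0, ma.estimateAccuracy(), delta)
}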
