I am experimenting with very large datasets (~1e6 to 1e7 points). It seems that storing the data as (threshold, label) tuples and then computing the measures and confusion matrices in pure Python is much, much slower than keeping the data in NumPy arrays (where available) and doing vectorized operations on the arrays. I don't know if there is interest in something like this.
I might attempt to implement something like that to be able to handle the large datasets.
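To illustrate what I have in mind, here is a minimal sketch (the function name and signature are just placeholders, not an existing API): it keeps scores and labels in NumPy arrays and computes confusion-matrix counts for many thresholds at once via broadcasting, instead of iterating over (threshold, label) tuples in Python.

```python
import numpy as np

def confusion_counts(scores, labels, thresholds):
    """Return TP, FP, FN, TN count arrays, one entry per threshold.

    scores:     float array of predicted scores, shape (n,)
    labels:     0/1 array of true labels, shape (n,)
    thresholds: float array of decision thresholds, shape (t,)
    """
    scores = np.asarray(scores)
    labels = np.asarray(labels).astype(bool)
    # Broadcast (t, 1) thresholds against (n,) scores -> (t, n) boolean
    # prediction matrix; row i holds the predictions at thresholds[i].
    pred = scores[None, :] >= np.asarray(thresholds)[:, None]
    tp = (pred & labels).sum(axis=1)
    fp = (pred & ~labels).sum(axis=1)
    fn = (~pred & labels).sum(axis=1)
    tn = (~pred & ~labels).sum(axis=1)
    return tp, fp, fn, tn
```

Note that the (t, n) boolean matrix can get large; for 1e7 points with many thresholds it would likely be better to sort the scores once and derive the counts from cumulative sums, but the broadcast version already shows the kind of speedup vectorization gives over per-tuple Python loops.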