ENH: vectorize cramervonmises_2samp #58
base: perm_ttest_efficiency
I think it's a good idea. If we were to go this route, I would consider changing On the other hand, since it's just based on ranks, I think there are much more efficient methods for calculating the exact distribution of the statistic under the null hypothesis (e.g. using recursion), if you're interested. If you go that route, I'd recommend implementing the distribution in a class. Thanks for thinking of this!
In fact, generating an array with Pythran can be very efficient, without a huge cost. Let's see:
In [1]: import meth1, meth2, meth3
In [2]: m1 = meth1.get_indices_permutation_test(21, 10)
...: m2 = meth2.get_indices_permutation_test(21, 10)
...: m3 = meth3.get_indices_permutation_test(21, 10)
In [3]: (m1==m2).all()
Out[3]: True
In [4]: (m1==m3).all()
Out[4]: True
In [5]: %timeit meth1.get_indices_permutation_test(21, 10)
4.39 s ± 89 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [6]: %timeit meth2.get_indices_permutation_test(21, 10)
2.66 s ± 32.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [7]: %timeit meth3.get_indices_permutation_test(21, 10)
47.5 ms ± 503 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
It is a quick and dirty experiment. If it is good for you, I will do the same for
I don't know whether there is a recursion formula for this distribution. Do you have any results for this?
Yeah! That would be great.
I haven't looked. We can leave that to the future. Compilation will be useful for the random permutation tests, so in that case it makes sense to compile both. Let's wait until scipygh-13661 is merged, which will need to wait for another stats maintainer. After that, if you submit a PR with these improvements, I can review and merge it. In that case, I'm going to leave the code alone in scipygh-13661 - not replace with the set difference technique - because I think it will speed up review. I didn't write that mask-based code; I just moved it from
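For readers following along, here is a minimal sketch of the two ways of building the complement indices that this comment contrasts: a boolean mask versus a set difference. It is only one plausible reading of those terms, not the code in the referenced PR. The function names `partitions_mask` and `partitions_setdiff` are hypothetical, and the sketch makes no claim about which variant is faster.

```python
from itertools import combinations

import numpy as np


def partitions_mask(n, k):
    """Rows of [chosen k indices | remaining n - k indices], complement via a mask.

    Hypothetical helper for illustration only.
    """
    chosen = np.array(list(combinations(range(n), k)))
    n_rows = len(chosen)
    # Start with every index marked as "remaining", then unmark the chosen ones.
    mask = np.ones((n_rows, n), dtype=bool)
    mask[np.arange(n_rows)[:, None], chosen] = False
    # Each row keeps exactly n - k True entries, so the column indices of the
    # True entries reshape cleanly into one sorted complement per row.
    rest = np.nonzero(mask)[1].reshape(n_rows, n - k)
    return np.hstack([chosen, rest])


def partitions_setdiff(n, k):
    """Same output, but each complement is computed as a set difference."""
    full = set(range(n))
    rows = [c + tuple(sorted(full - set(c))) for c in combinations(range(n), k)]
    return np.array(rows)
```

Both functions return one row per way of splitting `n` indices into a group of `k` and a group of `n - k`, so either could feed a permutation test that needs both halves of each split.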
Looks like there is some information about generating the CVM distribution efficiently here.
@mdhaber: An improvement to the Cramér-von Mises test using vectorization. I thought of this vectorization while I was reviewing your PR. This code depends on _all_partitions, so I am submitting this PR to your branch. I can wait for the merge into scipy/master and propose a PR on scipy instead if you prefer (in which case, please close this PR).
Before vectorization:
After vectorization:
With this improvement, the threshold for switching from exact mode to asymptotic mode with method="auto" (currently 10, for the sum of nx and ny) could perhaps be adapted.
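To illustrate what vectorizing the exact test can look like, here is a minimal sketch that evaluates the statistic for every partition in a single NumPy pass instead of a Python loop. It is not the code from this PR: the function name `exact_cvm_2samp` is hypothetical, it enumerates partitions with `itertools.combinations` rather than `_all_partitions`, and it assumes there are no ties in the pooled sample.

```python
from itertools import combinations

import numpy as np


def exact_cvm_2samp(x, y):
    """Exact two-sample Cramér-von Mises test by full enumeration (no ties)."""
    x, y = np.asarray(x), np.asarray(y)
    nx, ny = len(x), len(y)
    n = nx + ny

    # Every way to choose which positions of the sorted pooled sample belong to x.
    combos = np.array(list(combinations(range(n), nx)))
    in_x = np.zeros((len(combos), n), dtype=bool)
    in_x[np.arange(len(combos))[:, None], combos] = True

    # The observed assignment is stacked as row 0 so it shares the same code path.
    order = np.argsort(np.concatenate([x, y]), kind="stable")
    in_x = np.vstack([(order < nx)[None, :], in_x])

    # For an x value with pooled rank r and within-sample rank i, r - i is the
    # number of y values at or below it, which is a running count of y.
    # The same holds for y values with the roles swapped.
    y_below = np.cumsum(~in_x, axis=1)
    x_below = np.cumsum(in_x, axis=1)
    u = (nx * np.sum(in_x * y_below**2, axis=1)
         + ny * np.sum(~in_x * x_below**2, axis=1))

    u_obs, u_all = u[0], u[1:]
    k = nx * ny
    t = u_obs / (k * n) - (4 * k - 1) / (6 * n)
    # Exact p-value: share of partitions at least as extreme as the observed one.
    p = np.mean(u_all >= u_obs)
    return float(t), float(p)
```

For example, `exact_cvm_2samp([1, 2, 3], [4, 5, 6])` enumerates all 20 partitions at once. Memory grows with the number of partitions, which is the practical limit on how far the exact-mode threshold can be raised with this approach.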