general_stat()/sample_count_stat() and f(0)=0 requirement #1188
Consider the following code which has a bug. The bug is triggered by the final line, but not by the penultimate line.

```python
import msprime


def allele_counts(ts, samples):
    def f(x):
        if x == 0 or x == len(samples):
            return 0
        return x

    return ts.sample_count_stat(
        [samples],
        f,
        1,
        windows="sites",
        polarised=False,
        mode="site",
        span_normalise=False,
    )


ts = msprime.simulate(
    100, Ne=10000, mutation_rate=1e-8, length=100000, random_seed=1234
)
samples = ts.samples()
ac1 = allele_counts(ts, ts.samples())  # penultimate line: runs fine
ac2 = allele_counts(ts, ts.samples()[:10])  # final line: triggers the bug
```
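Why only the final call fails can be reproduced with NumPy alone. This is a sketch assuming, as the patch further down suggests, that `f` receives a 1-D array with one count per sample set and must return an array of length `output_dim` (here 1). The original `f` returns a bare scalar `0` whenever a count equals 0 or `len(samples)`. With all 100 samples, every simulated mutation likely segregates, so those counts never arise; with only 10 samples, many sites have counts of 0 or 10. The names `n`, `f_buggy` and `f_fixed` are hypothetical.

```python
import numpy as np

n = 10  # hypothetical stand-in for len(samples) in the final call


def f_buggy(x):
    # Same logic as the original f: x holds one count per sample set.
    if x == 0 or x == n:
        return 0  # bare scalar, not an array of length output_dim
    return x


def f_fixed(x):
    # Same logic as the patched f: the boundary branch keeps x's shape.
    if np.all(x == 0) or np.all(x == n):
        return np.zeros_like(x)
    return x


print(np.shape(f_buggy(np.array([3]))))  # (1,)
print(np.shape(f_buggy(np.array([0]))))  # ()
print(np.shape(f_fixed(np.array([0]))))  # (1,)

# With two sample sets, `x == 0` is a 2-element boolean array and the
# original `if` raises "truth value ... is ambiguous".
try:
    f_buggy(np.array([0, 5]))
except ValueError as err:
    print("ValueError:", err)
```

The sketch uses `np.all` where the patch uses the built-in `all`; for a 1-D array both give the same answer.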
Now, reading the "Note" in the docs for
Indeed, the code above follows this to the letter. Presumably the code in the note must not be taken literally, because this change fixes the problem:

```diff
--- test_sample_count_stat.py
+++ test_sample_count_stat2.py
@@ -1,10 +1,11 @@
 import msprime
+import numpy as np
 
 
 def allele_counts(ts, samples):
     def f(x):
-        if x == 0 or x == len(samples):
-            return 0
+        if all(x == 0) or all(x == len(samples)):
+            return np.zeros_like(x)
         return x
 
     return ts.sample_count_stat(
```

This isn't really a bug in tskit, although maybe the checks could be more strict in the
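One possible form of a stricter check is sketched below. It uses a hypothetical helper, `check_summary_function`, which is not part of tskit and does not claim to reproduce tskit's own check. Besides requiring `f` to return zero for zero weight and for the total weight, it also requires the result to have shape `(output_dim,)`. A value comparison on its own would not catch the problem: `np.allclose(0, np.zeros(1))` is `True` because of NumPy broadcasting.

```python
import numpy as np


def check_summary_function(f, total_weight, output_dim):
    """Hypothetical stricter pre-flight check (a sketch, not tskit's code).

    Calls f on an all-zero input and on the total weight, and requires each
    result to be an array of shape (output_dim,) whose entries are ~0.
    """
    total = np.asarray(total_weight, dtype=float)
    cases = [("zero weight", np.zeros_like(total)), ("total weight", total)]
    for label, x in cases:
        fx = np.asarray(f(x))
        # np.allclose would broadcast a bare scalar 0 and pass, so the
        # shape has to be checked explicitly.
        if fx.shape != (output_dim,):
            raise ValueError(
                f"f returned shape {fx.shape} for {label}, expected ({output_dim},)"
            )
        if not np.allclose(fx, 0.0):
            raise ValueError(f"f does not return zero for {label}")


def returns_scalar_zero(x):
    # Like the original boundary branch: a bare 0 instead of an array.
    return 0


def returns_array_zero(x):
    # Like the patched boundary branch: zeros with the same shape as x.
    return np.zeros_like(x)


check_summary_function(returns_array_zero, [10], output_dim=1)  # passes
try:
    check_summary_function(returns_scalar_zero, [10], output_dim=1)
except ValueError as err:
    print(err)  # f returned shape () for zero weight, expected (1,)
```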
Hm, ok: this is a documentation bug, for sure. Let's see:

What if the docs said:

> The requirement exists because statistics with this requirement are insensitive to parts of the tree that are not segregating between any of the samples. So, if you've got a summary function that doesn't satisfy this requirement, then it depends on parts of the tree that are unobservable from polymorphism data (e.g., the length of a branch above the root, or branches not ancestral to any of the samples). That might be what you want to do, but we want you to make sure it's what you want to do. (so you have to do

I could have sworn this was in the docs somewhere, but I can't find it. I can add something about it.
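The reasoning in the proposed doc text can be made concrete with a toy calculation. This sketch assumes a simplified branch statistic: the sum, over branches, of branch length times f(number of samples below the branch). `branch_sum`, `f_ok`, `f_bad` and the branch lists are hypothetical illustrations, not tskit's API or implementation. A branch with 0 samples below (not ancestral to any sample) or with all n samples below (above the root) cannot be seen in polymorphism data. Only a function that is zero at those counts ignores such branches.

```python
def branch_sum(branches, f):
    """Toy branch statistic: sum of branch length * f(samples below).

    A simplified sketch for illustration, not tskit's implementation.
    """
    return sum(length * f(count) for length, count in branches)


n = 2  # number of samples in the toy tree


def f_ok(x):
    # Zero when no samples, or all samples, are below the branch.
    return x * (n - x)


def f_bad(x):
    # Non-zero at x == 0 and x == n, so it violates the requirement.
    return x + 1


# (branch length, number of samples below) for the two sample branches.
observed = [(1.0, 1), (1.0, 1)]
# Add branches invisible to polymorphism data: one not ancestral to any
# sample (count 0) and one above the root (count n).
with_hidden = observed + [(5.0, 0), (3.0, n)]

print(branch_sum(observed, f_ok), branch_sum(with_hidden, f_ok))  # 2.0 2.0
print(branch_sum(observed, f_bad), branch_sum(with_hidden, f_bad))  # 4.0 18.0
```

Adding the hidden branches leaves the `f_ok` result unchanged, while the `f_bad` result picks up their lengths.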