Move split_upwards() to tskit? #27

gregorgorjanc · 2024-10-07T06:41:30Z

We don't necessarily need split_upwards() anymore with the new matvec (tree-seq-based-GRM-times-a-vec) algorithm. However, it's good to have this functionality for testing etc.

I am working with @jeromekelleher on adding edge effect, edge value, and node value functionality to tstrait (see tskit-dev/tstrait#155). As part of testing this work, I am finding it handy to have split_upwards(), which makes me wonder if split_upwards() should be in tskit (instead of in tslmm).

Thoughts?

jeromekelleher · 2024-10-07T07:56:32Z

It sounds like something that should be in tskit. I guess the only reason we put it here is so we wouldn't need to do so much testing and worrying about corner cases.

jeromekelleher · 2024-10-07T07:57:10Z

Should probably rename to split_pastwards or something also?

petrelharp · 2024-10-07T15:43:51Z

I think it's going to be totally used by people - it's shown up in three or four different people's work independently.

In testing, the thing to check is that, for each edge, the subtree below that edge doesn't change.
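
A minimal sketch of that check, assuming edges are given as plain (left, right, parent, child) tuples rather than a tskit edge table. The helper names and the toy edge lists are hypothetical and are not output of split_upwards(); they only illustrate the property being tested.

```python
from collections import defaultdict


def subtree_below(edges, node, x):
    """Return the (parent, child) pairs reachable below `node` at position x."""
    children = defaultdict(list)
    for left, right, parent, child in edges:
        if left <= x < right:
            children[parent].append(child)
    found, stack = set(), [node]
    while stack:
        u = stack.pop()
        for c in children[u]:
            found.add((u, c))
            stack.append(c)
    return frozenset(found)


def edges_have_constant_subtrees(edges):
    """True if, for every edge, the subtree below its child is the same
    at every breakpoint that falls inside the edge's span."""
    breakpoints = sorted({e[0] for e in edges} | {e[1] for e in edges})
    for left, right, parent, child in edges:
        inside = [x for x in breakpoints if left <= x < right]
        subtrees = {subtree_below(edges, child, x) for x in inside}
        if len(subtrees) > 1:
            return False
    return True


# Toy example (hypothetical): samples 0, 1, 2 and internal nodes 3, 4.
# On [0, 5) node 3 has children 0 and 1; on [5, 10) it has children 0 and 2.
unsplit = [
    (0, 10, 3, 0), (0, 5, 3, 1), (5, 10, 3, 2),
    (0, 5, 4, 2), (5, 10, 4, 1),
    (0, 10, 4, 3),  # the subtree below node 3 changes at position 5
]
# The same edges with the offending edge cut at position 5.
split = unsplit[:-1] + [(0, 5, 4, 3), (5, 10, 4, 3)]

print(edges_have_constant_subtrees(unsplit))
print(edges_have_constant_subtrees(split))
```

This compares whole subtrees (as sets of parent-child pairs), which is stricter than comparing only the samples below each edge; a sample-only comparison may be enough for some tests.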

jeromekelleher · 2024-10-07T16:03:35Z

Do we have a tskit ready implementation?

nspope · 2024-10-07T16:16:29Z

Seems like a good idea to me.

@jeromekelleher -- the closest we have is the numba version you wrote (using the tree position jitclass)

tslmm/tslmm/operations.py

Line 106 in 2777319

def split_upwards_numba(ts):

so this would need to be moved to C I suppose

jeromekelleher · 2024-10-07T19:04:33Z

That would be a fair bit of work to port to C

nspope · 2024-10-07T21:05:01Z

I can port it over, will be next month at the earliest though. We could put in a pure-python method if it seems convenient enough to have around in the meantime

jeromekelleher · 2024-10-08T09:54:04Z

I'm in no hurry! Pure Python would be too slow to be useful.

gregorgorjanc · 2024-10-08T11:58:30Z

@jeromekelleher, would the existing numba version not be considered for tskit because you aim to have both a C and a Python API?

jeromekelleher · 2024-10-08T12:26:51Z

Tskit is quite strict about adding dependencies (for various reasons) and we don't currently have a dependency on numba. In practice, numba is quite a tricky dependency, so we would need a very good reason indeed to add it to tskit.

hanbin973 · 2024-10-08T17:36:14Z

Another issue is that the current numba implementation isn't faster than the python version.

The code:

import time

import msprime
import numpy as np

from tslmm import operations  # module path taken from tslmm/tslmm/operations.py


def make_ts(num_samples, recombination_rate=1e-8, sequence_length=1e6, population_size=50_000):
    # Simulate a haploid tree sequence under the standard coalescent.
    ts = msprime.sim_ancestry(
        samples=num_samples,
        recombination_rate=recombination_rate,
        sequence_length=sequence_length,
        model=msprime.StandardCoalescent(),
        population_size=population_size,
        ploidy=1
    )
    return ts

def measure_time(ts, num_iter):
    # Wall-clock time of the pure-Python version, one entry per call.
    times = []
    for i in range(num_iter):
        t1 = time.perf_counter()
        operations.split_upwards(ts)
        t2 = time.perf_counter()
        times.append(t2-t1)
    return times

def measure_time_numba(ts, num_iter):
    # Wall-clock time of the numba version, one entry per call.
    times = []
    for i in range(num_iter):
        t1 = time.perf_counter()
        operations.split_upwards_numba(ts)
        t2 = time.perf_counter()
        times.append(t2-t1)
    return times

# Time both versions on 21 sample sizes from 2 to 10002, 10 calls each.
num_iter = 10
ns = np.linspace(2, 10002, 21).astype(int)
times = np.zeros((len(ns), num_iter))
times_numba = np.zeros((len(ns), num_iter))
for i, n in enumerate(ns):
    ts = make_ts(n)
    times[i,:] = measure_time(ts, num_iter)
    times_numba[i,:] = measure_time_numba(ts, num_iter)

The result:

nspope · 2024-10-08T17:56:00Z

Hm, probably njit can't compile away all the python bits so it's spending most of its time in the python interpreter? Another argument in favor of getting it into C, I guess
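
One thing that may be worth ruling out before reading too much into the comparison: the first call to a numba-jitted function normally includes compilation time, so a timing loop that starts cold can overstate the cost. A minimal sketch of a timing helper with warm-up calls follows; `time_calls` and its parameters are hypothetical and not part of tslmm.

```python
import time


def time_calls(func, *args, num_iter=10, warmup=1):
    """Time `func(*args)` num_iter times after `warmup` untimed calls."""
    for _ in range(warmup):
        func(*args)  # untimed: lets any JIT compilation happen first
    times = []
    for _ in range(num_iter):
        start = time.perf_counter()
        func(*args)
        times.append(time.perf_counter() - start)
    return times


# Stand-in workload so the sketch runs without tslmm installed.
example_times = time_calls(sum, range(1000), num_iter=5, warmup=2)
print(len(example_times))
```

With tslmm available, the same helper could presumably be called as `time_calls(operations.split_upwards_numba, ts)`.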

jeromekelleher · 2024-10-08T18:50:52Z

This is going to be significant work to add to tskit - I'd vote for focusing on other bigger impact stuff.

hanbin973 · 2024-10-08T20:39:02Z

I agree with Jerome. For the small tree sequences used in testing, the Python version is already pretty fast unless the population structure is complicated. For big tree sequences, we're not going to use it anyway because of the memory demand.
The question is whether it would be helpful to have the Python version in tskit. I can imagine using the function to demonstrate why sharing adjacent edges and not splitting them helps computation.
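
As a rough illustration of the memory point, the sketch below cuts every edge at every breakpoint it spans and compares row counts. This is a deliberately naive full split on plain (left, right, parent, child) tuples, not the split_upwards() algorithm; the function name and toy edges are hypothetical.

```python
def split_at_every_breakpoint(edges):
    """Naively cut each (left, right, parent, child) edge at every breakpoint it spans."""
    breakpoints = sorted({e[0] for e in edges} | {e[1] for e in edges})
    pieces = []
    for left, right, parent, child in edges:
        cuts = [x for x in breakpoints if left <= x <= right]
        for a, b in zip(cuts[:-1], cuts[1:]):
            pieces.append((a, b, parent, child))
    return pieces


# Toy example (hypothetical): two trees, on [0, 5) and [5, 10).
# Two of the six edges are shared between the adjacent trees.
edges = [
    (0, 10, 3, 0), (0, 5, 3, 1), (5, 10, 3, 2),
    (0, 5, 4, 2), (5, 10, 4, 1), (0, 10, 4, 3),
]
pieces = split_at_every_breakpoint(edges)
print(len(edges), len(pieces))  # 6 rows shared, 8 rows fully split
```

The total span covered is unchanged, but every edge that is shared between adjacent trees becomes one row per tree, so the table grows with the number of trees an edge spans.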

petrelharp · 2024-10-16T04:07:52Z

One possibility would be to write a short thing for the docs about the operation and give the python code there?
