Replies: 3 comments
-
FWIW, I implemented a data store of numpy values for giglib, which could serve as a template for converting the `self.start_index` and `self.end_index` values to a single 2xN numpy array of ints, here: https://github.com/hyanwong/giglib/blob/2139326ea1da6c5e1265757484ad4b76db6028f2/giglib/tables.py#L152

I think 32 bit ints would be enough because we are unlikely to want to store more than 2147483648 separate ARGs, I would have thought?
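As a rough illustration of the 2xN idea (not the giglib code linked above), here is a minimal sketch of a growable int32 store. The class name `IndexRangeStore`, the `OPEN` sentinel for "not yet ended", and the half-open convention `start_index <= i < end_index` are all assumptions made for this example.

```python
import numpy as np

# Hypothetical sentinel meaning "this edge has not been ended yet".
OPEN = np.iinfo(np.int32).max


class IndexRangeStore:
    """Hypothetical 2xN int32 store: row 0 is start_index, row 1 is end_index."""

    def __init__(self, capacity=8):
        self._data = np.empty((2, max(1, capacity)), dtype=np.int32)
        self._n = 0  # number of columns actually in use

    def append(self, start_index, end_index=OPEN):
        if self._n == self._data.shape[1]:
            # Double the capacity so repeated appends are amortised O(1).
            grown = np.empty((2, 2 * self._n), dtype=np.int32)
            grown[:, : self._n] = self._data[:, : self._n]
            self._data = grown
        self._data[:, self._n] = (start_index, end_index)
        self._n += 1
        return self._n - 1  # column id of the new entry

    def end(self, row, end_index):
        # Record the first index at which this entry is no longer present.
        self._data[1, row] = end_index

    @property
    def values(self):
        # A view (not a copy) of the used part of the buffer.
        return self._data[:, : self._n]

    def active(self, i):
        # Boolean mask of entries present at index i (assumed half-open range).
        start, end = self.values
        return (start <= i) & (i < end)
```

Keeping both columns in one preallocated array means the "which edges belong to ARG `i`" question becomes a single vectorised comparison rather than a Python loop.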
-
From a comment on Slack, re efficiency of adding new ARGs:
Adding new edges is probably fast. Extracting the subset to make a TreeSequence is probably slow. A bigger change to substantially improve efficiency of traversing through the ARGs would be a way to avoid having to do all the
-
Also some more thoughts: we could do a I'm not sure how this would work at to extract a "view" of the MultiEdgeTable: I suspect this is only a numpy/Python thing, so would need some reworking for a C method to take subsets to make a valid edge table. All these are elaborations to make the general idea faster, though.
-
Prompted by @YunDeng98, I thought I would have a go at prototyping a Python class for storing multiple similar tree sequences. I call this a MultiTableCollection. The main idea is to keep everything similar to a tskit TableCollection, but change the edge table so that it contains 2 extra columns, a `start_index` and an `end_index`. There's no need to make different node tables: we can easily have nodes present in some of the tree sequences and not in others, as tskit doesn't require nodes to be referenced by any edge in a tree sequence.

The general idea is that you add to the MultiTableCollection by calling `.add_row()` with the required `start_index`, and remove an edge from all subsequent tree sequences by calling `.end_row()` or `.end_rows()`. At any time you can call `MultiTableCollection.tree_sequence(i)` to extract the `i`th tree sequence from the MultiTableCollection (checking consistency is only done when you grab out a tree sequence like this). In my code below, a separate MultiEdgeTable is stored, and subset down to a selection of edges when the `.tree_sequence(i)` method is called. The implementation below is a bit scrappy, and missing lots of functionality (e.g. you can't save one of these things to disk at the moment, which would require implementing `dump` and `load`) but it seems to work!

I guess it could form the base for testing e.g. a full C implementation, if required. Implementing it first in Python makes it a lot quicker to test the basic principle.
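To make the add/end/extract semantics concrete, here is a minimal pure-Python sketch of the edge bookkeeping only. It is not the prototype described above: the class `MultiEdgeTableSketch`, the method `edges_at()`, the argument lists, and the rule that an edge is present in tree sequence `i` when `start_index <= i < end_index` are all assumptions for illustration. A real `.tree_sequence(i)` would copy the selected edges into a tskit edge table and build the tree sequence from that.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MultiEdge:
    left: float
    right: float
    parent: int
    child: int
    start_index: int
    end_index: Optional[int] = None  # None means "never ended" (assumption)


class MultiEdgeTableSketch:
    """Hypothetical stand-in for a MultiEdgeTable; stores plain Python rows."""

    def __init__(self):
        self.rows: List[MultiEdge] = []

    def add_row(self, left, right, parent, child, start_index):
        self.rows.append(MultiEdge(left, right, parent, child, start_index))
        return len(self.rows) - 1  # row id of the new edge

    def end_row(self, row_id, end_index):
        # The edge is absent from tree sequence `end_index` onwards.
        self.rows[row_id].end_index = end_index

    def end_rows(self, row_ids, end_index):
        for row_id in row_ids:
            self.end_row(row_id, end_index)

    def edges_at(self, i):
        # Select the edges that belong to the i-th tree sequence.
        return [
            (e.left, e.right, e.parent, e.child)
            for e in self.rows
            if e.start_index <= i and (e.end_index is None or i < e.end_index)
        ]


# Usage: sample 1 moves from parent 2 to parent 3 between ARG 0 and ARG 1.
edges = MultiEdgeTableSketch()
e0 = edges.add_row(0, 10, parent=2, child=0, start_index=0)
e1 = edges.add_row(0, 10, parent=2, child=1, start_index=0)
edges.end_row(e1, end_index=1)
e2 = edges.add_row(0, 10, parent=3, child=1, start_index=1)
print(edges.edges_at(0))
print(edges.edges_at(1))
```

The edge shared by both ARGs is stored once, which is the point of the design: only the edges that differ between successive tree sequences cost extra rows.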
Here's a quick test: