Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add modular simplifier #2752

Draft
wants to merge 88 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
88 commits
Select commit Hold shift + click to select a range
365de88
add prototypes
molpopgen May 8, 2023
4ba5513
clang-format
molpopgen May 8, 2023
2626292
add implementations
molpopgen May 8, 2023
12a1245
alloc array for tracking parent contiguity
molpopgen May 8, 2023
0656a78
check for contiguous parent processing
molpopgen May 8, 2023
c3d791b
test skeleton
molpopgen May 8, 2023
293a5f5
building up tests
molpopgen May 8, 2023
e61c48c
whew
molpopgen May 8, 2023
03673a9
free
molpopgen May 8, 2023
5ff0442
working on test
molpopgen May 8, 2023
b8b7253
that was kinda brutal
molpopgen May 8, 2023
d37cb5d
more stress test
molpopgen May 8, 2023
4b49a7a
internal checks so we don't have to test internal details
molpopgen May 8, 2023
57c1b01
modularize, test that we can catch birth order violations
molpopgen May 8, 2023
bcc9a15
move comments
molpopgen May 8, 2023
7a96dcd
FIXME
molpopgen May 8, 2023
145abce
show some of the side effects that one can get
molpopgen May 8, 2023
c71e4ed
not detecting discontiguous parents
molpopgen May 9, 2023
59ce926
all tests fail when we try to check for non-contiguous parents
molpopgen May 9, 2023
946a735
fix array init to TSK_NULL
molpopgen May 9, 2023
f3bfd49
name field for what we use it for...
molpopgen May 9, 2023
eeef36a
valgrind
molpopgen May 9, 2023
7e799f0
testing pattern w/less duplication
molpopgen May 9, 2023
127106e
found a goof. yay for testing
molpopgen May 9, 2023
ac8236f
fix logic error
molpopgen May 9, 2023
0a149ce
start treeseq comparison
molpopgen May 9, 2023
f19f3a0
tree time comparisons...
molpopgen May 9, 2023
c94677f
notes
molpopgen May 9, 2023
ae005e6
TODO
molpopgen May 9, 2023
bddce27
fix typo
molpopgen May 9, 2023
afe426f
update note
molpopgen May 9, 2023
bfa2484
fix setup comment
molpopgen May 9, 2023
dd2e20e
fix implicit integer cast flagged by circle CI
molpopgen May 9, 2023
0a8f5ab
truly fix the cast
molpopgen May 9, 2023
c0112d1
check that no new child is born before any child in the input tables.
molpopgen May 9, 2023
91734e3
use correct error codes
molpopgen May 9, 2023
ebcbd5e
add tests of null parent/child in edge
molpopgen May 9, 2023
6fb5985
test for child time > that of any in the input tables
molpopgen May 9, 2023
6803b4d
remove useless if
molpopgen May 9, 2023
b2830ff
test node out of range
molpopgen May 9, 2023
77bb9c8
bad samples
molpopgen May 9, 2023
e30c155
note
molpopgen May 9, 2023
09bd543
input table integrity check fail
molpopgen May 9, 2023
820eed6
start on overlapping generation tests
molpopgen May 9, 2023
fdbfac1
big code duplication. ugh
molpopgen May 9, 2023
a38e8a0
need to think now about how to handle overlapping gens w/o the actual…
molpopgen May 9, 2023
b5eb446
note for tomorrow
molpopgen May 9, 2023
f5c23cb
more detailed notes
molpopgen May 9, 2023
a890997
update what to do
molpopgen May 10, 2023
4060d13
fix cast errors that aren't in printf
molpopgen May 10, 2023
e95b671
start mocking a buffer for internal testing. overlapping generations…
molpopgen May 16, 2023
f0cb280
not quite
molpopgen May 16, 2023
8bba18a
remove memory leak
molpopgen May 16, 2023
c04efbb
closer
molpopgen May 16, 2023
07164aa
tests pass
molpopgen May 17, 2023
aeea43b
comment out unused code
molpopgen May 17, 2023
c1cb315
fix compile fails on CI
molpopgen May 17, 2023
fcff748
allocate new edges as needed
molpopgen May 17, 2023
23544cd
be careful about setup
molpopgen May 17, 2023
0732c81
force a failing test
molpopgen May 17, 2023
c81da32
we don't need a particular order for the child nodes...
molpopgen May 17, 2023
8d6f8b4
add test of parent time error
molpopgen May 17, 2023
3b39431
reinstate test failure
molpopgen May 17, 2023
cb0bd03
start refactor from temp buffer back into temp edge table. idea is t…
molpopgen May 26, 2023
8af74c1
at least no segfault any more...
molpopgen May 26, 2023
9254283
some pring
molpopgen May 26, 2023
154e5a3
we are not detecting parent time violations properly
molpopgen May 26, 2023
ead9f5c
now tests pass
molpopgen May 26, 2023
5114645
clean
molpopgen May 26, 2023
60ef2e4
delete printf
molpopgen May 26, 2023
c25d6f9
start to abstract out the test runner step
molpopgen May 26, 2023
7819804
streamline for new test runner
molpopgen May 26, 2023
8d25cc6
use new runner for most/all tests
molpopgen May 26, 2023
dc51f7d
cleanup
molpopgen May 26, 2023
2e1be0f
add new .c
molpopgen May 31, 2023
b11525d
update the meson stuff
molpopgen May 31, 2023
9f2d496
add some infrastructure
molpopgen May 31, 2023
add7347
add some infrastructure
molpopgen May 31, 2023
4e07d22
fmt
molpopgen May 31, 2023
3dfc9fc
skeleton in place. we have a segfault
molpopgen May 31, 2023
24a76bf
make valgrind happy
molpopgen May 31, 2023
a5ac4c4
cleanup
molpopgen May 31, 2023
cc89815
cleanup
molpopgen May 31, 2023
f095a76
add death rate, update lots of the new example internals
molpopgen May 31, 2023
0c498c0
overlapping gens seems to work
molpopgen May 31, 2023
fcd17f3
add an important note
molpopgen May 31, 2023
e19a5a6
Add basic simulation for development
jeromekelleher Jun 23, 2023
c559f64
Add almost-working version with Python simplifier
jeromekelleher Jun 23, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
289 changes: 289 additions & 0 deletions c/examples/efficient_forward_simulation.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,289 @@
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <err.h>

#include <tskit/tables.h>

#define check_tsk_error(val) \
if (val < 0) { \
errx(EXIT_FAILURE, "line %d: %s", __LINE__, tsk_strerror(val)); \
}

typedef struct {
double left;
double right;
double parent_birth_time;
tsk_id_t parent;
tsk_id_t child;
} birth;

int
cmp_birth(const void *lhs, const void *rhs)
{
const birth *clhs = (const birth *) lhs;
const birth *crhs = (const birth *) rhs;
int ret = (clhs->parent_birth_time > crhs->parent_birth_time)
- (crhs->parent_birth_time > clhs->parent_birth_time);
if (ret == 0) {
ret = (clhs->parent > crhs->parent) - (crhs->parent > clhs->parent);
}
return ret;
}

typedef struct {
birth *births;
tsk_size_t capacity;
tsk_size_t size;
} edge_buffer;

int
edge_buffer_init(edge_buffer *buffer, tsk_size_t initial_capacity)
{
int ret = 0;
if (initial_capacity == 0) {
ret = -1;
goto out;
}
buffer->births = (birth *) malloc(initial_capacity * sizeof(birth));
buffer->capacity = initial_capacity;
buffer->size = 0;
out:
return ret;
}

void
edge_buffer_realloc(edge_buffer *buffer)
{
if (buffer->size + 1 >= buffer->capacity) {
buffer->capacity *= 2;
buffer->births
= (birth *) realloc(buffer->births, buffer->capacity * sizeof(birth));
}
}

void
edge_buffer_free(edge_buffer *buffer)
{
if (buffer->births != NULL) {
free(buffer->births);
buffer->births = NULL;
}
}

void
edge_buffer_buffer_birth(double left, double right, double parent_birth_time,
tsk_id_t parent, tsk_id_t child, edge_buffer *buffer)
{
edge_buffer_realloc(buffer);
buffer->births[buffer->size].left = left;
buffer->births[buffer->size].right = right;
buffer->births[buffer->size].parent_birth_time = parent_birth_time;
buffer->births[buffer->size].parent = parent;
buffer->births[buffer->size].child = child;
buffer->size += 1;
}

void
edge_buffer_prep_for_simplification(edge_buffer *buffer)
{
qsort(buffer->births, (size_t) buffer->size, sizeof(birth), cmp_birth);
}

/* Overlapping generations are a pain point.
* Nodes that are currently "alive" can span an
* arbitrary range of ages >= 0.0.
* These nodes can have ancestry (edges).
* The children of these edges may not be "alive" (e.g.,
* cannot be parents).
*
* We identify the range of such edges in a (simplified)
* table collection and lift them over into our buffer.
*/
void
edge_buffer_post_simplification(tsk_id_t *output_samples, tsk_size_t num_output_samples,
tsk_table_collection_t *tables, edge_buffer *buffer)
{
/* node times cannot be negative, so this
* is a reasonable floor
*/
double max_alive_time = 0.0;
int64_t i, last_row_to_lift_over = -1;
tsk_size_t moved = 0;

for (i = 0; i < num_output_samples; ++i) {
max_alive_time = TSK_MAX(tables->nodes.time[output_samples[i]], max_alive_time);
}
for (i = 0; i < tables->edges.num_rows; ++i) {
if (tables->nodes.time[tables->edges.parent[i]] <= max_alive_time) {
last_row_to_lift_over = (int) i;
}
}
if (last_row_to_lift_over > -1) {
for (i = 0; i < (tsk_size_t) last_row_to_lift_over; ++i) {
edge_buffer_buffer_birth(tables->edges.left[i], tables->edges.right[i],
tables->nodes.time[tables->edges.parent[i]], tables->edges.parent[i],
tables->edges.child[i], buffer);
}
for (i = (tsk_size_t) last_row_to_lift_over + 1; i < tables->edges.num_rows;
++i) {
tables->edges.left[moved] = tables->edges.left[i];
tables->edges.right[moved] = tables->edges.right[i];
tables->edges.parent[moved] = tables->edges.parent[i];
tables->edges.child[moved] = tables->edges.child[i];
moved += 1;
}
tsk_edge_table_truncate(&tables->edges, moved);
}
}

void
edge_buffer_clear(edge_buffer *buffer)
{
buffer->size = 0;
}

void
simulate(
tsk_table_collection_t *tables, int N, int T, int simplify_interval, double pdeath)
{
tsk_id_t *buffer, *alive, *deaths, *replacements, *idmap, child, left_parent,
right_parent;
double breakpoint;
int ret, j, t;
edge_buffer new_births;
size_t ndeaths;
tsk_modular_simplifier_t simplifier;

assert(simplify_interval != 0); // leads to division by zero
assert(pdeath > 0.0 && pdeath <= 1.0);
ret = edge_buffer_init(&new_births, 1000);
assert(ret == 0);
buffer = malloc(2 * N * sizeof(tsk_id_t));
if (buffer == NULL) {
errx(EXIT_FAILURE, "Out of memory");
}
idmap = malloc(N * sizeof(tsk_id_t));
if (idmap == NULL) {
errx(EXIT_FAILURE, "Out of memory");
}
deaths = malloc(N * sizeof(tsk_id_t));
if (deaths == NULL) {
errx(EXIT_FAILURE, "Out of memory");
}
tables->sequence_length = 1.0;
alive = buffer;
replacements = buffer + N;
for (j = 0; j < N; j++) {
alive[j]
= tsk_node_table_add_row(&tables->nodes, 0, T, TSK_NULL, TSK_NULL, NULL, 0);
check_tsk_error(alive[j]);
}
for (t = T - 1; t >= 0; t--) {
ndeaths = 0;
for (j = 0; j < N; j++) {
/* NOTE: the use of rand() is discouraged for
* research code and proper random number generator
* libraries should be preferred.
*/
if (rand() / (1. + RAND_MAX) <= pdeath) {
deaths[ndeaths] = j;
++ndeaths;
}
}
for (j = 0; j < ndeaths; j++) {
child = tsk_node_table_add_row(
&tables->nodes, 0, t, TSK_NULL, TSK_NULL, NULL, 0);
check_tsk_error(child);
/* NOTE: the use of rand() is discouraged for
* research code and proper random number generator
* libraries should be preferred.
*/
left_parent = alive[(size_t)((rand() / (1. + RAND_MAX)) * N)];
right_parent = alive[(size_t)((rand() / (1. + RAND_MAX)) * N)];
do {
breakpoint = rand() / (1. + RAND_MAX);
} while (breakpoint == 0); /* tiny proba of breakpoint being 0 */
/* NOTE: invalid left/right values here CANNOT be caught
* by tsk_table_collection_check_integrity!
* (They are not present in the edge table when the
* simplifier is initialized, and the simplified tables
* will naively process the overlaps, resulting in invalid output.)
*
* It is therefore a precondition that input edges are okay.
*/
edge_buffer_buffer_birth(0., breakpoint, tables->nodes.time[left_parent],
left_parent, child, &new_births);
edge_buffer_buffer_birth(breakpoint, 1.0, tables->nodes.time[right_parent],
right_parent, child, &new_births);
replacements[j] = child;
}
/* replace deaths with births */
for (j = 0; j < ndeaths; j++) {
alive[deaths[j]] = replacements[j];
}
if (t % simplify_interval == 0) {
printf("Simplify at generation %lld: (%lld nodes %lld edges)", (long long) t,
(long long) tables->nodes.num_rows, (long long) tables->edges.num_rows);
edge_buffer_prep_for_simplification(&new_births);
idmap
= (tsk_id_t *) realloc(idmap, tables->nodes.num_rows * sizeof(tsk_id_t));
ret = tsk_modular_simplifier_init(&simplifier, tables, alive, N, 0);
check_tsk_error(ret);
j = 0;
while (j < new_births.size) {
left_parent = new_births.births[j].parent;
while (
j < new_births.size && new_births.births[j].parent == left_parent) {
ret = tsk_modular_simplifier_add_edge(&simplifier,
new_births.births[j].left, new_births.births[j].right,
new_births.births[j].parent, new_births.births[j].child);
check_tsk_error(ret);
j++;
}
ret = tsk_modular_simplifier_merge_ancestors(&simplifier, left_parent);
check_tsk_error(ret);
}
ret = tsk_modular_simplifier_finalise(&simplifier, idmap);
check_tsk_error(ret);
ret = tsk_modular_simplifier_free(&simplifier);
check_tsk_error(ret);
/* For fun/safety/paranoia */
ret = tsk_table_collection_check_integrity(tables, TSK_CHECK_EDGE_ORDERING);
check_tsk_error(ret);
printf(" -> (%lld nodes %lld edges)\n", (long long) tables->nodes.num_rows,
(long long) tables->edges.num_rows);
for (j = 0; j < N; j++) {
alive[j] = idmap[alive[j]];
assert(alive[j] != TSK_NULL);
}
/* The order of these next two steps MATTERS */
edge_buffer_clear(&new_births);
edge_buffer_post_simplification(alive, N, tables, &new_births);
}
}
free(buffer);
free(idmap);
free(deaths);
edge_buffer_free(&new_births);
}

int
main(int argc, char **argv)
{
int ret;
tsk_table_collection_t tables;

if (argc != 7) {
errx(EXIT_FAILURE, "usage: N T simplify-interval output-file seed pdeath");
}
ret = tsk_table_collection_init(&tables, 0);
check_tsk_error(ret);
srand((unsigned) atoi(argv[5]));
simulate(&tables, atoi(argv[1]), atoi(argv[2]), atoi(argv[3]), atof(argv[6]));
ret = tsk_table_collection_dump(&tables, argv[4], 0);
check_tsk_error(ret);

tsk_table_collection_free(&tables);
return 0;
}
3 changes: 3 additions & 0 deletions c/meson.build
Original file line number Diff line number Diff line change
Expand Up @@ -117,5 +117,8 @@ if not meson.is_subproject()
executable('haploid_wright_fisher',
sources: ['examples/haploid_wright_fisher.c'],
link_with: [tskit_lib], dependencies: lib_deps)
executable('efficient_forward_simulation',
sources: ['examples/efficient_forward_simulation.c'],
link_with: [tskit_lib], dependencies: lib_deps)
endif
endif
Loading
Loading