
C API 0.99.3

Pre-release
@benjeffery benjeffery released this 27 Jul 14:46
ab54128

C API release.

Breaking changes

  • tsk_mutation_table_add_row has an extra time argument. If the time is unknown, TSK_UNKNOWN_TIME should be passed. (@benjeffery, #672)
  • Change genotypes from unsigned to signed to accommodate missing data (see #144 for discussion). This only affects users of the tsk_vargen_t class. Genotypes are now stored as int8_t and int16_t types rather than the former unsigned types. The field names in the genotypes union of the tsk_variant_t struct returned by tsk_vargen_next have been renamed to i8 and i16 accordingly; care should be taken when updating client code to ensure that types are correct. The number of distinct alleles supported by 8 bit genotypes has therefore dropped from 255 to 127, with a similar reduction for 16 bit genotypes.
  • Change the tsk_vargen_init method to take an extra parameter alleles. To keep the current behaviour, set this parameter to NULL.
  • Edges can now have metadata. Hence edge methods now take two extra arguments: metadata and metadata length. The file format has also changed to accommodate this, but is backwards compatible. Edge metadata can be disabled for a table collection with the TSK_NO_EDGE_METADATA flag. (@benjeffery, #496, #712)
  • Migrations can now have metadata. Hence migration methods now take two extra arguments: metadata and metadata length. The file format has also changed to accommodate this, but is backwards compatible. (@benjeffery, #505)
  • The text dump of tables with metadata now includes the metadata schema as a header. (@benjeffery, #493)
  • Bad tree topologies are detected earlier, so that it is no longer possible to create a tsk_treeseq_t object which contains a parent with contradictory children on an interval. Previously an error occurred when some operation building the trees was attempted. (@jeromekelleher, #709)

New features

  • New methods to perform set operations on table collections. tsk_table_collection_subset subsets and reorders table collections by nodes (@mufernando, @petrelharp, #663, #690). tsk_table_collection_union forms the node-wise union of two table collections. (@mufernando, @petrelharp, #381, #623)
  • Mutations now have an optional double-precision floating-point time column. If not specified, this defaults to a particular NaN value (TSK_UNKNOWN_TIME) indicating that the time is unknown. For a tree sequence to be considered valid it must meet new criteria for mutation times; see Mutation requirements. Add tsk_table_collection_compute_mutation_times and a new flag for tsk_table_collection_check_integrity: TSK_CHECK_MUTATION_TIME. Table sorting orders mutations by non-increasing time per site, which is also a requirement for a valid tree sequence. (@benjeffery, #672)
  • Add metadata and metadata_schema fields to table collection, with accessors on tree sequence. These store arbitrary bytes and are optional in the file format. (@benjeffery, #641)
  • Add the TSK_KEEP_UNARY option to simplify (@gtsambos). See #1 and #143.
  • Add a set_root_threshold option to tsk_tree_t which allows us to set the number of samples a node must be an ancestor of to be considered a root. (#462)
  • Change the semantics of tsk_tree_t so that sample counts are always computed, and add a new TSK_NO_SAMPLE_COUNTS option to turn this off. (#462)
  • Tables with metadata now have an optional metadata_schema field that can contain arbitrary bytes. (@benjeffery, #493)
  • Tables loaded from a file can now be edited in the same way as any other table collection (@jeromekelleher, #536, #530)
  • Support for reading/writing to arbitrary file streams with the loadf/dumpf variants for tree sequence and table collection load/dump. (@jeromekelleher, @grahamgower, #565, #599)
  • Add low-level sorting API and TSK_NO_CHECK_INTEGRITY flag. (@jeromekelleher, #627, #626)
  • Add extension of Kendall-Colijn tree distance metric for tree sequences computed by tsk_treeseq_kc_distance (@daniel-goldstein, #548)

Deprecated

  • The TSK_SAMPLE_COUNTS option is now ignored, and a warning is printed if it is used. (#462)