0.10.0

@steppi steppi released this 03 Dec 03:02
· 56 commits to master since this release

This release makes several changes concerning model statistics.

  1. The global precision, recall, and F1 scores for a classifier are now micro-averaged across the positive class labels, rather than averaged with weights given by the frequency of each positive label. Micro-averaging uses global counts of true positives, false positives, and false negatives across all positive labels. A true positive is any datapoint with a positive label that is classified correctly. A false negative is any datapoint with a positive label that is classified incorrectly. A false positive is any datapoint that is incorrectly classified as a positive label. Note that false positives and false negatives can overlap: a datapoint with one positive label that is misclassified as a different positive label counts as both. Micro-averaging is easier to reason about and interpret, and using it simplifies the implementation in other places. The original decision to use the weighted average was made with little thought, at a time when we were making less use of model statistics.

  2. A method has been added to adeft.disambiguate.AdeftDisambiguator that allows the set of positive labels to be updated and the global model statistics to be recomputed without retraining the model, which was previously required. This is made possible by storing the full label-vs-label confusion matrix for each CV fold when a model is trained, and serializing these matrices when the model is saved.
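
As a rough illustration of the two points above, the sketch below computes micro-averaged precision, recall, and F1 from a single label-vs-label confusion matrix, and then recomputes them for a different set of positive labels without touching the model. It is not adeft's implementation: the function `micro_scores`, the label names, and the counts are all made up for the example, and the matrix is assumed to have true labels on the rows and predicted labels on the columns.

```python
import numpy as np


def micro_scores(confusion, labels, pos_labels):
    """Micro-averaged precision, recall, and F1 for a set of positive labels.

    Hypothetical helper, not part of adeft. confusion[i][j] is assumed to
    count datapoints whose true label is labels[i] and whose predicted
    label is labels[j].
    """
    confusion = np.asarray(confusion)
    pos = [labels.index(label) for label in pos_labels]
    # Correctly classified datapoints with a positive label.
    tp = sum(confusion[i, i] for i in pos)
    # Datapoints with a positive label that were predicted as something else.
    fn = sum(confusion[i, :].sum() - confusion[i, i] for i in pos)
    # Datapoints predicted as a positive label that is not their true label.
    fp = sum(confusion[:, i].sum() - confusion[i, i] for i in pos)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Made-up labels and counts for one CV fold.
labels = ["A", "B", "ungrounded"]
confusion = [[8, 1, 1],
             [2, 6, 2],
             [1, 1, 8]]

# Both groundings treated as positive labels.
print(micro_scores(confusion, labels, ["A", "B"]))
# Changing the positive labels only requires re-reading the stored matrix.
print(micro_scores(confusion, labels, ["A"]))
```

In this example the one datapoint with true label "A" that is predicted as "B" is counted once as a false negative (for "A") and once as a false positive (for "B"), which is the overlap mentioned above.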

Bug fixes and smaller changes were also made:

  1. A bug was fixed that was causing the labels in model statistics to fail to update when adeft.disambiguate.AdeftDisambiguator.modify_groundings was used to update groundings in a model.
  2. A bug was fixed that caused the labels attribute of an adeft.disambiguate.AdeftDisambiguator to not contain labels for which no defining pattern exists. (These labels are typically for texts manually curated in Entrez as mentioning a particular gene with the shortform of interest as a synonym but which are not abbreviations.)
  3. A new attribute called other_metadata was added to classifiers. Anything JSON-serializable stored in this attribute will be preserved when the model is serialized. We are using it to store any information needed to retrain a model that does not fit into the existing attributes, which simplifies the retraining process.
  4. Some small updates have been made to the introductory Jupyter notebook.
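
For the other_metadata attribute mentioned above, a quick way to see what "JSON-serializable" means in practice is to round-trip a value through Python's standard `json` module. This sketch uses a plain dictionary and does not call adeft; the helper `is_jsonable` and the metadata keys are hypothetical.

```python
import json


def is_jsonable(obj):
    """Return True if obj can be encoded with the standard json module."""
    try:
        json.dumps(obj)
    except (TypeError, ValueError):
        return False
    return True


# Hypothetical metadata; the keys are invented for illustration only.
metadata = {"source": "manual curation", "ngram_range": (1, 2)}
print(is_jsonable(metadata))

# Sets are not JSON-serializable, so a value like this would need converting.
print(is_jsonable({"labels": {"A", "B"}}))

# A round trip keeps the content but turns tuples into lists.
restored = json.loads(json.dumps(metadata))
print(restored["ngram_range"])
```

One consequence worth keeping in mind is that values come back as their JSON equivalents, so tuples become lists and dictionary keys become strings.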