Add workflow for evaluating predictions #12

Draft · wants to merge 5 commits into main
Conversation

@cthoyt (Member) commented Dec 5, 2023

This workflow takes three inputs:

  1. Positive, manually curated mappings
  2. Negative, manually curated mappings
  3. Predicted mappings

and estimates several metrics for the predictions, such as accuracy, precision, recall, and $F_1$. These are only estimates of the true metrics, since the manually curated positive and negative mappings are likely incomplete and therefore biased in which things were curated (e.g., I always curate the easiest mappings first, which skews my manual curations towards positive calls).
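
As a rough illustration of what such an evaluation computes, here is a minimal sketch that models mappings as (subject, object) pairs. This is not the actual API in src/semra/evaluate_prediction.py; all names here are illustrative assumptions.

```python
# Minimal sketch of the evaluation idea; NOT the actual semra API from
# src/semra/evaluate_prediction.py. Mappings are modeled as (subject, object) pairs.


def evaluate(
    positive: set[tuple[str, str]],
    negative: set[tuple[str, str]],
    predicted: set[tuple[str, str]],
) -> dict[str, float]:
    """Estimate metrics for predicted mappings against curated gold sets."""
    tp = len(predicted & positive)  # predicted and curated as correct
    fp = len(predicted & negative)  # predicted but curated as incorrect
    fn = len(positive - predicted)  # curated as correct but not predicted
    tn = len(negative - predicted)  # curated as incorrect, not predicted

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn) if tp + tn + fp + fn else 0.0
    # completion: the fraction of predictions the curated sets can judge at all
    curated = positive | negative
    completion = len(predicted & curated) / len(predicted) if predicted else 0.0
    return {
        "completion": completion,
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```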

Why is this useful?

Mapping tool competition organizers don't have to keep writing their own infrastructure. You do the following:

  1. Curate (or generate) the gold standard correct and incorrect mappings
  2. Ask the competitors to generate their predictions in SSSOM
  3. Load them into this function and get results (see the sketch below)
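
For example, a hypothetical competition harness reusing the evaluate() sketch above could look like the following. The file names and the load_mappings helper are invented for illustration; the PR's actual entry point may differ.

```python
# Hypothetical competition harness; file names are made up. SSSOM TSV is
# tab-separated, with any metadata given in leading lines starting with '#'.
import pandas as pd


def load_mappings(path: str) -> set[tuple[str, str]]:
    """Load (subject, object) pairs from an SSSOM TSV file."""
    df = pd.read_csv(path, sep="\t", comment="#")
    return set(zip(df["subject_id"], df["object_id"]))


positive = load_mappings("gold_positive.sssom.tsv")
negative = load_mappings("gold_negative.sssom.tsv")
for submission in ["team_a.sssom.tsv", "team_b.sssom.tsv"]:
    print(submission, evaluate(positive, negative, load_mappings(submission)))
```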

Demonstration

This also comes with a demonstrator that compares a combination of first-party ontology curations and third-party Biomappings curations against lexical mapping predictions made by Gilda. It reports the following when applied to a small number of OBO Foundry ontologies.

| prefix | completion | accuracy | precision | recall | $F_1$ |
|--------|------------|----------|-----------|--------|-------|
| chebi  | 10.8%      | 98.0%    | 98.8%     | 99.1%  | 99.0% |
| cl     | 28.3%      | 53.7%    | 90.8%     | 47.9%  | 62.7% |
| clo    | 52.6%      | 34.9%    | 70.0%     | 38.9%  | 50.0% |
| doid   | 30.1%      | 26.8%    | 92.2%     | 26.3%  | 40.9% |
| go     | 38.0%      | 80.0%    | 81.8%     | 96.8%  | 88.7% |
| maxo   | 44.6%      | 86.4%    | 100.0%    | 86.4%  | 92.7% |
| uberon | 6.3%       | 11.2%    | 98.5%     | 11.1%  | 20.0% |
| vo     | 66.4%      | 79.1%    | 91.7%     | 77.2%  | 83.8% |

Completion refers to the percentage of predicted mappings that appear in the curated sets (both positive and negative). A higher completion reduces the impact of curation bias; for example, a completion of 100% means that the metrics are unbiased.
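
In symbols (notation mine, not from the PR), writing $P$ for the predicted mappings and $C^+$, $C^-$ for the curated positive and negative mappings:

$$\mathrm{completion} = \frac{|P \cap (C^+ \cup C^-)|}{|P|}$$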

Note that lexical matching has pretty high precision, i.e., most of the predictions it makes are correct, but it is more prone to false negatives, so accuracy can vary. Some observations:

  • This leads to the DOID accuracy being pretty low.
  • ChEBI has no curations outside of Biomappings, so the number of false negatives is zero, meaning that accuracy is a less useful metric (TBD: how to communicate that in the table).
  • CLO has a large number of duplicate terms, which results in an artificially low precision.

Caution

Mapping shouldn't be a competition. Make your predictions, curate them, contribute them to Biomappings or directly upstream, then everyone benefits and we don't have to keep playing this game.

@matentzn commented Dec 5, 2023

Wow, this is such a cool idea... Awesome man!

@codecov bot commented May 2, 2024

Codecov Report

Attention: Patch coverage is 0%, with 98 lines in your changes missing coverage. Please review.

❗ No coverage uploaded for pull request base (main@8d1d4b4).

| Files                            | Patch % | Lines         |
|----------------------------------|---------|---------------|
| src/semra/evaluate_prediction.py | 0.00%   | 98 Missing ⚠️ |
Additional details and impacted files
```
@@           Coverage Diff           @@
##             main      #12   +/-   ##
=======================================
  Coverage        ?   28.57%
=======================================
  Files           ?       32
  Lines           ?     2390
  Branches        ?      488
=======================================
  Hits            ?      683
  Misses          ?     1666
  Partials        ?       41
```
