From bc49bb4f20e3e1910c8a42bd003771f3f7e26d1a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Frederik=20Hvilsh=C3=B8j?= <93145535+frederik-encord@users.noreply.github.com>
Date: Wed, 24 Apr 2024 10:04:30 +0200
Subject: [PATCH] chore: add known results (#66)

---
 README.md | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 80 insertions(+)

diff --git a/README.md b/README.md
index 1424035..90d3325 100644
--- a/README.md
+++ b/README.md
@@ -94,6 +94,86 @@ By default, this path corresponds to the repository directory.
+## Some Example Results
+
+One example of where `tti-eval` is useful is testing different open-source models against different open-source datasets within a specific domain.
+Below, we focus on the medical domain. We evaluate nine different models, three of which are domain-specific.
+The models are evaluated against four different medical datasets. Note that links to all the models and datasets appear further down this page.
+
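+To make the setup concrete: each cell in the result tables below comes from embedding one dataset with one model and then scoring those embeddings. The sketch below illustrates only the embedding step and is not `tti-eval`'s own code; the `open_clip` model tag (`ViT-B-32` / `laion2b_s34b_b79k`) and the Hugging Face dataset (`beans`) are illustrative stand-ins for the models and medical datasets linked further down.
+
+```python
+# Hedged sketch of the per-(model, dataset) embedding step, not tti-eval's implementation.
+import numpy as np
+import torch
+import open_clip
+from datasets import load_dataset
+
+model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="laion2b_s34b_b79k")
+tokenizer = open_clip.get_tokenizer("ViT-B-32")
+
+dataset = load_dataset("beans", split="validation")  # stand-in for a medical dataset
+class_names = dataset.features["labels"].names
+
+with torch.no_grad():
+    image_embeddings = torch.cat([
+        model.encode_image(preprocess(example["image"].convert("RGB")).unsqueeze(0))
+        for example in dataset
+    ])
+    text_embeddings = model.encode_text(tokenizer([f"a photo of {c}" for c in class_names]))
+
+# L2-normalise so that dot products are cosine similarities.
+image_embeddings = image_embeddings / image_embeddings.norm(dim=-1, keepdim=True)
+text_embeddings = text_embeddings / text_embeddings.norm(dim=-1, keepdim=True)
+labels = np.array(dataset["labels"])
+```
+
+The `tti-eval` CLI automates this step for every model/dataset pair before computing the metrics reported below.
+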
+_Figure 1: Linear probe accuracy across four different medical datasets. General purpose models are colored green while models trained for the medical domain are colored red._
+
+The raw numbers from the experiment are listed below. A sketch of how each of these metrics can be computed follows the tables.
+
+### Weighted KNN Accuracy
+
+| Model/Dataset    | Alzheimer-MRI | LungCancer4Types | chest-xray-classification | skin-cancer |
+| :--------------- | :-----------: | :--------------: | :-----------------------: | :---------: |
+| apple            |    0.6777     |      0.6633      |          0.9687           |   0.7985    |
+| bioclip          |    0.8952     |      0.7800      |          0.9771           |   0.7961    |
+| clip             |    0.6986     |      0.6867      |          0.9727           |   0.7891    |
+| plip             |    0.8021     |      0.6767      |          0.9599           |   0.7860    |
+| pubmed           |    0.8503     |      0.5767      |          0.9725           |   0.7637    |
+| siglip_large     |    0.6908     |      0.6533      |          0.9695           |   0.7947    |
+| siglip_small     |    0.6992     |      0.6267      |          0.9646           |   0.7780    |
+| tinyclip         |    0.7389     |      0.5900      |          0.9673           |   0.7589    |
+| vit-b-32-laion2b |    0.7559     |      0.5967      |          0.9654           |   0.7738    |
+
+---
+
+### Zero-shot Accuracy
+
+| Model/Dataset    | Alzheimer-MRI | LungCancer4Types | chest-xray-classification | skin-cancer |
+| :--------------- | :-----------: | :--------------: | :-----------------------: | :---------: |
+| apple            |    0.4460     |      0.2367      |          0.7381           |   0.3594    |
+| bioclip          |    0.3092     |      0.2200      |          0.7356           |   0.0431    |
+| clip             |    0.4857     |      0.2267      |          0.7381           |   0.1955    |
+| plip             |    0.0104     |      0.2267      |          0.3873           |   0.0797    |
+| pubmed           |    0.3099     |      0.2867      |          0.7501           |   0.1127    |
+| siglip_large     |    0.4876     |      0.3000      |          0.5950           |   0.0421    |
+| siglip_small     |    0.4102     |      0.0767      |          0.7381           |   0.1541    |
+| tinyclip         |    0.2526     |      0.2533      |          0.7313           |   0.1113    |
+| vit-b-32-laion2b |    0.3594     |      0.1533      |          0.7378           |   0.1228    |
+
+---
+
+### Image-to-image Retrieval
+
+| Model/Dataset    | Alzheimer-MRI | LungCancer4Types | chest-xray-classification | skin-cancer |
+| :--------------- | :-----------: | :--------------: | :-----------------------: | :---------: |
+| apple            |    0.4281     |      0.2786      |          0.8835           |   0.6437    |
+| bioclip          |    0.4535     |      0.3496      |          0.8786           |   0.6278    |
+| clip             |    0.4247     |      0.2812      |          0.8602           |   0.6347    |
+| plip             |    0.4406     |      0.3174      |          0.8372           |   0.6289    |
+| pubmed           |    0.4445     |      0.3022      |          0.8621           |   0.6228    |
+| siglip_large     |    0.4232     |      0.2743      |          0.8797           |   0.6466    |
+| siglip_small     |    0.4303     |      0.2613      |          0.8660           |   0.6348    |
+| tinyclip         |    0.4361     |      0.2833      |          0.8379           |   0.6098    |
+| vit-b-32-laion2b |    0.4378     |      0.2934      |          0.8551           |   0.6189    |
+
+---
+
+### Linear Probe Accuracy
+
+| Model/Dataset    | Alzheimer-MRI | LungCancer4Types | chest-xray-classification | skin-cancer |
+| :--------------- | :-----------: | :--------------: | :-----------------------: | :---------: |
+| apple            |    0.5482     |      0.5433      |          0.9362           |   0.7662    |
+| bioclip          |    0.6139     |      0.6600      |          0.9433           |   0.7933    |
+| clip             |    0.5547     |      0.5700      |          0.9362           |   0.7704    |
+| plip             |    0.5469     |      0.5267      |          0.9261           |   0.7630    |
+| pubmed           |    0.5482     |      0.5400      |          0.9278           |   0.7269    |
+| siglip_large     |    0.5286     |      0.5200      |          0.9496           |   0.7697    |
+| siglip_small     |    0.5449     |      0.4967      |          0.9327           |   0.7606    |
+| tinyclip         |    0.5651     |      0.5733      |          0.9280           |   0.7484    |
+| vit-b-32-laion2b |    0.5684     |      0.5933      |          0.9302           |   0.7578    |
+
+---
+
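+For reference, the four metrics above can be computed from L2-normalised image embeddings, class-name text embeddings, and integer labels roughly as follows. This is a hedged sketch of common definitions, not necessarily `tti-eval`'s exact implementation; in particular, the `k` values and the probe settings are assumptions.
+
+```python
+# Sketches of the four reported metrics, assuming:
+#   image_emb: np.ndarray [N, D], L2-normalised image embeddings
+#   text_emb:  np.ndarray [C, D], L2-normalised class-name embeddings
+#   labels:    np.ndarray [N] of integer class ids
+import numpy as np
+from sklearn.linear_model import LogisticRegression
+from sklearn.neighbors import KNeighborsClassifier
+
+def zero_shot_accuracy(image_emb, text_emb, labels):
+    # Predict the class whose text embedding is most similar to each image.
+    predictions = (image_emb @ text_emb.T).argmax(axis=1)
+    return float((predictions == labels).mean())
+
+def weighted_knn_accuracy(train_emb, train_labels, test_emb, test_labels, k=11):
+    # Distance-weighted k-NN vote in embedding space.
+    knn = KNeighborsClassifier(n_neighbors=k, weights="distance")
+    return float(knn.fit(train_emb, train_labels).score(test_emb, test_labels))
+
+def linear_probe_accuracy(train_emb, train_labels, test_emb, test_labels):
+    # Logistic regression trained on frozen embeddings (what Figure 1 reports).
+    probe = LogisticRegression(max_iter=1000).fit(train_emb, train_labels)
+    return float(probe.score(test_emb, test_labels))
+
+def image_to_image_retrieval(image_emb, labels, k=10):
+    # Mean fraction of each image's k nearest neighbours that share its label.
+    sims = image_emb @ image_emb.T
+    np.fill_diagonal(sims, -np.inf)  # exclude the query image itself
+    neighbours = np.argsort(-sims, axis=1)[:, :k]
+    return float((labels[neighbours] == labels[:, None]).mean())
+```
+
+The absolute numbers depend on choices such as the neighbour count, the prompt template used for class names, and the probe hyperparameters, so these sketches are only meant to clarify what each table measures.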
+
 ## Datasets

 This repository contains classification datasets sourced from [Hugging Face](https://huggingface.co/datasets) and [Encord](https://app.encord.com/projects).