
Same attention score and the pre-trained aggregators. #59

Closed
HHHedo opened this issue Oct 10, 2022 · 7 comments

@HHHedo

HHHedo commented Oct 10, 2022

Dear bin,
Thank you for your great work!

  • When I reproduce the results on c16 and TCGA, I follow the provided readme: 1) use the pre-computed features from "Download feature vectors for MIL network" (python download.py --dataset=tcga or --dataset=c16); 2) train the model with all hyperparameters at their defaults (python train_tcga.py --dataset=TCGA-lung-default, or python train_tcga.py --dataset=Camelyon16 --num_classes=1). On c16 I only see a mild degradation, to about 91% accuracy, unlike #54 (Problem of reproduce Camelyon16 result), which reports only 60%; however, as in #54, every patch produces the same attention score. On TCGA the identical attention scores also appear, yet the results are quite promising (e.g., train loss: 0.3307, test loss: 0.3239, average score: 0.9000, AUC: class-0>>0.9715089374829871 | class-1>>0.9658833136738953). On c16 the identical-attention problem can sometimes be fixed by restarting training with init.pth loaded, but on TCGA it never goes away. How should I deal with it?

  • When I apply the provided pre-trained aggregators (.test/weights/aggregator.pth or .test-c16/weights/aggregator.pth) to the test split of the pre-computed features from "Download feature vectors for MIL network" (python download.py --dataset=tcga or --dataset=c16), I get reasonable results on c16 (average score: 0.9125, AUC: class-0>>0.9546666666666667) but unreasonable ones on TCGA (average score: 0.6857, AUC: class-0>>0.8621722166772525 | class-1>>0.8949278649850286). Do these pre-trained aggregators only work with the provided embedders (test/weights/embedder.pth or .test-c16/weights/embedder.pth) rather than with the pre-computed features? In other words, were the pre-computed features not generated by these pre-trained embedders? (A quick checkpoint-inspection sketch follows this list.)
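One way to check whether a downloaded aggregator matches the downloaded features is to inspect the tensor shapes stored in the checkpoint. A minimal sketch, assuming aggregator.pth holds a plain state_dict (the path is illustrative):

```python
import torch

# Print every parameter name and shape stored in the downloaded aggregator checkpoint.
ckpt = torch.load("test/weights/aggregator.pth", map_location="cpu")
for name, tensor in ckpt.items():
    print(f"{name}: {tuple(tensor.shape)}")

# If the input dimension of the first linear layer differs from the width of the
# downloaded feature vectors, the aggregator and the features come from different embedders.
```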

Looking forward to your help!
Best,
Tiancheng Lin

@HHHedo
Author

HHHedo commented Oct 11, 2022

Hi bin,
I solved the identical-attention-score problem by removing the dimension normalization, and the performance is comparable. However, I am still confused about the pre-trained models and pre-computed features.
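A rough sketch of the kind of change this likely refers to (illustrative names and shapes, not the repository's exact code), assuming the attention score is a dot product between each instance's query and the critical instance's query: dividing the logits by sqrt(d) before the softmax shrinks them, so the distribution over instances comes out much flatter than without the scaling. Whether that is the root cause here is a guess, but it is consistent with removing the normalization sharpening the scores.

```python
import torch
import torch.nn.functional as F

def attention_scores(Q, q_max, normalize_by_dim=True):
    # Q: (N, d) per-instance queries; q_max: (1, d) query of the critical instance.
    logits = torch.mm(Q, q_max.transpose(0, 1))  # (N, 1) dot-product scores
    if normalize_by_dim:
        # "dimension normalization": scale by sqrt(d) before the softmax
        logits = logits / torch.sqrt(torch.tensor(Q.shape[1], dtype=torch.float32))
    return F.softmax(logits, dim=0)  # attention distribution over the N instances

# With d = 512, the scaled variant is much flatter than the unscaled one.
Q = torch.randn(8, 512)
q_max = torch.randn(1, 512)
print(attention_scores(Q, q_max, normalize_by_dim=True).squeeze())
print(attention_scores(Q, q_max, normalize_by_dim=False).squeeze())
```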

@binli123
Owner

Hi, please make sure the weights are fully loaded into your model without any mismatch; you can pass strict=True to load_state_dict() so that any mismatched key raises an error. There are multiple embedder.pth files available, and the downloaded features were computed with one of them (possibly not the same one included in the download, because I updated them once afterward). But you can always use that embedder to recompute new features and then test with the corresponding aggregator. You can find all the embedders I trained for the two datasets in Camelyon16 and TCGA.
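For example (a minimal sketch in plain PyTorch; the checkpoint path and the placeholder model are illustrative stand-ins for your own aggregator and weights):

```python
import torch
import torch.nn as nn

# Placeholder model; in practice this is the MIL aggregator you built for training.
milnet = nn.Linear(512, 2)

state_dict = torch.load("test/weights/aggregator.pth", map_location="cpu")  # illustrative path

# strict=True (the default) raises a RuntimeError on any missing or unexpected key,
# so a silently partial load cannot happen.
milnet.load_state_dict(state_dict, strict=True)

# Alternatively, load non-strictly and inspect exactly which keys did not match.
result = milnet.load_state_dict(state_dict, strict=False)
print("missing keys:", result.missing_keys)
print("unexpected keys:", result.unexpected_keys)
```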

@HHHedo
Author

HHHedo commented Oct 12, 2022


Hi, thank you for your quick help! Could you release more aggregators?

@HHHedo
Author

HHHedo commented Oct 13, 2022


One more question about init.pth: as mentioned in #26, it is trained for a few iterations on the Camelyon16 dataset following the original training/testing split. I would appreciate it if you could share the detailed settings you used for it.
Thank you very much!

@xiaozhu0816


Hi @HHHedo and @binli123,
I have the same question as @HHHedo. I am focusing on the TCGA part now and followed the instructions:

  1. Use the pre-computed features from "Download feature vectors for MIL network": $ python download.py --dataset=tcga
  2. Train the model (with all hyperparameters at default): $ python train_tcga.py --dataset=TCGA-lung-default
    For TCGA, I get the same attention score as @HHHedo, and I don't understand why the score is already so high at the first epoch. You can see my screenshots.

[Screenshot: training log, 2022-11-03 8:40:23 PM]

... and after the 3rd epoch, no better model is ever saved. That confuses me a lot.

[Screenshot: training log, 2022-11-03 8:40:54 PM]

Could you tell me why and how to fix it? Thank you very much.

@binli123
Owner


In my experience, the model sometimes converges very quickly on the TCGA dataset. I also found that initialization matters.

@binli123
Owner

binli123 commented Apr 6, 2023

> One more question about init.pth: as mentioned in #26, it is trained for a few iterations on the Camelyon16 dataset following the original training/testing split. I would appreciate it if you could share the detailed settings you used for it.

The settings are the default values. I found that sometimes it converges quickly and sometimes it does not, especially when a positive bag contains only a few positive instances. With one of the standard weight initialization methods proposed for faster convergence, you could probably get a faster convergence rate.
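For example, standard initialization along these lines (only an illustration, not necessarily how init.pth was produced; milnet below is a stand-in for the aggregator):

```python
import torch.nn as nn

def init_weights(module):
    # Common choices for faster convergence: Xavier for linear layers, Kaiming for convolutions.
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.Conv2d):
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")

# Stand-in for the aggregator; apply the initialization recursively before training.
milnet = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 2))
milnet.apply(init_weights)
```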
