
Same attention score and the pre-trained aggregators. #59

Closed
HHHedo opened this issue Oct 10, 2022 · 7 comments

@HHHedo

HHHedo commented Oct 10, 2022

Dear bin,
Thank you for your great work!

  • When I reproduce the results on c16 and TCGA, I follow the provided readme: 1) use the pre-computed features from "Download feature vectors for MIL network" (python download.py --dataset=tcga or --dataset=c16); 2) train the model with all hyperparameters at their defaults (python train_tcga.py --dataset=TCGA-lung-default, or python train_tcga.py --dataset=Camelyon16 --num_classes=1). On c16 I only see a mild degradation, to about 91% accuracy, unlike #54 (Problem of reproduce Camelyon16 result), which reports only 60%; however, as in #54, every patch produces the same attention score. On TCGA the identical attention scores also appear, yet the results are quite promising (e.g., train loss: 0.3307, test loss: 0.3239, average score: 0.9000, AUC: class-0>>0.9715089374829871 | class-1>>0.9658833136738953). On c16 the identical-attention problem can sometimes be fixed by restarting training with init.pth loaded, but on TCGA it never goes away. How should I deal with it?

  • When I apply the provided pre-trained aggregators (.test/weights/aggregator.pth or .test-c16/weights/aggregator.pth) to the test split of the pre-computed features from "Download feature vectors for MIL network" (python download.py --dataset=tcga or --dataset=c16), I get reasonable results on c16 (average score: 0.9125, AUC: class-0>>0.9546666666666667) but unreasonable ones on TCGA (average score: 0.6857, AUC: class-0>>0.8621722166772525 | class-1>>0.8949278649850286). Do these pre-trained aggregators only work with the provided embedders (test/weights/embedder.pth or .test-c16/weights/embedder.pth) rather than with the pre-computed features? In other words, were the pre-computed features not generated by these pre-trained embedders? (A quick checkpoint-inspection sketch follows this list.)
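One way to check whether a downloaded aggregator matches the downloaded features is to inspect the tensor shapes stored in the checkpoint. A minimal sketch, assuming aggregator.pth holds a plain state_dict (the path is illustrative):

```python
import torch

# Print every parameter name and shape stored in the downloaded aggregator checkpoint.
ckpt = torch.load("test/weights/aggregator.pth", map_location="cpu")
for name, tensor in ckpt.items():
    print(f"{name}: {tuple(tensor.shape)}")

# If the input dimension of the first linear layer differs from the width of the
# downloaded feature vectors, the aggregator and the features come from different embedders.
```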

Looking forward to your help!
Best,
Tiancheng Lin

@HHHedo
Author

HHHedo commented Oct 11, 2022

Hi bin,
I solved the identical-attention-score problem by removing the dimension normalization, and the performance is comparable. However, I am still confused about the pre-trained models and pre-computed features.
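A rough sketch of the kind of change this likely refers to (illustrative names and shapes, not the repository's exact code), assuming the attention score is a dot product between each instance's query and the critical instance's query: dividing the logits by sqrt(d) before the softmax shrinks them, so the distribution over instances comes out much flatter than without the scaling. Whether that is the root cause here is a guess, but it is consistent with removing the normalization sharpening the scores.

```python
import torch
import torch.nn.functional as F

def attention_scores(Q, q_max, normalize_by_dim=True):
    # Q: (N, d) per-instance queries; q_max: (1, d) query of the critical instance.
    logits = torch.mm(Q, q_max.transpose(0, 1))  # (N, 1) dot-product scores
    if normalize_by_dim:
        # "dimension normalization": scale by sqrt(d) before the softmax
        logits = logits / torch.sqrt(torch.tensor(Q.shape[1], dtype=torch.float32))
    return F.softmax(logits, dim=0)  # attention distribution over the N instances

# With d = 512, the scaled variant is much flatter than the unscaled one.
Q = torch.randn(8, 512)
q_max = torch.randn(1, 512)
print(attention_scores(Q, q_max, normalize_by_dim=True).squeeze())
print(attention_scores(Q, q_max, normalize_by_dim=False).squeeze())
```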

@binli123
Owner

Hi, please make sure the weights are fully loaded into your model without any mismatch; you can pass strict=True to load_state_dict() so that any mismatched key raises an error. There are multiple embedder.pth files available, and the downloaded features were computed with one of them (possibly not the same one included in the download, because I updated them once afterward). But you can always use that embedder to recompute new features and then test with the corresponding aggregator. You can find all the embedders I trained for the two datasets in Camelyon16 and TCGA.
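For example (a minimal sketch in plain PyTorch; the checkpoint path and the placeholder model are illustrative stand-ins for your own aggregator and weights):

```python
import torch
import torch.nn as nn

# Placeholder model; in practice this is the MIL aggregator you built for training.
milnet = nn.Linear(512, 2)

state_dict = torch.load("test/weights/aggregator.pth", map_location="cpu")  # illustrative path

# strict=True (the default) raises a RuntimeError on any missing or unexpected key,
# so a silently partial load cannot happen.
milnet.load_state_dict(state_dict, strict=True)

# Alternatively, load non-strictly and inspect exactly which keys did not match.
result = milnet.load_state_dict(state_dict, strict=False)
print("missing keys:", result.missing_keys)
print("unexpected keys:", result.unexpected_keys)
```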

@HHHedo
Author

HHHedo commented Oct 12, 2022


Hi, thank you for your quick help! Could you release more aggregators?

@HHHedo
Author

HHHedo commented Oct 13, 2022


One more question about init.pth: as mentioned in #26, it is trained for a few iterations on the Camelyon16 dataset following the original training/testing split. I would appreciate it if you could share the detailed settings you used for it.
Thank you very much!

@xiaozhu0816


Hi @HHHedo and @binli123,
I have the same question as @HHHedo. I am focusing on the TCGA part now and followed the instructions:

  1. Use the pre-computed features from "Download feature vectors for MIL network": $ python download.py --dataset=tcga
  2. Train the model (with all hyperparameters at default): $ python train_tcga.py --dataset=TCGA-lung-default
    For TCGA, I get the same attention score as @HHHedo, and I don't understand why the score is already so high at the first epoch. You can see my screenshots.

[Screenshot: training log, 2022-11-03 8:40:23 PM]

... and after the 3rd epoch, no better model is ever saved. That confuses me a lot.

[Screenshot: training log, 2022-11-03 8:40:54 PM]

Could you tell me why and how to fix it? Thank you very much.

@binli123
Owner


In my experience, the model sometimes converges very quickly on the TCGA dataset. I also found that initialization matters.

@binli123
Owner

binli123 commented Apr 6, 2023

> One more question about init.pth: as mentioned in #26, it is trained for a few iterations on the Camelyon16 dataset following the original training/testing split. I would appreciate it if you could share the detailed settings you used for it.

The settings are the default values. I found that sometimes it converges quickly and sometimes it does not, especially when a positive bag contains only a few positive instances. With one of the standard weight initialization methods proposed for faster convergence, you could probably get a faster convergence rate.
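For example, standard initialization along these lines (only an illustration, not necessarily how init.pth was produced; milnet below is a stand-in for the aggregator):

```python
import torch.nn as nn

def init_weights(module):
    # Common choices for faster convergence: Xavier for linear layers, Kaiming for convolutions.
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.Conv2d):
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")

# Stand-in for the aggregator; apply the initialization recursively before training.
milnet = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 2))
milnet.apply(init_weights)
```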
