
Why does XR-Transformer require so much RAM on Amazon-670k (and other large datasets)? #223

Open
celsofranssa opened this issue May 19, 2023 · 3 comments
Labels: bug (Something isn't working)

Comments

@celsofranssa

Hello,

Reading the XR-Transformer paper, I would like to know the time complexity of the multi-resolution learning step in terms of the number of text instances (N) and the number of labels (L).
Is the multi-resolution learning step what causes the large amount of RAM required to apply XR-Transformer to Amazon-670k?

celsofranssa added the bug label on May 19, 2023
@jiong-zhang
Contributor

The XR-Transformer model consists of two parts: a text encoder and an XMC ranker. The XMC ranker's space complexity is linear in the number of output labels and in the dimensionality/sparsity of the input features. Therefore, generally speaking, when the output label space is large and the input features (TF-IDF + dense embeddings) are dense, the memory cost is higher.

To reduce the memory cost, you can adjust the threshold used to sparsify the XMC ranker (link); weights whose magnitude falls below that value are set to 0.
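
As a rough back-of-the-envelope illustration of why the ranker dominates memory, the sketch below estimates the size of a sparse labels-by-features weight matrix at different densities. The label count is roughly that of Amazon-670k, but the feature dimension and density values are placeholders rather than measurements, and the sketch ignores the hierarchical structure of the ranker and the memory used by the text encoder itself.

```python
# Back-of-the-envelope memory estimate for the XMC ranker's weight matrix.
# All numeric values are illustrative placeholders, not measured quantities.

def ranker_memory_gb(num_labels, feature_dim, density,
                     value_bytes=4, index_bytes=4):
    """Approximate memory (GB) of a labels-by-features weight matrix stored
    in a CSR/CSC-like sparse format: one value plus one index per nonzero."""
    nnz = num_labels * feature_dim * density
    return nnz * (value_bytes + index_bytes) / 1e9

# ~670k labels for Amazon-670k; feature_dim stands in for the concatenated
# TF-IDF + dense-embedding dimension (placeholder value).
for density in (1.0, 0.1, 0.01):
    gb = ranker_memory_gb(num_labels=670_091, feature_dim=140_000, density=density)
    print(f"density={density:5}: ~{gb:,.0f} GB")
```

Driving the density down (i.e., raising the sparsification threshold) shrinks this footprint roughly linearly, which is why the threshold is the main knob for fitting the ranker into a fixed RAM budget.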

@celsofranssa
Author

Thank you.

@celsofranssa
Author

celsofranssa commented Dec 16, 2023

> The XR-Transformer model consists of two parts: a text encoder and an XMC ranker. The XMC ranker's space complexity is linear in the number of output labels and in the dimensionality/sparsity of the input features. Therefore, generally speaking, when the output label space is large and the input features (TF-IDF + dense embeddings) are dense, the memory cost is higher.
>
> To reduce the memory cost, you can adjust the threshold used to sparsify the XMC ranker (link); weights whose magnitude falls below that value are set to 0.

Could you provide a threshold value that allows training XR-Transformer on the Amazon-670k dataset in an environment with 128 GB of RAM?
