
Why not use FP8 in large multi-node settings for BERT? #8

Open
soonjune opened this issue Feb 16, 2024 · 0 comments

Comments

@soonjune

I've noticed that the --use_transformer_engine2 flag is disabled in the configurations for multi-node training with more than 8 nodes. I've also noticed that training is slower when I enable Transformer Engine in this case. Can anyone point out why FP8 training is slower at this scale?
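For context, here is a minimal sketch of what enabling FP8 with NVIDIA Transformer Engine typically looks like in a PyTorch training step. This uses Transformer Engine's generic public API and is only an assumption about what the flag toggles, not the repository's actual code path:

```python
# Minimal sketch of FP8 training with NVIDIA Transformer Engine.
# Assumption: this generic API is what --use_transformer_engine2 enables
# under the hood; the repository's actual wiring may differ.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Delayed-scaling recipe: amax history is tracked (and reduced across ranks
# in distributed runs) to choose per-tensor FP8 scaling factors.
fp8_recipe = recipe.DelayedScaling(
    fp8_format=recipe.Format.HYBRID,   # E4M3 forward, E5M2 backward
    amax_history_len=16,
    amax_compute_algo="max",
)

# Transformer Engine module in place of a plain torch.nn.Linear.
model = te.Linear(1024, 1024, bias=True).cuda()
inp = torch.randn(8, 1024, device="cuda")

# FP8 is only active inside this context; outside it the module runs in
# the usual higher precision.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)
out.sum().backward()
```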
