Questions about distill_loss #289
Comments
@haoren55555 yeah, you could do log softmax for teacher too by setting
@lucidrains
This fusion of two heads is not implemented in the forward method of the DistillMixin class. Please let me know if it is implemented elsewhere or if I am missing something. Thanks for the excellent codebase.
Sorry to bother. I see that distill_loss in distill.py is computed as:
    distill_loss = F.kl_div(
        F.log_softmax(distill_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1).detach(),
        reduction='batchmean')
I wonder why the teacher part uses softmax rather than log_softmax. Thanks.
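For anyone landing here with the same question: by default, PyTorch's F.kl_div expects its first argument (the input) as log-probabilities and its second argument (the target) as plain probabilities, which is why the student side goes through log_softmax and the teacher side through softmax. F.kl_div also accepts log_target=True, in which case the target is passed as log-probabilities instead. The sketch below is only an illustration and is not code from the repository; the tensor shapes, the seed, and the value of T are made up, and it simply checks that the two forms give the same loss as a manual KL computation.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

T = 3.0                               # temperature (hypothetical value)
distill_logits = torch.randn(4, 10)   # student logits, shape (batch, classes); made-up data
teacher_logits = torch.randn(4, 10)   # teacher logits, same shape; made-up data

# The input to F.kl_div is always expected as log-probabilities.
student_log_probs = F.log_softmax(distill_logits / T, dim=-1)

# Form 1: target given as probabilities (default, log_target=False).
loss_prob_target = F.kl_div(
    student_log_probs,
    F.softmax(teacher_logits / T, dim=-1).detach(),
    reduction='batchmean',
)

# Form 2: target given as log-probabilities (log_target=True).
loss_log_target = F.kl_div(
    student_log_probs,
    F.log_softmax(teacher_logits / T, dim=-1).detach(),
    reduction='batchmean',
    log_target=True,
)

# Manual KL(teacher || student), summed over classes and averaged over the batch,
# which is what reduction='batchmean' computes.
p = F.softmax(teacher_logits / T, dim=-1)
manual = (p * (p.log() - student_log_probs)).sum(dim=-1).mean()

print(loss_prob_target.item(), loss_log_target.item(), manual.item())
```

Passing log_softmax output as the target without log_target=True would be treated as probabilities and give a wrong (typically negative) value, so the two options need to be switched together.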