Update distill.py to include device agnostic code for `distill_mlp` head and `distillation_token` #324

vivekh2000 · 2024-07-25T16:34:34Z

In your code, `distillation_token` and the `distill_mlp` head are defined in the `DistillWrapper` class. As a result, sending an instance of the `DistillableViT` class to the GPU does not send `distillation_token` or `distill_mlp` along with it. While training a model with this code, I got a device mismatch error whose source was hard to track down. The culprits turned out to be `distillation_token` and `distill_mlp`: they are defined not in the model class but in `DistillWrapper`, which wraps the loss computation. I have therefore suggested the following change for training on GPU: the training code should set `device = "cuda" if torch.cuda.is_available() else "cpu"`, or the same logic can be incorporated into the constructor of the `DistillWrapper` class.
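The mismatch can be reproduced without `vit-pytorch` using a minimal sketch. `ToyStudent` and `ToyDistillWrapper` below are hypothetical stand-ins for `DistillableViT` and `DistillWrapper`, and they mirror only the parameter layout: the wrapper owns `distillation_token` and `distill_mlp`, while the student owns everything else. The sketch assumes the wrapper is an `nn.Module`, which it must be for its `nn.Parameter` to be registered. So that it runs on CPU-only machines, it falls back to PyTorch's `meta` device as the second device.

```python
import torch
from torch import nn

# Hypothetical stand-in for DistillableViT: only the parameter layout matters.
class ToyStudent(nn.Module):
    def __init__(self, dim=8, num_classes=3):
        super().__init__()
        self.dim, self.num_classes = dim, num_classes
        self.backbone = nn.Linear(dim, dim)
        self.mlp_head = nn.Linear(dim, num_classes)

# Hypothetical stand-in for DistillWrapper: like distill.py, it defines the
# distillation token and head on the wrapper, not on the student.
class ToyDistillWrapper(nn.Module):
    def __init__(self, student):
        super().__init__()
        self.student = student
        self.distillation_token = nn.Parameter(torch.randn(1, 1, student.dim))
        self.distill_mlp = nn.Linear(student.dim, student.num_classes)

def param_devices(module):
    """Return the set of device types the module's parameters live on."""
    return {p.device.type for p in module.parameters()}

# The PR's device-agnostic choice would be:
#   device = "cuda" if torch.cuda.is_available() else "cpu"
# Here "meta" replaces "cpu" as the fallback so the sketch still shows
# two distinct devices on a CPU-only machine.
target = torch.device("cuda" if torch.cuda.is_available() else "meta")

student = ToyStudent()
wrapper = ToyDistillWrapper(student)

student.to(target)               # moves only the student's parameters
mixed = param_devices(wrapper)   # token and head are still on the CPU

wrapper.to(target)               # moves the student AND the wrapper's own parameters
uniform = param_devices(wrapper)
print(mixed, uniform)
```

Because `Module.to` recurses into registered submodules and parameters, calling `.to(device)` on the wrapper rather than on the student alone keeps `distillation_token` and `distill_mlp` on the same device as the model. Passing a device into the wrapper's constructor, as suggested above, is an alternative way to achieve the same result.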


lucidrains force-pushed the main branch 3 times, most recently from 19eb6d4 to 5e808f4, on August 21, 2024 14:23

lucidrains force-pushed the main branch from 43cbcad to f50d7d1 on October 9, 2024 14:32