[FEATURE] Effective drop path. #1836
Comments
@leng-yue I noticed that, but
The drop path ratio in the DINOv2 paper is 0.3 or 0.4. In our tests, their drop path implementation made training 15% faster at a ratio of 0.3. I can run a benchmark if you want.
https://colab.research.google.com/drive/1ydeHogHNlGgVYCFBbgd4a5PYi9LZWHLH?usp=sharing As this benchmark shows, a drop path ratio (dpr) of 0.4 saves 41% of the training time.
Any suggestions?
@leng-yue sorry, I've got quite a few other tasks to plow through, so I haven't had a chance to look more closely at this. I do want to test and weigh the added complexity against the benefit before making a final decision.
Maybe adding it as a new
@leng-yue yeah, I suppose a new Block would mitigate risk concerns for now, and also fix the breakage of other blocks that don't currently support it. We can figure out how to make it easier to select later...
I will implement it later.
Updated.
Is your feature request related to a problem? Please describe.
The current drop path implementation in TIMM doesn't save any computation. Implementing a true drop path that ignores unnecessary tokens would significantly speed up training when the drop path ratio is high (e.g. 0.3 or 0.4).
Describe the solution you'd like
Reference to: https://github.com/facebookresearch/dinov2/blob/c3c2683a13cde94d4d99f523cf4170384b00c34c/dinov2/layers/block.py#L110
I already implemented a modified Block that utilizes this function and it gives me a huge performance improvement. I can add it to PR #1835 if it's a good idea.
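To make the difference concrete, here is a minimal sketch of the two approaches. It is not the TIMM or dinov2 code; it only follows the idea of the linked function. The names `masked_drop_path` and `subset_drop_path` are made up for illustration, and `residual_func` is assumed to be any branch (attention or MLP) that returns a tensor of the same shape as its input.

```python
import torch


def masked_drop_path(x, residual_func, drop_prob):
    """Mask-style drop path: the branch still runs on every sample."""
    residual = residual_func(x)  # full-batch compute, even for dropped samples
    keep_prob = 1.0 - drop_prob
    # One Bernoulli draw per sample, broadcast over the remaining dims.
    mask_shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    mask = x.new_empty(mask_shape).bernoulli_(keep_prob)
    if keep_prob > 0.0:
        mask = mask / keep_prob  # rescale so the expected output is unchanged
    return x + residual * mask


def subset_drop_path(x, residual_func, drop_prob):
    """Subset-style drop path: the branch runs only on the kept samples."""
    b = x.shape[0]
    keep = max(int(b * (1.0 - drop_prob)), 1)  # always keep at least one sample
    idx = torch.randperm(b, device=x.device)[:keep]
    residual = residual_func(x[idx])  # compute on `keep` samples instead of `b`
    scale = b / keep  # compensate for the samples that were skipped
    # Scatter the scaled residual back onto the rows that were kept.
    out = torch.index_add(
        x.flatten(1), 0, idx, residual.flatten(1).to(x.dtype), alpha=scale
    )
    return out.view_as(x)


if __name__ == "__main__":
    x = torch.randn(8, 4, 16)
    branch = torch.nn.Linear(16, 16)  # stand-in for an attention/MLP branch
    y = subset_drop_path(x, branch, drop_prob=0.5)
    print(y.shape)
```

With a batch of 8 and a ratio of 0.5, the branch in `subset_drop_path` sees only 4 samples, which is where the speedup would come from. Note that the subset version drops a fixed number of samples per batch rather than drawing an independent Bernoulli mask per sample, so the two are not numerically identical.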