Does the SAM2 implementation need any code modifications to handle different input resolutions? #10

Open
GoldenFishes opened this issue Dec 2, 2024 · 1 comment

Comments

@GoldenFishes
Thank you so much for your great work! I would like to know if I need to modify the SAM2 code to handle different input resolutions. Which parts of the code should I modify? Also, what is the actual speed difference for different resolutions?

@heyoeyo
Owner

heyoeyo commented Dec 2, 2024

Thanks for checking out the repo!

For the original SAM2 code, the video predictor already supports resolution changes without any code modifications; you just need to change the image_size config parameter. The image predictor does require some (small) changes to make the bb_feat_sizes parameter dynamic. There's a fuller description in issue #257.
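
As a rough sketch of what "dynamic" could look like: as far as I can tell, the original image predictor stores the backbone feature sizes as fixed values (256, 128 and 64 per side), which matches strides of 4, 8 and 16 at a 1024px input. Assuming those strides hold, the sizes can be derived from the image size instead. The function name and the `strides` argument below are hypothetical and not part of the SAM2 code:

```python
def compute_bb_feat_sizes(image_size, strides=(4, 8, 16)):
    """
    Hypothetical helper (not from the SAM2 codebase).
    Returns one (height, width) feature-map size per backbone level,
    assuming square inputs and the given downsampling strides.
    """
    sizes = []
    for stride in strides:
        if image_size % stride != 0:
            raise ValueError(
                f"image_size={image_size} is not divisible by stride {stride}"
            )
        side = image_size // stride
        sizes.append((side, side))
    return sizes


if __name__ == "__main__":
    for size in (1024, 512):
        print(size, compute_bb_feat_sizes(size))
```

With these assumptions, a 1024px input gives back the original fixed sizes, and a 512px input gives sizes of 128, 64 and 32 per side.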

There's about a 4x speedup going from 1024px down to 512px. Unfortunately, the SAM v2 models don't handle resolution changes very well, so using intermediate resolutions (e.g. 768px) to get a better speed/accuracy tradeoff usually isn't an option.
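
The 4x figure is roughly what simple token-count arithmetic would suggest. This is only a back-of-the-envelope sketch: it assumes runtime scales about linearly with the number of image tokens and that the token grid uses a stride of 16, neither of which is measured here. The function name is made up for illustration, and real timings will depend on hardware and model size:

```python
def relative_token_count(new_size, base_size=1024, stride=16):
    """
    Hypothetical helper: ratio of image tokens at new_size vs. base_size,
    assuming square inputs and a fixed token stride (assumed to be 16).
    """
    base_tokens = (base_size // stride) ** 2
    new_tokens = (new_size // stride) ** 2
    return new_tokens / base_tokens


if __name__ == "__main__":
    for size in (1024, 768, 512):
        print(size, relative_token_count(size))
```

Under these assumptions, 512px has a quarter of the tokens of 1024px, while 768px would sit at a bit over half.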
