Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change compareSubImg_thr_sse2 to use the intrinsic method _mm_sad_epu8 #99

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

gabilan
Copy link
Contributor

@gabilan gabilan commented Nov 9, 2020

Change for how SAD (Sum of Absolute Differences) is computed in the SSE optimized path. There is a built-in intrinsic for computing SAD (PSADBW). By switching to the intrinsic, we see improvement in motion estimation/detection performance by 2x. All current tests pass -- maybe more tests are needed.

…8 for computing SAD. Improves motion estimation/detection performance by 2x.
@georgmartius
Copy link
Owner

Hi,
wow, this is interesting.
We really need to make sure this is correct. It looks fantastic if we can reduce the code to that small amount. Can you create more tests where the corner cases are tested, for instance different field sizes. I just want to make sure it really does the right thing.

@gabilan
Copy link
Contributor Author

gabilan commented Nov 9, 2020

I don't think it's necessary to explore field size testing because the field size is forced to be a multiple of 16. The code below is in the two places where field->size is determined. I am a bit concerned about test coverage -- but only because this method accounts for the majority of CPU time.

#if defined(USE_SSE2) || defined(USE_SSE2_ASM)
  fieldSize     = (fieldSize / 16 + 1) * 16;
  fieldSizeFine = (fieldSizeFine / 16 + 1) * 16;
#endif

@georgmartius
Copy link
Owner

Let me check the code myself. Give me a few days, I am very busy at the moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants