Add SSE2NEON for ARM (Nvidia Tegra, PI, etc) support #59

Merged on Jun 9, 2017 (2 commits)

Conversation

combatpoodle
Contributor

What

Adds support for ARM processors. In my case, this means a quadcopter self-navigating for SAR. There are many possible uses, but in general Raspberry Pi and Nvidia Tegra boards seem to run the small-robotics world at the moment.

How

Including SSE2NEON.h only when SSE is not present keeps things nicely segregated: SSE2NEON takes care of all the translations, and any required compatibility fixes would belong upstream. There are other options, such as simde; however, they didn't work on the very first try :)
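
A minimal sketch of that include pattern, not the code from this PR: native SSE headers are used when the compiler reports SSE support, and SSE2NEON.h is pulled in only on a NEON target where the header can be found. The names `HAVE_SSE_API`, `SIMD_BACKEND_NAME`, `add4` and `simd_backend` are made up for illustration, the include path is an assumption, and the scalar fallback exists only so the sketch builds anywhere.

```cpp
#include <cassert>
#include <cstddef>

// Pick an implementation of the SSE intrinsics API at compile time.
#if defined(__SSE__) || defined(__SSE2__) || defined(__SSE3__)
  #include <xmmintrin.h>              // native SSE intrinsics on x86
  #define HAVE_SSE_API 1
  #define SIMD_BACKEND_NAME "sse"
#elif (defined(__ARM_NEON) || defined(__ARM_NEON__)) && defined(__has_include)
  #if __has_include("SSE2NEON.h")     // assumed include path
    #include "SSE2NEON.h"             // maps _mm_* calls onto NEON
    #define HAVE_SSE_API 1
    #define SIMD_BACKEND_NAME "sse2neon"
  #endif
#endif

#ifndef HAVE_SSE_API
  #define HAVE_SSE_API 0              // illustration-only scalar fallback
  #define SIMD_BACKEND_NAME "scalar"
#endif

// Adds four floats. The call sites look identical on x86 and on ARM,
// because the translation header provides the same _mm_* names.
void add4(const float* a, const float* b, float* out) {
    assert(a && b && out);
#if HAVE_SSE_API
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
#else
    for (std::size_t i = 0; i < 4; ++i) out[i] = a[i] + b[i];
#endif
}

// Reports which path was selected at compile time.
const char* simd_backend() { return SIMD_BACKEND_NAME; }
```

Code written against the `_mm_*` intrinsics stays untouched; only the include selection differs per platform, which is what keeps the change loosely coupled.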

Related issues

#57 and #31

Changes

I've tried to keep this as loosely coupled as possible. Ping me if you'd like any changes, such as just including SSE2NEON.h in the utils folder instead of using the submodule.

Thanks! Israel

@JakobEngel JakobEngel merged commit e23058b into JakobEngel:master Jun 9, 2017
@JakobEngel
Owner

thanks!
I don't have a way of testing it on ARM; however, given the comments, it seems to work well!
I changed the ifdef checks to
#if !defined(__SSE3__) && !defined(__SSE2__) && !defined(__SSE1__)
(I only have SSE2 on my workstation, i.e. __SSE3__ is not defined and compilation failed).

Best,
Jakob
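
The difference between the two guards can be checked on a given machine with a small sketch like the one below. It assumes the original guard tested only `__SSE3__`, as the comment above suggests. The function names are hypothetical. Note also that GCC and Clang normally spell the first-generation macro `__SSE__`, so `__SSE1__` is likely never defined by those compilers.

```cpp
#include <string>

// Lists the SSE feature macros the compiler predefined for this build.
std::string defined_sse_macros() {
    std::string names;
#ifdef __SSE__
    names += "__SSE__ ";
#endif
#ifdef __SSE2__
    names += "__SSE2__ ";
#endif
#ifdef __SSE3__
    names += "__SSE3__ ";
#endif
    return names.empty() ? "(none)" : names;
}

// Assumed original guard: include the NEON shim whenever __SSE3__ is
// missing. On an SSE2-only x86 build this is true, so the ARM header
// would be included on x86.
constexpr bool sse3_only_guard_includes_shim() {
#if !defined(__SSE3__)
    return true;
#else
    return false;
#endif
}

// Broadened guard as quoted above: include the shim only when none of
// the listed SSE macros is defined.
constexpr bool broadened_guard_includes_shim() {
#if !defined(__SSE3__) && !defined(__SSE2__) && !defined(__SSE1__)
    return true;
#else
    return false;
#endif
}
```

On a typical x86-64 build without extra `-m` flags, `__SSE2__` is defined and `__SSE3__` is not, so the first guard would include the shim and the second would not.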

@combatpoodle
Contributor Author

Awesome, thanks! I'll run some tests tomorrow, but I can't imagine anything would be broken. I was just using SSE3 based on some Intel docs.

@combatpoodle combatpoodle deleted the feature/sse2neon branch June 11, 2017 05:21
@guohengkai

@israelshirk Hi, it seems that the code fails to compile with the newest NDK. The errors were:
AccumulatedTopHessian.s: Assembler messages:
AccumulatedTopHessian.s:28297: Error: r13 not allowed here -- `sub.w sp,r2,#80'
clang++: error: assembler command failed with exit code 1 (use -v to see invocation)

Do you have any idea?

@combatpoodle
Contributor Author

@guohengkai Could you create a new issue for this? Thanks!

@guohengkai

@israelshirk Done: #96
