add support for Apple M1 and other arm64+Neon architectures #1

Merged (1 commit) on May 19, 2022

@cskiraly cskiraly commented May 17, 2022

There was already some NEON support, through a separate code
path. This version relies on the sse2neon library to add
NEON support directly without a separate code path.

Signed-off-by: Csaba Kiraly [email protected]
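
For readers unfamiliar with the approach described above, here is a minimal sketch of the general sse2neon pattern: the same `_mm_*` intrinsics are compiled on both x86 and arm64, and only the header that supplies them differs. This is an illustration, not the code in this PR. The macro `SKETCH_HAVE_SSE_API` and the function `sketch_xor_bytes` are hypothetical names, and the sketch assumes `sse2neon.h` is on the include path when building for arm64 (it falls back to a scalar loop otherwise).

```cpp
#include <cstddef>
#include <cstdint>

// Pick one header that provides the _mm_* intrinsics.
// SKETCH_HAVE_SSE_API is a hypothetical macro used only in this sketch.
#if defined(__aarch64__) || defined(_M_ARM64)
  #ifdef __has_include
    #if __has_include("sse2neon.h")
      #include "sse2neon.h"    // maps SSE intrinsics onto NEON
      #define SKETCH_HAVE_SSE_API 1
    #endif
  #endif
#elif defined(__SSE2__) || defined(_M_X64)
  #include <emmintrin.h>       // native SSE2 on x86
  #define SKETCH_HAVE_SSE_API 1
#endif

// XOR `bytes` bytes of src into dst.
// The vector loop is written once against the SSE2 API; on arm64 the
// sse2neon header turns each intrinsic into its NEON equivalent.
void sketch_xor_bytes(std::uint8_t* dst, const std::uint8_t* src, std::size_t bytes) {
    std::size_t i = 0;
#ifdef SKETCH_HAVE_SSE_API
    for (; i + 16 <= bytes; i += 16) {
        const __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(dst + i));
        const __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), _mm_xor_si128(a, b));
    }
#endif
    // Scalar tail; also the whole loop when no SIMD header was found.
    for (; i < bytes; ++i) {
        dst[i] ^= src[i];
    }
}
```

The design trade-off is that a single code path is easier to maintain than a hand-written NEON variant, at the cost of depending on how well each SSE intrinsic maps to NEON.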

@cskiraly
Author

Just linking original issue upstream to help those looking at it.
catid#18

@cskiraly
Author

Tested on:

  • Apple M1, macOS:
Parameters: [original count=1000] [recovery count=100] [buffer bytes=64000] [loss count=100] [random seed=2]
Leopard Encoder(64 MB in 1000 pieces, 100 losses): Input=3136.18 MB/s, Output=313.618 MB/s
Leopard Decoder(64 MB in 1000 pieces, 100 losses): Input=708.302 MB/s, Output=70.8302 MB/s
  • Apple M1 under Parallels Linux arm64 VM (4 cores, OpenMP):
Parameters: [original count=1000] [recovery count=100] [buffer bytes=64000] [loss count=100] [random seed=2]
Leopard Encoder(64 MB in 1000 pieces, 100 losses): Input=5281.4 MB/s, Output=528.14 MB/s
Leopard Decoder(64 MB in 1000 pieces, 100 losses): Input=901.332 MB/s, Output=90.1332 MB/s
  • Raspberry Pi 3:
Parameters: [original count=1000] [recovery count=100] [buffer bytes=64000] [loss count=100] [random seed=2]
Leopard Encoder(64 MB in 1000 pieces, 100 losses): Input=136.62 MB/s, Output=13.662 MB/s
Leopard Decoder(64 MB in 1000 pieces, 100 losses): Input=27.0177 MB/s, Output=2.70177 MB/s
  • RK3328 (Rock64 SBC, 4 x Cortex-A53):
Parameters: [original count=1000] [recovery count=100] [buffer bytes=64000] [loss count=100] [random seed=2]
Leopard Encoder(64 MB in 1000 pieces, 100 losses): Input=202.632 MB/s, Output=20.2632 MB/s
Leopard Decoder(64 MB in 1000 pieces, 100 losses): Input=34.2045 MB/s, Output=3.42045 MB/s
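
For anyone reading the numbers above: in every run listed, the Output rate is the Input rate scaled by 100/1000, which matches the recovery count (encoder) or loss count (decoder) divided by the original count. The sketch below reproduces that arithmetic and derives the implied time per run. It assumes 1 MB = 10^6 bytes, which is consistent with 1000 × 64000 bytes being reported as 64 MB. The struct and function names are illustrative and do not come from the benchmark source.

```cpp
#include <cstdint>

// Parameters as printed in the benchmark header line.
struct SketchRunParams {
    std::uint64_t original_count;  // number of original pieces
    std::uint64_t produced_count;  // recovery count (encoder) or loss count (decoder)
    std::uint64_t buffer_bytes;    // bytes per piece
};

// Total input size in MB, assuming 1 MB = 10^6 bytes.
double sketch_input_megabytes(const SketchRunParams& p) {
    return static_cast<double>(p.original_count * p.buffer_bytes) / 1e6;
}

// Output rate implied by an input rate: both cover the same elapsed time,
// so they differ only by the ratio of produced pieces to original pieces.
double sketch_output_rate(const SketchRunParams& p, double input_mb_per_s) {
    return input_mb_per_s * static_cast<double>(p.produced_count)
                          / static_cast<double>(p.original_count);
}

// Elapsed time in milliseconds implied by an input rate.
double sketch_elapsed_ms(const SketchRunParams& p, double input_mb_per_s) {
    return sketch_input_megabytes(p) / input_mb_per_s * 1000.0;
}
```

With the Apple M1 macOS encoder figures (`{1000, 100, 64000}` and 3136.18 MB/s), this gives an output rate of 313.618 MB/s, matching the reported value, and roughly 20.4 ms per encode.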

@cskiraly changed the title from "add support for Mac M1, maybe other arm64+neon as well" to "add support for Apple M1 and other arm64+Neon architectures" on May 18, 2022
@cskiraly cskiraly requested a review from a team May 18, 2022 12:53
@liamsi

liamsi commented Jun 2, 2022

@cskiraly when you tested this on an Apple M1 under macOS, did you just run cmake and then make using the generated Makefile? When I do this on my machine (M1 Max with Monterey), I run into "Undefined symbols for architecture arm64" 🤔

@cskiraly
Author

cskiraly commented Jun 4, 2022

@liamsi I was compiling it through our wrapper in https://github.com/status-im/nim-leopard,
which disables OpenMP. It actually runs:
cmake .. -DCMAKE_BUILD_TYPE=Release -DENABLE_OPENMP=off

-DENABLE_OPENMP=off should work for you.

The undefined symbols I see when enabling OpenMP are omp symbols. Either there is something wrong with the linker args, or the omp installed by brew is not arm64; I have yet to check. We do have some related discussion at https://github.com/status-im/nim-leopard#openmp
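
As background on why turning OpenMP off sidesteps the link error: code that guards its OpenMP use behind the standard `_OPENMP` macro contains no references to the omp runtime when built without OpenMP flags, so there is nothing left for the linker to resolve. The sketch below shows that general pattern. It is an assumption that the `ENABLE_OPENMP` option works by adding or omitting the compiler's OpenMP flags, and the function names here are hypothetical, not taken from the library.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// _OPENMP is defined by the compiler only when OpenMP is enabled
// (for example via -fopenmp), so this include disappears otherwise.
#ifdef _OPENMP
  #include <omp.h>
#endif

// Number of threads the loop below may use; 1 when built without OpenMP.
int sketch_worker_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();  // needs the omp runtime at link time
#else
    return 1;                      // no omp symbol referenced
#endif
}

// XOR each source buffer into the matching destination buffer.
// Assumes src has at least as many buffers as dst and that each src[i]
// is at least as long as dst[i].
void sketch_xor_all(std::vector<std::vector<std::uint8_t>>& dst,
                    const std::vector<std::vector<std::uint8_t>>& src) {
    const long count = static_cast<long>(dst.size());
#ifdef _OPENMP
  #pragma omp parallel for
#endif
    for (long i = 0; i < count; ++i) {
        for (std::size_t j = 0; j < dst[i].size(); ++j) {
            dst[i][j] ^= src[i][j];
        }
    }
}
```

Built without OpenMP, the loop simply runs serially and produces the same result, which is why disabling it is a safe workaround while the linker setup is investigated.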

@liamsi

liamsi commented Jun 7, 2022

Thanks, -DENABLE_OPENMP=off does the trick. I made sure to use llvm/clang installed via brew, but I still see errors with OpenMP enabled (this time related to std::). I'll continue playing around with different flags and see if I can get it to compile with all optimizations. Will let you know.
