Add a fast_matrix_mul_4x4 function with SIMD optimization for the LoongArch64. #19959

Merged (1 commit) on Feb 10, 2025

Contributor

@KatyushaScarlet KatyushaScarlet commented Feb 8, 2025

  1. Add the function fast_matrix_mul_4x4_lsx for LoongArch64.
  2. Add the CFLAGS -mlsx and -mlasx for LoongArch64, which let GCC emit LSX and LASX instructions (the 128-bit and 256-bit SIMD extensions for Loongson CPUs).
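
For readers unfamiliar with the two changes, here is a minimal sketch of what they involve. It is not the PR's code: matrix_mul_4x4_ref and simd_level are hypothetical names, the column-major layout is an assumption (the thread does not show the layout the real function uses), and the __loongarch_sx / __loongarch_asx macros are assumed to be the ones GCC predefines when -mlsx / -mlasx are active. The scalar loop has the same shape a 128-bit SIMD version would vectorize: each output column is a sum of columns of a, each scaled by one element of b.

```c
#include <stddef.h>

/* Hypothetical scalar reference, not the PR's implementation.
 * Assumes column-major 4x4 matrices stored as 16 floats.
 * dest column c = sum over k of (a column k) * b[c*4 + k]. */
void matrix_mul_4x4_ref(float *dest, const float *a, const float *b)
{
    float tmp[16]; /* scratch buffer so dest may alias a or b */
    for (size_t col = 0; col < 4; ++col) {
        for (size_t row = 0; row < 4; ++row) {
            float acc = 0.0f;
            for (size_t k = 0; k < 4; ++k) {
                acc += a[k * 4 + row] * b[col * 4 + k];
            }
            tmp[col * 4 + row] = acc;
        }
    }
    for (size_t i = 0; i < 16; ++i) {
        dest[i] = tmp[i];
    }
}

/* Reports which LoongArch SIMD level the compiler was told to target.
 * Assumption: GCC defines __loongarch_asx with -mlasx and
 * __loongarch_sx with -mlsx; on other targets neither is defined. */
const char *simd_level(void)
{
#if defined(__loongarch_asx)
    return "LASX (256-bit)";
#elif defined(__loongarch_sx)
    return "LSX (128-bit)";
#else
    return "none";
#endif
}
```

A scalar routine like this can also serve as a correctness baseline when testing a vectorized variant against it.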

Here is the Unofficial LoongArch Intrinsics Guide: https://jia.je/unofficial-loongarch-intrinsics-guide/migrating_sse/

Owner

@hrydgard hrydgard left a comment

Ideally CrossSIMD.h needs a LoongArch implementation too; in the future, more of the app's SIMD code will be migrated to use it.


static __m128 __lsx_vreplfr2vr_s(float val)
{
FloatInt tmpval = {.f = val};

Nowadays it's more standard to use memcpy for this; compilers optimize it down properly. This usage of unions is UB. However, in practice it does work, so I'll allow it.
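
The memcpy approach mentioned above might look like the sketch below. It is not the code that was merged: float_to_bits, bits_to_float and splat_f32_lsx are hypothetical names, and the code assumes float is 32 bits wide. The LSX part is only compiled when __loongarch_sx is defined, and it assumes that __lsx_vreplgr2vr_w from lsxintrin.h broadcasts a 32-bit integer to all four lanes and that GCC's vector cast from __m128i to __m128 reinterprets the bits.

```c
#include <stdint.h>
#include <string.h>

/* Copy the bytes of a float into an integer of the same size.
 * Assumes sizeof(float) == sizeof(uint32_t). Unlike reading an
 * inactive union member, memcpy is well-defined in both C and C++. */
uint32_t float_to_bits(float val)
{
    uint32_t bits;
    memcpy(&bits, &val, sizeof bits);
    return bits;
}

/* Inverse direction: rebuild a float from its bit pattern. */
float bits_to_float(uint32_t bits)
{
    float val;
    memcpy(&val, &bits, sizeof val);
    return val;
}

#if defined(__loongarch_sx)
#include <lsxintrin.h>

/* Hypothetical helper: broadcast one float to all four lanes.
 * The float's bit pattern is moved through an integer register,
 * replicated, and the result reinterpreted as a float vector. */
static inline __m128 splat_f32_lsx(float val)
{
    int32_t bits;
    memcpy(&bits, &val, sizeof bits);
    return (__m128)__lsx_vreplgr2vr_w(bits);
}
#endif
```

With optimization enabled, compilers typically lower a fixed-size memcpy like this to a plain register move, so it should cost nothing compared with the union version.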

@hrydgard hrydgard merged commit 6ec74f5 into hrydgard:master Feb 10, 2025
19 checks passed
@hrydgard hrydgard added this to the v1.19.0 milestone Feb 10, 2025