-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
loongarch: Implement ll/sc support for atomic operations #254
Conversation
case SLJIT_MOV_P: | ||
ins = SC_D; | ||
break; | ||
case SLJIT_MOV_S32: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SLJIT_MOV_U32?
signed types should return SLJIT_ERR_UNSUPPORTED AFAIK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RISC-V and MIPS all use almost the same ll/sc for atomic load/store as LoongArch, and they all support signed types, so I think it wouldn't be an issue. BTW, I have noticed that in sljit_src/sljitNativeMIPS_common.c line 4234 has a SLJIT_CONFIG_RISCV_64 comment, should be a typo. @zherczeg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
signed types should return SLJIT_ERR_UNSUPPORTED AFAIK
I stand corrected, apparently the "spec" (see comments in sljitLir.h) was changed with 8bcd711, so guess that the ones still rejecting signed types might be the ones that are not up to it.
Eitherway you are still missing SLJIT_MOV_U32 there, right?, this work with 6d44b46 on top works fine in an emulated 3A5000 (without AMCAS)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it okay to use signed 32-bit ll/sc for 32-bit unsigned atomic load/store directly? I'm worry about boundary cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
definitely not but is better than leaving it out as unsupported, and indeed I was surprised when it didn't break (as it did in a previous attempts to implement this as shown by #172).
which is why I was curious about the bug fixed, since it seems obvious that our test coverage needs some serious overhaul.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the reason why is that ll.d/sc.w is differ from ll.w/sc.w, but no matter what I don't think add unsigned 32-bit is a good solution, it was likely broken at the time we enhanced the test suite. Furthurmore, if the ll.w/sc.w could work well on loongarch, it should alse benifit mips & riscv, in that case let's add that in a seperate patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you elaborate on the nature of the bugs that were fixed?
Originally, SLJIT_MOV_S32 was missing so that when it was used, amcas will fall to amcas.d and leading to an incorrect result. |
Now the atomic instruction can tell the user whether it supports signed or unsigned operations (or both, but that is unlikely). Users can use sign/zero extension if they need it. MOV32 has no sign, it is up to the cpu, if registers are more than 32 bit long. riscv in mips is definitely a typo. |
Understand, I'll fix that typo, besides, are there still any problem for this patch? |
case SLJIT_HAS_CLZ: | ||
case SLJIT_HAS_CTZ: | ||
case SLJIT_HAS_REV: | ||
case SLJIT_HAS_ROT: | ||
case SLJIT_HAS_PREFETCH: | ||
case SLJIT_HAS_COPY_F32: | ||
case SLJIT_HAS_COPY_F64: | ||
case SLJIT_HAS_ATOMIC: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I see it correctly, that the cpu has new LL/SC form but CAS form is also supported? This is true for ARM64, so I am thinking whether we should allow users to pick which form they prefer. E.g. add a flag to pick the type, then return with unsupported if not available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LoongArch originally only support LL/SC, but add CAS support on LoongArch V1.10 (Currently only 3A6000 support this extension, QEMU do not support it yet). So it's safe to always use LL/SC, but CAS give more fine-grained atomic operations support.
To allow user's selection, maybe we should add a type param to sljit_emit_atomic_load
or sljit_emit_atomic_store
, or we could add a SLJIT_HAS_CAS
flag for it? But I still think there is not too much need to offer choices. Unless CAS is slower that LL/SC on some specific platform.
Reference: https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#new-in-loongarch-v1.1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: AMCAS is faster than LL/SC on LoongArch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I will think about it.
@@ -2934,7 +2932,7 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op_flags(struct sljit_compiler *co | |||
break; | |||
case SLJIT_ATOMIC_STORED: | |||
case SLJIT_ATOMIC_NOT_STORED: | |||
FAIL_IF(push_inst(compiler, SLTUI | RD(dst_r) | RJ(EQUAL_FLAG) | IMM_I12(1))); | |||
FAIL_IF(push_inst(compiler, SLTUI | RD(dst_r) | RJ(OTHER_FLAG) | IMM_I12(1))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I changed it for convenience, you don't need to change it if using the equal flag is easier for the target cpu.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either way should be okay, but I think it's better to stay the same with other RISC architectures for maintainability.
switch (GET_OPCODE(op)) { | ||
case SLJIT_MOV: | ||
case SLJIT_MOV_P: | ||
ins = LL_D; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember the current status, but I suspect LL_D is not available for 32 bit systems. Do we only support 64 bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By now, almost every LoongArch chip are 64-bit, AFAIK, only 1c series mcu are 32-bit, so I think it's safe to support 64-bit only. If there were 32-bit chips come into market, we could follow the way RISC-V did, split this into sljitNativeLOONGARCH_common.c and so-on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -3684,8 +3745,8 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_atomic_store(struct sljit_compiler | |||
if (unsign) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unsign support can be removed in the future.
Implement ll/sc for loongarch atomic load & store, also fix amcas bugs.
Tested under 3C5000 and 3A6000 (with amcas support).