[RFC] Extend FASTCC to use up to 22 registers under APX #164638
The FASTCC calling convention is used for internal functions for better performance. X64 doesn't define a separate calling convention for FASTCC; I think the reason is that X64 already defines a balanced set of caller/callee-saved registers. With the APX feature, we have 16 more caller-saved registers, and they can all be used for argument passing. So we extend FASTCC to use up to 22 registers for argument passing. FASTCC is not supported for X64 in C code, so we don't need to worry about compatibility issues; see https://godbolt.org/z/fq99T6h63
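As a rough illustration of the proposal, here is a small sketch (in Python, not LLVM code; the names are mine) of the i64 argument-to-register assignment order implied by the register lists in the `CC_X86_64_Fast` definition below: the six classic SysV registers first, then the sixteen APX extended GPRs, then the stack.

```python
# Sketch only: models the integer-register assignment order proposed for
# fastcc under APX, per the CCAssignToReg lists in CC_X86_64_Fast.
CLASSIC_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
APX_REGS = [f"r{n}" for n in range(16, 32)]   # r16..r31 (EGPR)
FASTCC_APX_REGS = CLASSIC_REGS + APX_REGS     # 22 registers total

def assign_i64_args(num_args, has_egpr):
    """Return one location per argument: a register name or 'stack'."""
    regs = FASTCC_APX_REGS if has_egpr else CLASSIC_REGS
    return [regs[i] if i < len(regs) else "stack" for i in range(num_args)]
```

For example, with EGPR the ninth i64 argument lands in `r18`, which matches the `arg9_i64` EGPR check lines in the test file below; without EGPR, arguments seven onward go to the stack.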
@llvm/pr-subscribers-backend-x86 Author: Phoebe Wang (phoebewang)

Patch is 52.56 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/164638.diff

2 Files Affected:
diff --git a/llvm/lib/Target/X86/X86CallingConv.td b/llvm/lib/Target/X86/X86CallingConv.td
index f020e0b55141c..bad0a698dd8de 100644
--- a/llvm/lib/Target/X86/X86CallingConv.td
+++ b/llvm/lib/Target/X86/X86CallingConv.td
@@ -687,6 +687,38 @@ def CC_X86_Win64_VectorCall : CallingConv<[
CCDelegateTo<CC_X86_Win64_C>
]>;
+def CC_X86_64_Fast : CallingConv<[
+ // Handles byval parameters. Note that we can't rely on the delegation
+ // to CC_X86_64_C for this because that happens after code that puts arguments
+ // in registers.
+ CCIfByVal<CCPassByVal<8, 8>>,
+
+ // Promote i1/i8/i16/v1i1 arguments to i32.
+ CCIfType<[i1, i8, i16, v1i1], CCPromoteToType<i32>>,
+
+ // Pointers are always passed in full 64-bit registers.
+ CCIfPtr<CCCustom<"CC_X86_64_Pointer">>,
+
+ // The first 22 integer arguments are passed in integer registers.
+ CCIfType<[i32], CCAssignToReg<[EDI, ESI, EDX, ECX, R8D, R9D, R16D, R17D,
+ R18D, R19D, R20D, R21D, R22D, R23D, R24D,
+ R25D, R26D, R27D, R28D, R29D, R30D, R31D]>>,
+
+ // i128 can be either passed in two i64 registers, or on the stack, but
+ // not split across register and stack. Handle this with a custom function.
+ CCIfType<[i64],
+ CCIfConsecutiveRegs<CCCustom<"CC_X86_64_I128">>>,
+
+ CCIfType<[i64], CCAssignToReg<[RDI, RSI, RDX, RCX, R8, R9, R16, R17, R18,
+ R19, R20, R21, R22, R23, R24, R25, R26, R27,
+ R28, R29, R30, R31]>>,
+
+ // Otherwise, drop to normal X86-64 CC.
+ CCDelegateTo<CC_X86_64_C>
+]>;
+
def CC_X86_64_GHC : CallingConv<[
// Promote i8/i16/i32 arguments to i64.
@@ -1079,6 +1111,8 @@ def CC_X86_32 : CallingConv<[
// This is the root argument convention for the X86-64 backend.
def CC_X86_64 : CallingConv<[
+ CCIfCC<"CallingConv::Fast",
+ CCIfSubtarget<"hasEGPR()", CCDelegateTo<CC_X86_64_Fast>>>,
CCIfCC<"CallingConv::GHC", CCDelegateTo<CC_X86_64_GHC>>,
CCIfCC<"CallingConv::HiPE", CCDelegateTo<CC_X86_64_HiPE>>,
CCIfCC<"CallingConv::AnyReg", CCDelegateTo<CC_X86_64_AnyReg>>,
diff --git a/llvm/test/CodeGen/X86/apx/fastcc.ll b/llvm/test/CodeGen/X86/apx/fastcc.ll
new file mode 100644
index 0000000000000..984a4f640e379
--- /dev/null
+++ b/llvm/test/CodeGen/X86/apx/fastcc.ll
@@ -0,0 +1,1374 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc < %s -mtriple=x86_64 | FileCheck %s --check-prefixes=CHECK,X64
+; RUN: llc < %s -mtriple=x86_64 -mattr=+egpr | FileCheck %s --check-prefixes=CHECK,EGPR
+
+define fastcc i8 @arg6_i8(i8 %a, i8 %b, i8 %c, i8 %d, i8 %e, i8 %f) nounwind {
+; CHECK-LABEL: arg6_i8:
+; CHECK: # %bb.0:
+; CHECK-NEXT: # kill: def $r9d killed $r9d def $r9
+; CHECK-NEXT: # kill: def $r8d killed $r8d def $r8
+; CHECK-NEXT: # kill: def $ecx killed $ecx def $rcx
+; CHECK-NEXT: # kill: def $edx killed $edx def $rdx
+; CHECK-NEXT: # kill: def $esi killed $esi def $rsi
+; CHECK-NEXT: # kill: def $edi killed $edi def $rdi
+; CHECK-NEXT: leal (%rdi,%rsi), %eax
+; CHECK-NEXT: addl %edx, %ecx
+; CHECK-NEXT: addb %al, %cl
+; CHECK-NEXT: leal (%r8,%r9), %eax
+; CHECK-NEXT: addb %cl, %al
+; CHECK-NEXT: # kill: def $al killed $al killed $eax
+; CHECK-NEXT: retq
+ %a1 = add i8 %a, %b
+ %a2 = add i8 %c, %d
+ %a3 = add i8 %e, %f
+ %b1 = add i8 %a1, %a2
+ %b2 = add i8 %b1, %a3
+ ret i8 %b2
+}
+
+define fastcc i16 @arg7_i16(i16 %a, i16 %b, i16 %c, i16 %d, i16 %e, i16 %f, i16 %g) nounwind {
+; X64-LABEL: arg7_i16:
+; X64: # %bb.0:
+; X64-NEXT: # kill: def $r9d killed $r9d def $r9
+; X64-NEXT: # kill: def $r8d killed $r8d def $r8
+; X64-NEXT: # kill: def $ecx killed $ecx def $rcx
+; X64-NEXT: # kill: def $edx killed $edx def $rdx
+; X64-NEXT: # kill: def $esi killed $esi def $rsi
+; X64-NEXT: # kill: def $edi killed $edi def $rdi
+; X64-NEXT: leal (%rdx,%rcx), %ecx
+; X64-NEXT: addl %edi, %ecx
+; X64-NEXT: addl %esi, %ecx
+; X64-NEXT: leal (%r8,%r9), %eax
+; X64-NEXT: addw {{[0-9]+}}(%rsp), %ax
+; X64-NEXT: addl %ecx, %eax
+; X64-NEXT: # kill: def $ax killed $ax killed $eax
+; X64-NEXT: retq
+;
+; EGPR-LABEL: arg7_i16:
+; EGPR: # %bb.0:
+; EGPR-NEXT: # kill: def $r9d killed $r9d def $r9
+; EGPR-NEXT: # kill: def $r8d killed $r8d def $r8
+; EGPR-NEXT: # kill: def $ecx killed $ecx def $rcx
+; EGPR-NEXT: # kill: def $edx killed $edx def $rdx
+; EGPR-NEXT: # kill: def $esi killed $esi def $rsi
+; EGPR-NEXT: # kill: def $edi killed $edi def $rdi
+; EGPR-NEXT: leal (%rdx,%rcx), %ecx
+; EGPR-NEXT: addl %edi, %ecx
+; EGPR-NEXT: addl %esi, %ecx
+; EGPR-NEXT: leal (%r8,%r9), %eax
+; EGPR-NEXT: addl %r16d, %eax
+; EGPR-NEXT: addl %ecx, %eax
+; EGPR-NEXT: # kill: def $ax killed $ax killed $eax
+; EGPR-NEXT: retq
+ %a1 = add i16 %a, %b
+ %a2 = add i16 %c, %d
+ %a3 = add i16 %e, %f
+ %b1 = add i16 %a1, %a2
+ %b2 = add i16 %a3, %g
+ %c1 = add i16 %b1, %b2
+ ret i16 %c1
+}
+
+define fastcc i32 @arg8_i32(i32 %a, i32 %b, i32 %c, i32 %d, i32 %e, i32 %f, i32 %g, i32 %h) nounwind {
+; X64-LABEL: arg8_i32:
+; X64: # %bb.0:
+; X64-NEXT: # kill: def $r9d killed $r9d def $r9
+; X64-NEXT: # kill: def $r8d killed $r8d def $r8
+; X64-NEXT: # kill: def $ecx killed $ecx def $rcx
+; X64-NEXT: # kill: def $edx killed $edx def $rdx
+; X64-NEXT: # kill: def $esi killed $esi def $rsi
+; X64-NEXT: # kill: def $edi killed $edi def $rdi
+; X64-NEXT: movl {{[0-9]+}}(%rsp), %eax
+; X64-NEXT: addl %edi, %esi
+; X64-NEXT: addl %edx, %ecx
+; X64-NEXT: addl %esi, %ecx
+; X64-NEXT: addl {{[0-9]+}}(%rsp), %eax
+; X64-NEXT: addl %r8d, %eax
+; X64-NEXT: addl %r9d, %eax
+; X64-NEXT: addl %ecx, %eax
+; X64-NEXT: retq
+;
+; EGPR-LABEL: arg8_i32:
+; EGPR: # %bb.0:
+; EGPR-NEXT: # kill: def $r17d killed $r17d def $r17
+; EGPR-NEXT: # kill: def $r16d killed $r16d def $r16
+; EGPR-NEXT: # kill: def $r9d killed $r9d def $r9
+; EGPR-NEXT: # kill: def $r8d killed $r8d def $r8
+; EGPR-NEXT: # kill: def $ecx killed $ecx def $rcx
+; EGPR-NEXT: # kill: def $edx killed $edx def $rdx
+; EGPR-NEXT: # kill: def $esi killed $esi def $rsi
+; EGPR-NEXT: # kill: def $edi killed $edi def $rdi
+; EGPR-NEXT: leal (%rdx,%rcx), %ecx
+; EGPR-NEXT: addl %edi, %ecx
+; EGPR-NEXT: addl %esi, %ecx
+; EGPR-NEXT: leal (%r16,%r17), %eax
+; EGPR-NEXT: addl %r8d, %eax
+; EGPR-NEXT: addl %r9d, %eax
+; EGPR-NEXT: addl %ecx, %eax
+; EGPR-NEXT: retq
+ %a1 = add i32 %a, %b
+ %a2 = add i32 %c, %d
+ %a3 = add i32 %e, %f
+ %a4 = add i32 %g, %h
+ %b1 = add i32 %a1, %a2
+ %b2 = add i32 %a3, %a4
+ %c1 = add i32 %b1, %b2
+ ret i32 %c1
+}
+
+define fastcc i64 @arg9_i64(i64 %a, i64 %b, i64 %c, i64 %d, i64 %e, i64 %f, i64 %g, i64 %h, i64 %i) nounwind {
+; X64-LABEL: arg9_i64:
+; X64: # %bb.0:
+; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax
+; X64-NEXT: addq %rdi, %rsi
+; X64-NEXT: addq %rdx, %rcx
+; X64-NEXT: addq %rsi, %rcx
+; X64-NEXT: addq {{[0-9]+}}(%rsp), %rax
+; X64-NEXT: addq %r8, %rax
+; X64-NEXT: addq %r9, %rax
+; X64-NEXT: addq %rcx, %rax
+; X64-NEXT: addq {{[0-9]+}}(%rsp), %rax
+; X64-NEXT: retq
+;
+; EGPR-LABEL: arg9_i64:
+; EGPR: # %bb.0:
+; EGPR-NEXT: leaq (%rdx,%rcx), %rcx
+; EGPR-NEXT: addq %rdi, %rcx
+; EGPR-NEXT: addq %rsi, %rcx
+; EGPR-NEXT: leaq (%r16,%r17), %rax
+; EGPR-NEXT: addq %r8, %rax
+; EGPR-NEXT: addq %r9, %rax
+; EGPR-NEXT: addq %rcx, %rax
+; EGPR-NEXT: addq %r18, %rax
+; EGPR-NEXT: retq
+ %a1 = add i64 %a, %b
+ %a2 = add i64 %c, %d
+ %a3 = add i64 %e, %f
+ %a4 = add i64 %g, %h
+ %b1 = add i64 %a1, %a2
+ %b2 = add i64 %a3, %a4
+ %c1 = add i64 %b1, %b2
+ %c2 = add i64 %c1, %i
+ ret i64 %c2
+}
+
+define fastcc i32 @arg10_i32(i32 %a, i32 %b, i32 %c, i32 %d, i32 %e, i32 %f, i32 %g, i32 %h, i32 %i, i32 %j) nounwind {
+; X64-LABEL: arg10_i32:
+; X64: # %bb.0:
+; X64-NEXT: # kill: def $r9d killed $r9d def $r9
+; X64-NEXT: # kill: def $r8d killed $r8d def $r8
+; X64-NEXT: # kill: def $ecx killed $ecx def $rcx
+; X64-NEXT: # kill: def $edx killed $edx def $rdx
+; X64-NEXT: # kill: def $esi killed $esi def $rsi
+; X64-NEXT: # kill: def $edi killed $edi def $rdi
+; X64-NEXT: movl {{[0-9]+}}(%rsp), %eax
+; X64-NEXT: movl {{[0-9]+}}(%rsp), %r10d
+; X64-NEXT: addl %edi, %esi
+; X64-NEXT: addl %edx, %ecx
+; X64-NEXT: addl %esi, %ecx
+; X64-NEXT: addl {{[0-9]+}}(%rsp), %r10d
+; X64-NEXT: addl %r8d, %r10d
+; X64-NEXT: addl %r9d, %r10d
+; X64-NEXT: addl {{[0-9]+}}(%rsp), %eax
+; X64-NEXT: addl %ecx, %eax
+; X64-NEXT: addl %r10d, %eax
+; X64-NEXT: retq
+;
+; EGPR-LABEL: arg10_i32:
+; EGPR: # %bb.0:
+; EGPR-NEXT: # kill: def $r19d killed $r19d def $r19
+; EGPR-NEXT: # kill: def $r18d killed $r18d def $r18
+; EGPR-NEXT: # kill: def $r17d killed $r17d def $r17
+; EGPR-NEXT: # kill: def $r16d killed $r16d def $r16
+; EGPR-NEXT: # kill: def $r9d killed $r9d def $r9
+; EGPR-NEXT: # kill: def $r8d killed $r8d def $r8
+; EGPR-NEXT: # kill: def $ecx killed $ecx def $rcx
+; EGPR-NEXT: # kill: def $edx killed $edx def $rdx
+; EGPR-NEXT: # kill: def $esi killed $esi def $rsi
+; EGPR-NEXT: # kill: def $edi killed $edi def $rdi
+; EGPR-NEXT: leal (%rdx,%rcx), %ecx
+; EGPR-NEXT: addl %edi, %ecx
+; EGPR-NEXT: addl %esi, %ecx
+; EGPR-NEXT: leal (%r16,%r17), %eax
+; EGPR-NEXT: addl %r8d, %eax
+; EGPR-NEXT: addl %r9d, %eax
+; EGPR-NEXT: addl %ecx, %eax
+; EGPR-NEXT: addl %r18d, %eax
+; EGPR-NEXT: addl %r19d, %eax
+; EGPR-NEXT: retq
+ %a1 = add i32 %a, %b
+ %a2 = add i32 %c, %d
+ %a3 = add i32 %e, %f
+ %a4 = add i32 %g, %h
+ %a5 = add i32 %i, %j
+ %b1 = add i32 %a1, %a2
+ %b2 = add i32 %a3, %a4
+ %c1 = add i32 %b1, %b2
+ %c2 = add i32 %c1, %a5
+ ret i32 %c2
+}
+
+define fastcc i16 @arg11_i16(i16 %a, i16 %b, i16 %c, i16 %d, i16 %e, i16 %f, i16 %g, i16 %h, i16 %i, i16 %j, i16 %k) nounwind {
+; X64-LABEL: arg11_i16:
+; X64: # %bb.0:
+; X64-NEXT: # kill: def $r9d killed $r9d def $r9
+; X64-NEXT: # kill: def $r8d killed $r8d def $r8
+; X64-NEXT: # kill: def $ecx killed $ecx def $rcx
+; X64-NEXT: # kill: def $edx killed $edx def $rdx
+; X64-NEXT: # kill: def $esi killed $esi def $rsi
+; X64-NEXT: # kill: def $edi killed $edi def $rdi
+; X64-NEXT: movzwl {{[0-9]+}}(%rsp), %eax
+; X64-NEXT: movzwl {{[0-9]+}}(%rsp), %r10d
+; X64-NEXT: addl %edi, %esi
+; X64-NEXT: addl %edx, %ecx
+; X64-NEXT: addl %esi, %ecx
+; X64-NEXT: addw {{[0-9]+}}(%rsp), %r10w
+; X64-NEXT: addl %r8d, %r10d
+; X64-NEXT: addl %r9d, %r10d
+; X64-NEXT: addw {{[0-9]+}}(%rsp), %ax
+; X64-NEXT: addl %ecx, %eax
+; X64-NEXT: addl %r10d, %eax
+; X64-NEXT: addw {{[0-9]+}}(%rsp), %ax
+; X64-NEXT: # kill: def $ax killed $ax killed $eax
+; X64-NEXT: retq
+;
+; EGPR-LABEL: arg11_i16:
+; EGPR: # %bb.0:
+; EGPR-NEXT: # kill: def $r19d killed $r19d def $r19
+; EGPR-NEXT: # kill: def $r18d killed $r18d def $r18
+; EGPR-NEXT: # kill: def $r17d killed $r17d def $r17
+; EGPR-NEXT: # kill: def $r16d killed $r16d def $r16
+; EGPR-NEXT: # kill: def $r9d killed $r9d def $r9
+; EGPR-NEXT: # kill: def $r8d killed $r8d def $r8
+; EGPR-NEXT: # kill: def $ecx killed $ecx def $rcx
+; EGPR-NEXT: # kill: def $edx killed $edx def $rdx
+; EGPR-NEXT: # kill: def $esi killed $esi def $rsi
+; EGPR-NEXT: # kill: def $edi killed $edi def $rdi
+; EGPR-NEXT: leal (%rdx,%rcx), %ecx
+; EGPR-NEXT: addl %edi, %ecx
+; EGPR-NEXT: addl %esi, %ecx
+; EGPR-NEXT: leal (%r16,%r17), %edx
+; EGPR-NEXT: addl %r8d, %edx
+; EGPR-NEXT: addl %r9d, %edx
+; EGPR-NEXT: addl %ecx, %edx
+; EGPR-NEXT: leal (%r18,%r19), %eax
+; EGPR-NEXT: addl %r20d, %eax
+; EGPR-NEXT: addl %edx, %eax
+; EGPR-NEXT: # kill: def $ax killed $ax killed $eax
+; EGPR-NEXT: retq
+ %a1 = add i16 %a, %b
+ %a2 = add i16 %c, %d
+ %a3 = add i16 %e, %f
+ %a4 = add i16 %g, %h
+ %a5 = add i16 %i, %j
+ %b1 = add i16 %a1, %a2
+ %b2 = add i16 %a3, %a4
+ %c1 = add i16 %b1, %b2
+ %c2 = add i16 %c1, %a5
+ %c3 = add i16 %c2, %k
+ ret i16 %c3
+}
+
+define fastcc i8 @arg12_i8(i8 %a, i8 %b, i8 %c, i8 %d, i8 %e, i8 %f, i8 %g, i8 %h, i8 %i, i8 %j, i8 %k, i8 %l) nounwind {
+; X64-LABEL: arg12_i8:
+; X64: # %bb.0:
+; X64-NEXT: # kill: def $r9d killed $r9d def $r9
+; X64-NEXT: # kill: def $r8d killed $r8d def $r8
+; X64-NEXT: # kill: def $ecx killed $ecx def $rcx
+; X64-NEXT: # kill: def $edx killed $edx def $rdx
+; X64-NEXT: # kill: def $esi killed $esi def $rsi
+; X64-NEXT: # kill: def $edi killed $edi def $rdi
+; X64-NEXT: movzbl {{[0-9]+}}(%rsp), %eax
+; X64-NEXT: movzbl {{[0-9]+}}(%rsp), %r11d
+; X64-NEXT: movzbl {{[0-9]+}}(%rsp), %r10d
+; X64-NEXT: addl %edi, %esi
+; X64-NEXT: addl %edx, %ecx
+; X64-NEXT: addb %sil, %cl
+; X64-NEXT: leal (%r8,%r9), %edx
+; X64-NEXT: addb {{[0-9]+}}(%rsp), %r10b
+; X64-NEXT: addb %dl, %r10b
+; X64-NEXT: addb %cl, %r10b
+; X64-NEXT: addb {{[0-9]+}}(%rsp), %r11b
+; X64-NEXT: addb {{[0-9]+}}(%rsp), %al
+; X64-NEXT: addb %r11b, %al
+; X64-NEXT: addb %r10b, %al
+; X64-NEXT: retq
+;
+; EGPR-LABEL: arg12_i8:
+; EGPR: # %bb.0:
+; EGPR-NEXT: # kill: def $r21d killed $r21d def $r21
+; EGPR-NEXT: # kill: def $r20d killed $r20d def $r20
+; EGPR-NEXT: # kill: def $r19d killed $r19d def $r19
+; EGPR-NEXT: # kill: def $r18d killed $r18d def $r18
+; EGPR-NEXT: # kill: def $r17d killed $r17d def $r17
+; EGPR-NEXT: # kill: def $r16d killed $r16d def $r16
+; EGPR-NEXT: # kill: def $r9d killed $r9d def $r9
+; EGPR-NEXT: # kill: def $r8d killed $r8d def $r8
+; EGPR-NEXT: # kill: def $ecx killed $ecx def $rcx
+; EGPR-NEXT: # kill: def $edx killed $edx def $rdx
+; EGPR-NEXT: # kill: def $esi killed $esi def $rsi
+; EGPR-NEXT: # kill: def $edi killed $edi def $rdi
+; EGPR-NEXT: leal (%rdi,%rsi), %eax
+; EGPR-NEXT: addl %edx, %ecx
+; EGPR-NEXT: addb %al, %cl
+; EGPR-NEXT: leal (%r8,%r9), %eax
+; EGPR-NEXT: leal (%r16,%r17), %edx
+; EGPR-NEXT: addb %al, %dl
+; EGPR-NEXT: addb %cl, %dl
+; EGPR-NEXT: leal (%r18,%r19), %ecx
+; EGPR-NEXT: leal (%r20,%r21), %eax
+; EGPR-NEXT: addb %cl, %al
+; EGPR-NEXT: addb %dl, %al
+; EGPR-NEXT: # kill: def $al killed $al killed $eax
+; EGPR-NEXT: retq
+ %a1 = add i8 %a, %b
+ %a2 = add i8 %c, %d
+ %a3 = add i8 %e, %f
+ %a4 = add i8 %g, %h
+ %a5 = add i8 %i, %j
+ %a6 = add i8 %k, %l
+ %b1 = add i8 %a1, %a2
+ %b2 = add i8 %a3, %a4
+ %b3 = add i8 %a5, %a6
+ %c1 = add i8 %b1, %b2
+ %c2 = add i8 %c1, %b3
+ ret i8 %c2
+}
+
+define fastcc i16 @arg13_i16(i16 %a, i16 %b, i16 %c, i16 %d, i16 %e, i16 %f, i16 %g, i16 %h, i16 %i, i16 %j, i16 %k, i16 %l, i16 %m) nounwind {
+; X64-LABEL: arg13_i16:
+; X64: # %bb.0:
+; X64-NEXT: # kill: def $r9d killed $r9d def $r9
+; X64-NEXT: # kill: def $r8d killed $r8d def $r8
+; X64-NEXT: # kill: def $ecx killed $ecx def $rcx
+; X64-NEXT: # kill: def $edx killed $edx def $rdx
+; X64-NEXT: # kill: def $esi killed $esi def $rsi
+; X64-NEXT: # kill: def $edi killed $edi def $rdi
+; X64-NEXT: movzwl {{[0-9]+}}(%rsp), %eax
+; X64-NEXT: movzwl {{[0-9]+}}(%rsp), %r10d
+; X64-NEXT: movzwl {{[0-9]+}}(%rsp), %r11d
+; X64-NEXT: addl %edi, %esi
+; X64-NEXT: addl %edx, %ecx
+; X64-NEXT: addl %esi, %ecx
+; X64-NEXT: addw {{[0-9]+}}(%rsp), %r11w
+; X64-NEXT: addl %r8d, %r11d
+; X64-NEXT: addl %r9d, %r11d
+; X64-NEXT: addl %ecx, %r11d
+; X64-NEXT: addw {{[0-9]+}}(%rsp), %r10w
+; X64-NEXT: addw {{[0-9]+}}(%rsp), %ax
+; X64-NEXT: addl %r10d, %eax
+; X64-NEXT: addl %r11d, %eax
+; X64-NEXT: addw {{[0-9]+}}(%rsp), %ax
+; X64-NEXT: # kill: def $ax killed $ax killed $eax
+; X64-NEXT: retq
+;
+; EGPR-LABEL: arg13_i16:
+; EGPR: # %bb.0:
+; EGPR-NEXT: # kill: def $r21d killed $r21d def $r21
+; EGPR-NEXT: # kill: def $r20d killed $r20d def $r20
+; EGPR-NEXT: # kill: def $r19d killed $r19d def $r19
+; EGPR-NEXT: # kill: def $r18d killed $r18d def $r18
+; EGPR-NEXT: # kill: def $r17d killed $r17d def $r17
+; EGPR-NEXT: # kill: def $r16d killed $r16d def $r16
+; EGPR-NEXT: # kill: def $r9d killed $r9d def $r9
+; EGPR-NEXT: # kill: def $r8d killed $r8d def $r8
+; EGPR-NEXT: # kill: def $ecx killed $ecx def $rcx
+; EGPR-NEXT: # kill: def $edx killed $edx def $rdx
+; EGPR-NEXT: # kill: def $esi killed $esi def $rsi
+; EGPR-NEXT: # kill: def $edi killed $edi def $rdi
+; EGPR-NEXT: leal (%rdx,%rcx), %ecx
+; EGPR-NEXT: addl %edi, %ecx
+; EGPR-NEXT: addl %esi, %ecx
+; EGPR-NEXT: leal (%r16,%r17), %edx
+; EGPR-NEXT: addl %r8d, %edx
+; EGPR-NEXT: addl %r9d, %edx
+; EGPR-NEXT: addl %ecx, %edx
+; EGPR-NEXT: leal (%r20,%r21), %eax
+; EGPR-NEXT: addl %r18d, %eax
+; EGPR-NEXT: addl %r19d, %eax
+; EGPR-NEXT: addl %r22d, %eax
+; EGPR-NEXT: addl %edx, %eax
+; EGPR-NEXT: # kill: def $ax killed $ax killed $eax
+; EGPR-NEXT: retq
+ %a1 = add i16 %a, %b
+ %a2 = add i16 %c, %d
+ %a3 = add i16 %e, %f
+ %a4 = add i16 %g, %h
+ %a5 = add i16 %i, %j
+ %a6 = add i16 %k, %l
+ %b1 = add i16 %a1, %a2
+ %b2 = add i16 %a3, %a4
+ %b3 = add i16 %a5, %a6
+ %c1 = add i16 %b1, %b2
+ %c2 = add i16 %c1, %b3
+ %c3 = add i16 %c2, %m
+ ret i16 %c3
+}
+
+define fastcc i32 @arg14_i32(i32 %a, i32 %b, i32 %c, i32 %d, i32 %e, i32 %f, i32 %g, i32 %h, i32 %i, i32 %j, i32 %k, i32 %l, i32 %m, i32 %n) nounwind {
+; X64-LABEL: arg14_i32:
+; X64: # %bb.0:
+; X64-NEXT: pushq %rbx
+; X64-NEXT: # kill: def $r9d killed $r9d def $r9
+; X64-NEXT: # kill: def $r8d killed $r8d def $r8
+; X64-NEXT: # kill: def $ecx killed $ecx def $rcx
+; X64-NEXT: # kill: def $edx killed $edx def $rdx
+; X64-NEXT: # kill: def $esi killed $esi def $rsi
+; X64-NEXT: # kill: def $edi killed $edi def $rdi
+; X64-NEXT: movl {{[0-9]+}}(%rsp), %eax
+; X64-NEXT: movl {{[0-9]+}}(%rsp), %r10d
+; X64-NEXT: movl {{[0-9]+}}(%rsp), %r11d
+; X64-NEXT: movl {{[0-9]+}}(%rsp), %ebx
+; X64-NEXT: addl %edi, %esi
+; X64-NEXT: addl %edx, %ecx
+; X64-NEXT: addl %esi, %ecx
+; X64-NEXT: addl {{[0-9]+}}(%rsp), %ebx
+; X64-NEXT: addl %r8d, %ebx
+; X64-NEXT: addl %r9d, %ebx
+; X64-NEXT: addl %ecx, %ebx
+; X64-NEXT: addl {{[0-9]+}}(%rsp), %r11d
+; X64-NEXT: addl {{[0-9]+}}(%rsp), %r10d
+; X64-NEXT: addl %r11d, %r10d
+; X64-NEXT: addl {{[0-9]+}}(%rsp), %eax
+; X64-NEXT: addl %r10d, %eax
+; X64-NEXT: addl %ebx, %eax
+; X64-NEXT: popq %rbx
+; X64-NEXT: retq
+;
+; EGPR-LABEL: arg14_i32:
+; EGPR: # %bb.0:
+; EGPR-NEXT: # kill: def $r23d killed $r23d def $r23
+; EGPR-NEXT: # kill: def $r22d killed $r22d def $r22
+; EGPR-NEXT: # kill: def $r21d killed $r21d def $r21
+; EGPR-NEXT: # kill: def $r20d killed $r20d def $r20
+; EGPR-NEXT: # kill: def $r19d killed $r19d def $r19
+; EGPR-NEXT: # kill: def $r18d killed $r18d def $r18
+; EGPR-NEXT: # kill: def $r17d killed $r17d def $r17
+; EGPR-NEXT: # kill: def $r16d killed $r16d def $r16
+; EGPR-NEXT: # kill: def $r9d killed $r9d def $r9
+; EGPR-NEXT: # kill: def $r8d killed $r8d def $r8
+; EGPR-NEXT: # kill: def $ecx killed $ecx def $rcx
+; EGPR-NEXT: # kill: def $edx killed $edx def $rdx
+; EGPR-NEXT: # kill: def $esi killed $esi def $rsi
+; EGPR-NEXT: # kill: def $edi killed $edi def $rdi
+; EGPR-NEXT: leal (%rdx,%rcx), %ecx
+; EGPR-NEXT: addl %edi, %ecx
+; EGPR-NEXT: addl %esi, %ecx
+; EGPR-NEXT: leal (%r16,%r17), %edx
+; EGPR-NEXT: addl %r8d, %edx
+; EGPR-NEXT: addl %r9d, %edx
+; EGPR-NEXT: addl %ecx, %edx
+; EGPR-NEXT: leal (%r20,%r21), %eax
+; EGPR-NEXT: addl %r18d, %eax
+; EGPR-NEXT: addl %r19d, %eax
+; EGPR-NEXT: addl %r22d, %eax
+; EGPR-NEXT: addl %r23d, %eax
+; EGPR-NEXT: addl %edx, %eax
+; EGPR-NEXT: retq
+ %a1 = add i32 %a, %b
+ %a2 = add i32 %c, %d
+ %a3 = add i32 %e, %f
+ %a4 = add i32 %g, %h
+ %a5 = add i32 %i, %j
+ %a6 = add i32 %k, %l
+ %a7 = add i32 %m, %n
+ %b1 = add i32 %a1, %a2
+ %b2 = add i32 %a3, %a4
+ %b3 = add i32 %a5, %a6
+ %c1 = add i32 %b1, %b2
+ %c2 = add i32 %c1, %b3
+ %c3 = add i32 %c2, %a7
+ ret i32 %c3
+}
+
+define fastcc i64 @arg15_i64(i64 %a, i64 %b, ...
[truncated]
We had this discussion in #76868. KanRobert: I wouldn't expect this from the name preserve_none. R16-R31 should be used to pass arguments when they're available. jkynight: They should not be used to pass arguments conditionally based on the current subtarget, since that creates two incompatible calling conventions. There's no reason "preserve none" has to be read to imply "uses all possible registers to pass arguments", so I don't see an issue with leaving it like it is. I wonder whether we have the same issue here?
The caller and callee both need to have EGPR enabled for this to work, right?
I believe so; both the caller and the callee need to know where the arguments are.
I think the proposal here is different from preserve_none and other dedicated calling conventions. This patch only applies to internal function optimizations: https://godbolt.org/z/3xT94Wvjv. So we don't have the subtarget-difference issue. In the example at https://godbolt.org/z/fq99T6h63, I showed that Clang already warns about the use of fastcc on the C side. We may consider relaxing that to allow it only with APX, so we don't have the subtarget-difference issue even if we want to extend it.
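The compatibility argument above can be summarized in a small sketch (hypothetical names, not from the patch): the extended fastcc is only relevant where caller and callee are compiled together with the same features, so a conservative gate might look like this.

```python
# Sketch only: the extended fastcc is safe when every call site is
# visible (internal linkage) and both sides agree on whether EGPR
# registers exist, so they agree on where arguments live.
def can_use_apx_fastcc(linkage, caller_has_egpr, callee_has_egpr):
    return linkage == "internal" and caller_has_egpr and callee_has_egpr
```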
Good point! I think it is easy to solve. We can add a TTI interface like
I think you're mixing up two orthogonal things here; IIRC fastcall is
But that doesn't make your change unneeded, just my 2 cents
I see, thanks! There's nothing to do with the C
Background: the X86 APX feature adds 16 registers within the same 64-bit mode. PR llvm#164638 is trying to extend FASTCC to use such registers. However, a blocker issue is that a calling convention cannot change depending on whether a feature is present. The solution is to disable FASTCC if APX is not available. This is an NFC change to the final code generation, because X86 doesn't define an alternative ABI for FASTCC in 64-bit mode. We can solve the potential compatibility issue of llvm#164638 with this patch.
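The gating described in that patch amounts to a simple dispatch, which the diff above expresses with `CCIfCC`/`CCIfSubtarget` in `CC_X86_64`. A simplified sketch (assumed names, other conventions elided):

```python
# Sketch only: fastcc delegates to the extended convention solely when
# the EGPR subtarget feature is present; otherwise it falls through to
# the plain C convention, so the ABI never changes silently.
def select_x86_64_cc(cc, has_egpr):
    if cc == "fast" and has_egpr:
        return "CC_X86_64_Fast"
    return "CC_X86_64_C"  # GHC, HiPE, AnyReg, etc. omitted here
```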