Conversation

@phoebewang
Contributor

phoebewang commented Oct 22, 2025

The FASTCC calling convention is used for internal functions for better performance. X64 doesn't define a different calling convention for FASTCC; I think the reason is that X64 already defines a balanced set of caller-saved and callee-saved registers.

With the APX feature, we have 16 more caller-saved registers. They can all be used for argument passing, so we extend FASTCC to use up to 22 registers for argument passing.


FASTCC is not supported on X64 from C code, so we don't need to worry about compatibility issues; see https://godbolt.org/z/fq99T6h63
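As a back-of-the-envelope illustration of the register budget described above (this is a Python model, not LLVM code): the 6 classic SysV integer argument registers plus the 16 APX EGPRs (R16-R31) give the 22 registers the patch uses.

```python
# Illustrative model of the FASTCC integer-argument register budget
# with APX: the 6 classic SysV argument registers plus the 16 EGPRs
# (R16..R31). Mirrors the register lists added to X86CallingConv.td.

SYSV_ARG_REGS = ["RDI", "RSI", "RDX", "RCX", "R8", "R9"]
APX_EGPRS = [f"R{n}" for n in range(16, 32)]  # R16..R31

FASTCC_APX_ARG_REGS = SYSV_ARG_REGS + APX_EGPRS

def assign_args(num_args, regs=FASTCC_APX_ARG_REGS):
    """Assign each integer argument to a register, or to the stack
    once the registers run out."""
    return [regs[i] if i < len(regs) else "stack" for i in range(num_args)]

print(len(FASTCC_APX_ARG_REGS))  # 22 registers in total
print(assign_args(23)[-1])       # the 23rd argument spills to the stack
```

The model ignores byval, i128, and floating-point arguments, which the real convention handles separately before delegating to CC_X86_64_C.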
@llvmbot
Member

llvmbot commented Oct 22, 2025

@llvm/pr-subscribers-backend-x86

Author: Phoebe Wang (phoebewang)

Changes



Patch is 52.56 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/164638.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/X86CallingConv.td (+34)
  • (added) llvm/test/CodeGen/X86/apx/fastcc.ll (+1374)
diff --git a/llvm/lib/Target/X86/X86CallingConv.td b/llvm/lib/Target/X86/X86CallingConv.td
index f020e0b55141c..bad0a698dd8de 100644
--- a/llvm/lib/Target/X86/X86CallingConv.td
+++ b/llvm/lib/Target/X86/X86CallingConv.td
@@ -687,6 +687,38 @@ def CC_X86_Win64_VectorCall : CallingConv<[
   CCDelegateTo<CC_X86_Win64_C>
 ]>;
 
+def CC_X86_64_Fast : CallingConv<[
+  // Handles byval parameters.  Note that we can't rely on the delegation
+  // to CC_X86_64_C for this because that happens after code that puts arguments
+  // in registers.
+  CCIfByVal<CCPassByVal<8, 8>>,
+
+  // Promote i1/i8/i16/v1i1 arguments to i32.
+  CCIfType<[i1, i8, i16, v1i1], CCPromoteToType<i32>>,
+
+  // Pointers are always passed in full 64-bit registers.
+  CCIfPtr<CCCustom<"CC_X86_64_Pointer">>,
+
+  // The first 22 integer arguments are passed in integer registers.
+  CCIfType<[i32], CCAssignToReg<[EDI, ESI, EDX, ECX, R8D, R9D, R16D, R17D,
+                                 R18D, R19D, R20D, R21D, R22D, R23D, R24D,
+                                 R25D, R26D, R27D, R28D, R29D, R30D, R31D]>>,
+
+  // i128 can be either passed in two i64 registers, or on the stack, but
+  // not split across register and stack. Handle this with a custom function.
+  CCIfType<[i64],
+           CCIfConsecutiveRegs<CCCustom<"CC_X86_64_I128">>>,
+
+  CCIfType<[i64], CCAssignToReg<[RDI, RSI, RDX, RCX, R8, R9, R16, R17, R18,
+                                 R19, R20, R21, R22, R23, R24, R25, R26, R27,
+                                 R28, R29, R30, R31]>>,
+
+
+
+  // Otherwise, drop to normal X86-64 CC.
+  CCDelegateTo<CC_X86_64_C>
+]>;
+
 
 def CC_X86_64_GHC : CallingConv<[
   // Promote i8/i16/i32 arguments to i64.
@@ -1079,6 +1111,8 @@ def CC_X86_32 : CallingConv<[
 
 // This is the root argument convention for the X86-64 backend.
 def CC_X86_64 : CallingConv<[
+  CCIfCC<"CallingConv::Fast",
+    CCIfSubtarget<"hasEGPR()", CCDelegateTo<CC_X86_64_Fast>>>,
   CCIfCC<"CallingConv::GHC", CCDelegateTo<CC_X86_64_GHC>>,
   CCIfCC<"CallingConv::HiPE", CCDelegateTo<CC_X86_64_HiPE>>,
   CCIfCC<"CallingConv::AnyReg", CCDelegateTo<CC_X86_64_AnyReg>>,
diff --git a/llvm/test/CodeGen/X86/apx/fastcc.ll b/llvm/test/CodeGen/X86/apx/fastcc.ll
new file mode 100644
index 0000000000000..984a4f640e379
--- /dev/null
+++ b/llvm/test/CodeGen/X86/apx/fastcc.ll
@@ -0,0 +1,1374 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc < %s -mtriple=x86_64 | FileCheck %s --check-prefixes=CHECK,X64
+; RUN: llc < %s -mtriple=x86_64 -mattr=+egpr | FileCheck %s --check-prefixes=CHECK,EGPR
+
+define fastcc i8 @arg6_i8(i8 %a, i8 %b, i8 %c, i8 %d, i8 %e, i8 %f) nounwind {
+; CHECK-LABEL: arg6_i8:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    # kill: def $r9d killed $r9d def $r9
+; CHECK-NEXT:    # kill: def $r8d killed $r8d def $r8
+; CHECK-NEXT:    # kill: def $ecx killed $ecx def $rcx
+; CHECK-NEXT:    # kill: def $edx killed $edx def $rdx
+; CHECK-NEXT:    # kill: def $esi killed $esi def $rsi
+; CHECK-NEXT:    # kill: def $edi killed $edi def $rdi
+; CHECK-NEXT:    leal (%rdi,%rsi), %eax
+; CHECK-NEXT:    addl %edx, %ecx
+; CHECK-NEXT:    addb %al, %cl
+; CHECK-NEXT:    leal (%r8,%r9), %eax
+; CHECK-NEXT:    addb %cl, %al
+; CHECK-NEXT:    # kill: def $al killed $al killed $eax
+; CHECK-NEXT:    retq
+  %a1 = add i8 %a, %b
+  %a2 = add i8 %c, %d
+  %a3 = add i8 %e, %f
+  %b1 = add i8 %a1, %a2
+  %b2 = add i8 %b1, %a3
+  ret i8 %b2
+}
+
+define fastcc i16 @arg7_i16(i16 %a, i16 %b, i16 %c, i16 %d, i16 %e, i16 %f, i16 %g) nounwind {
+; X64-LABEL: arg7_i16:
+; X64:       # %bb.0:
+; X64-NEXT:    # kill: def $r9d killed $r9d def $r9
+; X64-NEXT:    # kill: def $r8d killed $r8d def $r8
+; X64-NEXT:    # kill: def $ecx killed $ecx def $rcx
+; X64-NEXT:    # kill: def $edx killed $edx def $rdx
+; X64-NEXT:    # kill: def $esi killed $esi def $rsi
+; X64-NEXT:    # kill: def $edi killed $edi def $rdi
+; X64-NEXT:    leal (%rdx,%rcx), %ecx
+; X64-NEXT:    addl %edi, %ecx
+; X64-NEXT:    addl %esi, %ecx
+; X64-NEXT:    leal (%r8,%r9), %eax
+; X64-NEXT:    addw {{[0-9]+}}(%rsp), %ax
+; X64-NEXT:    addl %ecx, %eax
+; X64-NEXT:    # kill: def $ax killed $ax killed $eax
+; X64-NEXT:    retq
+;
+; EGPR-LABEL: arg7_i16:
+; EGPR:       # %bb.0:
+; EGPR-NEXT:    # kill: def $r9d killed $r9d def $r9
+; EGPR-NEXT:    # kill: def $r8d killed $r8d def $r8
+; EGPR-NEXT:    # kill: def $ecx killed $ecx def $rcx
+; EGPR-NEXT:    # kill: def $edx killed $edx def $rdx
+; EGPR-NEXT:    # kill: def $esi killed $esi def $rsi
+; EGPR-NEXT:    # kill: def $edi killed $edi def $rdi
+; EGPR-NEXT:    leal (%rdx,%rcx), %ecx
+; EGPR-NEXT:    addl %edi, %ecx
+; EGPR-NEXT:    addl %esi, %ecx
+; EGPR-NEXT:    leal (%r8,%r9), %eax
+; EGPR-NEXT:    addl %r16d, %eax
+; EGPR-NEXT:    addl %ecx, %eax
+; EGPR-NEXT:    # kill: def $ax killed $ax killed $eax
+; EGPR-NEXT:    retq
+  %a1 = add i16 %a, %b
+  %a2 = add i16 %c, %d
+  %a3 = add i16 %e, %f
+  %b1 = add i16 %a1, %a2
+  %b2 = add i16 %a3, %g
+  %c1 = add i16 %b1, %b2
+  ret i16 %c1
+}
+
+define fastcc i32 @arg8_i32(i32 %a, i32 %b, i32 %c, i32 %d, i32 %e, i32 %f, i32 %g, i32 %h) nounwind {
+; X64-LABEL: arg8_i32:
+; X64:       # %bb.0:
+; X64-NEXT:    # kill: def $r9d killed $r9d def $r9
+; X64-NEXT:    # kill: def $r8d killed $r8d def $r8
+; X64-NEXT:    # kill: def $ecx killed $ecx def $rcx
+; X64-NEXT:    # kill: def $edx killed $edx def $rdx
+; X64-NEXT:    # kill: def $esi killed $esi def $rsi
+; X64-NEXT:    # kill: def $edi killed $edi def $rdi
+; X64-NEXT:    movl {{[0-9]+}}(%rsp), %eax
+; X64-NEXT:    addl %edi, %esi
+; X64-NEXT:    addl %edx, %ecx
+; X64-NEXT:    addl %esi, %ecx
+; X64-NEXT:    addl {{[0-9]+}}(%rsp), %eax
+; X64-NEXT:    addl %r8d, %eax
+; X64-NEXT:    addl %r9d, %eax
+; X64-NEXT:    addl %ecx, %eax
+; X64-NEXT:    retq
+;
+; EGPR-LABEL: arg8_i32:
+; EGPR:       # %bb.0:
+; EGPR-NEXT:    # kill: def $r17d killed $r17d def $r17
+; EGPR-NEXT:    # kill: def $r16d killed $r16d def $r16
+; EGPR-NEXT:    # kill: def $r9d killed $r9d def $r9
+; EGPR-NEXT:    # kill: def $r8d killed $r8d def $r8
+; EGPR-NEXT:    # kill: def $ecx killed $ecx def $rcx
+; EGPR-NEXT:    # kill: def $edx killed $edx def $rdx
+; EGPR-NEXT:    # kill: def $esi killed $esi def $rsi
+; EGPR-NEXT:    # kill: def $edi killed $edi def $rdi
+; EGPR-NEXT:    leal (%rdx,%rcx), %ecx
+; EGPR-NEXT:    addl %edi, %ecx
+; EGPR-NEXT:    addl %esi, %ecx
+; EGPR-NEXT:    leal (%r16,%r17), %eax
+; EGPR-NEXT:    addl %r8d, %eax
+; EGPR-NEXT:    addl %r9d, %eax
+; EGPR-NEXT:    addl %ecx, %eax
+; EGPR-NEXT:    retq
+  %a1 = add i32 %a, %b
+  %a2 = add i32 %c, %d
+  %a3 = add i32 %e, %f
+  %a4 = add i32 %g, %h
+  %b1 = add i32 %a1, %a2
+  %b2 = add i32 %a3, %a4
+  %c1 = add i32 %b1, %b2
+  ret i32 %c1
+}
+
+define fastcc i64 @arg9_i64(i64 %a, i64 %b, i64 %c, i64 %d, i64 %e, i64 %f, i64 %g, i64 %h, i64 %i) nounwind {
+; X64-LABEL: arg9_i64:
+; X64:       # %bb.0:
+; X64-NEXT:    movq {{[0-9]+}}(%rsp), %rax
+; X64-NEXT:    addq %rdi, %rsi
+; X64-NEXT:    addq %rdx, %rcx
+; X64-NEXT:    addq %rsi, %rcx
+; X64-NEXT:    addq {{[0-9]+}}(%rsp), %rax
+; X64-NEXT:    addq %r8, %rax
+; X64-NEXT:    addq %r9, %rax
+; X64-NEXT:    addq %rcx, %rax
+; X64-NEXT:    addq {{[0-9]+}}(%rsp), %rax
+; X64-NEXT:    retq
+;
+; EGPR-LABEL: arg9_i64:
+; EGPR:       # %bb.0:
+; EGPR-NEXT:    leaq (%rdx,%rcx), %rcx
+; EGPR-NEXT:    addq %rdi, %rcx
+; EGPR-NEXT:    addq %rsi, %rcx
+; EGPR-NEXT:    leaq (%r16,%r17), %rax
+; EGPR-NEXT:    addq %r8, %rax
+; EGPR-NEXT:    addq %r9, %rax
+; EGPR-NEXT:    addq %rcx, %rax
+; EGPR-NEXT:    addq %r18, %rax
+; EGPR-NEXT:    retq
+  %a1 = add i64 %a, %b
+  %a2 = add i64 %c, %d
+  %a3 = add i64 %e, %f
+  %a4 = add i64 %g, %h
+  %b1 = add i64 %a1, %a2
+  %b2 = add i64 %a3, %a4
+  %c1 = add i64 %b1, %b2
+  %c2 = add i64 %c1, %i
+  ret i64 %c2
+}
+
+define fastcc i32 @arg10_i32(i32 %a, i32 %b, i32 %c, i32 %d, i32 %e, i32 %f, i32 %g, i32 %h, i32 %i, i32 %j) nounwind {
+; X64-LABEL: arg10_i32:
+; X64:       # %bb.0:
+; X64-NEXT:    # kill: def $r9d killed $r9d def $r9
+; X64-NEXT:    # kill: def $r8d killed $r8d def $r8
+; X64-NEXT:    # kill: def $ecx killed $ecx def $rcx
+; X64-NEXT:    # kill: def $edx killed $edx def $rdx
+; X64-NEXT:    # kill: def $esi killed $esi def $rsi
+; X64-NEXT:    # kill: def $edi killed $edi def $rdi
+; X64-NEXT:    movl {{[0-9]+}}(%rsp), %eax
+; X64-NEXT:    movl {{[0-9]+}}(%rsp), %r10d
+; X64-NEXT:    addl %edi, %esi
+; X64-NEXT:    addl %edx, %ecx
+; X64-NEXT:    addl %esi, %ecx
+; X64-NEXT:    addl {{[0-9]+}}(%rsp), %r10d
+; X64-NEXT:    addl %r8d, %r10d
+; X64-NEXT:    addl %r9d, %r10d
+; X64-NEXT:    addl {{[0-9]+}}(%rsp), %eax
+; X64-NEXT:    addl %ecx, %eax
+; X64-NEXT:    addl %r10d, %eax
+; X64-NEXT:    retq
+;
+; EGPR-LABEL: arg10_i32:
+; EGPR:       # %bb.0:
+; EGPR-NEXT:    # kill: def $r19d killed $r19d def $r19
+; EGPR-NEXT:    # kill: def $r18d killed $r18d def $r18
+; EGPR-NEXT:    # kill: def $r17d killed $r17d def $r17
+; EGPR-NEXT:    # kill: def $r16d killed $r16d def $r16
+; EGPR-NEXT:    # kill: def $r9d killed $r9d def $r9
+; EGPR-NEXT:    # kill: def $r8d killed $r8d def $r8
+; EGPR-NEXT:    # kill: def $ecx killed $ecx def $rcx
+; EGPR-NEXT:    # kill: def $edx killed $edx def $rdx
+; EGPR-NEXT:    # kill: def $esi killed $esi def $rsi
+; EGPR-NEXT:    # kill: def $edi killed $edi def $rdi
+; EGPR-NEXT:    leal (%rdx,%rcx), %ecx
+; EGPR-NEXT:    addl %edi, %ecx
+; EGPR-NEXT:    addl %esi, %ecx
+; EGPR-NEXT:    leal (%r16,%r17), %eax
+; EGPR-NEXT:    addl %r8d, %eax
+; EGPR-NEXT:    addl %r9d, %eax
+; EGPR-NEXT:    addl %ecx, %eax
+; EGPR-NEXT:    addl %r18d, %eax
+; EGPR-NEXT:    addl %r19d, %eax
+; EGPR-NEXT:    retq
+  %a1 = add i32 %a, %b
+  %a2 = add i32 %c, %d
+  %a3 = add i32 %e, %f
+  %a4 = add i32 %g, %h
+  %a5 = add i32 %i, %j
+  %b1 = add i32 %a1, %a2
+  %b2 = add i32 %a3, %a4
+  %c1 = add i32 %b1, %b2
+  %c2 = add i32 %c1, %a5
+  ret i32 %c2
+}
+
+define fastcc i16 @arg11_i16(i16 %a, i16 %b, i16 %c, i16 %d, i16 %e, i16 %f, i16 %g, i16 %h, i16 %i, i16 %j, i16 %k) nounwind {
+; X64-LABEL: arg11_i16:
+; X64:       # %bb.0:
+; X64-NEXT:    # kill: def $r9d killed $r9d def $r9
+; X64-NEXT:    # kill: def $r8d killed $r8d def $r8
+; X64-NEXT:    # kill: def $ecx killed $ecx def $rcx
+; X64-NEXT:    # kill: def $edx killed $edx def $rdx
+; X64-NEXT:    # kill: def $esi killed $esi def $rsi
+; X64-NEXT:    # kill: def $edi killed $edi def $rdi
+; X64-NEXT:    movzwl {{[0-9]+}}(%rsp), %eax
+; X64-NEXT:    movzwl {{[0-9]+}}(%rsp), %r10d
+; X64-NEXT:    addl %edi, %esi
+; X64-NEXT:    addl %edx, %ecx
+; X64-NEXT:    addl %esi, %ecx
+; X64-NEXT:    addw {{[0-9]+}}(%rsp), %r10w
+; X64-NEXT:    addl %r8d, %r10d
+; X64-NEXT:    addl %r9d, %r10d
+; X64-NEXT:    addw {{[0-9]+}}(%rsp), %ax
+; X64-NEXT:    addl %ecx, %eax
+; X64-NEXT:    addl %r10d, %eax
+; X64-NEXT:    addw {{[0-9]+}}(%rsp), %ax
+; X64-NEXT:    # kill: def $ax killed $ax killed $eax
+; X64-NEXT:    retq
+;
+; EGPR-LABEL: arg11_i16:
+; EGPR:       # %bb.0:
+; EGPR-NEXT:    # kill: def $r19d killed $r19d def $r19
+; EGPR-NEXT:    # kill: def $r18d killed $r18d def $r18
+; EGPR-NEXT:    # kill: def $r17d killed $r17d def $r17
+; EGPR-NEXT:    # kill: def $r16d killed $r16d def $r16
+; EGPR-NEXT:    # kill: def $r9d killed $r9d def $r9
+; EGPR-NEXT:    # kill: def $r8d killed $r8d def $r8
+; EGPR-NEXT:    # kill: def $ecx killed $ecx def $rcx
+; EGPR-NEXT:    # kill: def $edx killed $edx def $rdx
+; EGPR-NEXT:    # kill: def $esi killed $esi def $rsi
+; EGPR-NEXT:    # kill: def $edi killed $edi def $rdi
+; EGPR-NEXT:    leal (%rdx,%rcx), %ecx
+; EGPR-NEXT:    addl %edi, %ecx
+; EGPR-NEXT:    addl %esi, %ecx
+; EGPR-NEXT:    leal (%r16,%r17), %edx
+; EGPR-NEXT:    addl %r8d, %edx
+; EGPR-NEXT:    addl %r9d, %edx
+; EGPR-NEXT:    addl %ecx, %edx
+; EGPR-NEXT:    leal (%r18,%r19), %eax
+; EGPR-NEXT:    addl %r20d, %eax
+; EGPR-NEXT:    addl %edx, %eax
+; EGPR-NEXT:    # kill: def $ax killed $ax killed $eax
+; EGPR-NEXT:    retq
+  %a1 = add i16 %a, %b
+  %a2 = add i16 %c, %d
+  %a3 = add i16 %e, %f
+  %a4 = add i16 %g, %h
+  %a5 = add i16 %i, %j
+  %b1 = add i16 %a1, %a2
+  %b2 = add i16 %a3, %a4
+  %c1 = add i16 %b1, %b2
+  %c2 = add i16 %c1, %a5
+  %c3 = add i16 %c2, %k
+  ret i16 %c3
+}
+
+define fastcc i8 @arg12_i8(i8 %a, i8 %b, i8 %c, i8 %d, i8 %e, i8 %f, i8 %g, i8 %h, i8 %i, i8 %j, i8 %k, i8 %l) nounwind {
+; X64-LABEL: arg12_i8:
+; X64:       # %bb.0:
+; X64-NEXT:    # kill: def $r9d killed $r9d def $r9
+; X64-NEXT:    # kill: def $r8d killed $r8d def $r8
+; X64-NEXT:    # kill: def $ecx killed $ecx def $rcx
+; X64-NEXT:    # kill: def $edx killed $edx def $rdx
+; X64-NEXT:    # kill: def $esi killed $esi def $rsi
+; X64-NEXT:    # kill: def $edi killed $edi def $rdi
+; X64-NEXT:    movzbl {{[0-9]+}}(%rsp), %eax
+; X64-NEXT:    movzbl {{[0-9]+}}(%rsp), %r11d
+; X64-NEXT:    movzbl {{[0-9]+}}(%rsp), %r10d
+; X64-NEXT:    addl %edi, %esi
+; X64-NEXT:    addl %edx, %ecx
+; X64-NEXT:    addb %sil, %cl
+; X64-NEXT:    leal (%r8,%r9), %edx
+; X64-NEXT:    addb {{[0-9]+}}(%rsp), %r10b
+; X64-NEXT:    addb %dl, %r10b
+; X64-NEXT:    addb %cl, %r10b
+; X64-NEXT:    addb {{[0-9]+}}(%rsp), %r11b
+; X64-NEXT:    addb {{[0-9]+}}(%rsp), %al
+; X64-NEXT:    addb %r11b, %al
+; X64-NEXT:    addb %r10b, %al
+; X64-NEXT:    retq
+;
+; EGPR-LABEL: arg12_i8:
+; EGPR:       # %bb.0:
+; EGPR-NEXT:    # kill: def $r21d killed $r21d def $r21
+; EGPR-NEXT:    # kill: def $r20d killed $r20d def $r20
+; EGPR-NEXT:    # kill: def $r19d killed $r19d def $r19
+; EGPR-NEXT:    # kill: def $r18d killed $r18d def $r18
+; EGPR-NEXT:    # kill: def $r17d killed $r17d def $r17
+; EGPR-NEXT:    # kill: def $r16d killed $r16d def $r16
+; EGPR-NEXT:    # kill: def $r9d killed $r9d def $r9
+; EGPR-NEXT:    # kill: def $r8d killed $r8d def $r8
+; EGPR-NEXT:    # kill: def $ecx killed $ecx def $rcx
+; EGPR-NEXT:    # kill: def $edx killed $edx def $rdx
+; EGPR-NEXT:    # kill: def $esi killed $esi def $rsi
+; EGPR-NEXT:    # kill: def $edi killed $edi def $rdi
+; EGPR-NEXT:    leal (%rdi,%rsi), %eax
+; EGPR-NEXT:    addl %edx, %ecx
+; EGPR-NEXT:    addb %al, %cl
+; EGPR-NEXT:    leal (%r8,%r9), %eax
+; EGPR-NEXT:    leal (%r16,%r17), %edx
+; EGPR-NEXT:    addb %al, %dl
+; EGPR-NEXT:    addb %cl, %dl
+; EGPR-NEXT:    leal (%r18,%r19), %ecx
+; EGPR-NEXT:    leal (%r20,%r21), %eax
+; EGPR-NEXT:    addb %cl, %al
+; EGPR-NEXT:    addb %dl, %al
+; EGPR-NEXT:    # kill: def $al killed $al killed $eax
+; EGPR-NEXT:    retq
+  %a1 = add i8 %a, %b
+  %a2 = add i8 %c, %d
+  %a3 = add i8 %e, %f
+  %a4 = add i8 %g, %h
+  %a5 = add i8 %i, %j
+  %a6 = add i8 %k, %l
+  %b1 = add i8 %a1, %a2
+  %b2 = add i8 %a3, %a4
+  %b3 = add i8 %a5, %a6
+  %c1 = add i8 %b1, %b2
+  %c2 = add i8 %c1, %b3
+  ret i8 %c2
+}
+
+define fastcc i16 @arg13_i16(i16 %a, i16 %b, i16 %c, i16 %d, i16 %e, i16 %f, i16 %g, i16 %h, i16 %i, i16 %j, i16 %k, i16 %l, i16 %m) nounwind {
+; X64-LABEL: arg13_i16:
+; X64:       # %bb.0:
+; X64-NEXT:    # kill: def $r9d killed $r9d def $r9
+; X64-NEXT:    # kill: def $r8d killed $r8d def $r8
+; X64-NEXT:    # kill: def $ecx killed $ecx def $rcx
+; X64-NEXT:    # kill: def $edx killed $edx def $rdx
+; X64-NEXT:    # kill: def $esi killed $esi def $rsi
+; X64-NEXT:    # kill: def $edi killed $edi def $rdi
+; X64-NEXT:    movzwl {{[0-9]+}}(%rsp), %eax
+; X64-NEXT:    movzwl {{[0-9]+}}(%rsp), %r10d
+; X64-NEXT:    movzwl {{[0-9]+}}(%rsp), %r11d
+; X64-NEXT:    addl %edi, %esi
+; X64-NEXT:    addl %edx, %ecx
+; X64-NEXT:    addl %esi, %ecx
+; X64-NEXT:    addw {{[0-9]+}}(%rsp), %r11w
+; X64-NEXT:    addl %r8d, %r11d
+; X64-NEXT:    addl %r9d, %r11d
+; X64-NEXT:    addl %ecx, %r11d
+; X64-NEXT:    addw {{[0-9]+}}(%rsp), %r10w
+; X64-NEXT:    addw {{[0-9]+}}(%rsp), %ax
+; X64-NEXT:    addl %r10d, %eax
+; X64-NEXT:    addl %r11d, %eax
+; X64-NEXT:    addw {{[0-9]+}}(%rsp), %ax
+; X64-NEXT:    # kill: def $ax killed $ax killed $eax
+; X64-NEXT:    retq
+;
+; EGPR-LABEL: arg13_i16:
+; EGPR:       # %bb.0:
+; EGPR-NEXT:    # kill: def $r21d killed $r21d def $r21
+; EGPR-NEXT:    # kill: def $r20d killed $r20d def $r20
+; EGPR-NEXT:    # kill: def $r19d killed $r19d def $r19
+; EGPR-NEXT:    # kill: def $r18d killed $r18d def $r18
+; EGPR-NEXT:    # kill: def $r17d killed $r17d def $r17
+; EGPR-NEXT:    # kill: def $r16d killed $r16d def $r16
+; EGPR-NEXT:    # kill: def $r9d killed $r9d def $r9
+; EGPR-NEXT:    # kill: def $r8d killed $r8d def $r8
+; EGPR-NEXT:    # kill: def $ecx killed $ecx def $rcx
+; EGPR-NEXT:    # kill: def $edx killed $edx def $rdx
+; EGPR-NEXT:    # kill: def $esi killed $esi def $rsi
+; EGPR-NEXT:    # kill: def $edi killed $edi def $rdi
+; EGPR-NEXT:    leal (%rdx,%rcx), %ecx
+; EGPR-NEXT:    addl %edi, %ecx
+; EGPR-NEXT:    addl %esi, %ecx
+; EGPR-NEXT:    leal (%r16,%r17), %edx
+; EGPR-NEXT:    addl %r8d, %edx
+; EGPR-NEXT:    addl %r9d, %edx
+; EGPR-NEXT:    addl %ecx, %edx
+; EGPR-NEXT:    leal (%r20,%r21), %eax
+; EGPR-NEXT:    addl %r18d, %eax
+; EGPR-NEXT:    addl %r19d, %eax
+; EGPR-NEXT:    addl %r22d, %eax
+; EGPR-NEXT:    addl %edx, %eax
+; EGPR-NEXT:    # kill: def $ax killed $ax killed $eax
+; EGPR-NEXT:    retq
+  %a1 = add i16 %a, %b
+  %a2 = add i16 %c, %d
+  %a3 = add i16 %e, %f
+  %a4 = add i16 %g, %h
+  %a5 = add i16 %i, %j
+  %a6 = add i16 %k, %l
+  %b1 = add i16 %a1, %a2
+  %b2 = add i16 %a3, %a4
+  %b3 = add i16 %a5, %a6
+  %c1 = add i16 %b1, %b2
+  %c2 = add i16 %c1, %b3
+  %c3 = add i16 %c2, %m
+  ret i16 %c3
+}
+
+define fastcc i32 @arg14_i32(i32 %a, i32 %b, i32 %c, i32 %d, i32 %e, i32 %f, i32 %g, i32 %h, i32 %i, i32 %j, i32 %k, i32 %l, i32 %m, i32 %n) nounwind {
+; X64-LABEL: arg14_i32:
+; X64:       # %bb.0:
+; X64-NEXT:    pushq %rbx
+; X64-NEXT:    # kill: def $r9d killed $r9d def $r9
+; X64-NEXT:    # kill: def $r8d killed $r8d def $r8
+; X64-NEXT:    # kill: def $ecx killed $ecx def $rcx
+; X64-NEXT:    # kill: def $edx killed $edx def $rdx
+; X64-NEXT:    # kill: def $esi killed $esi def $rsi
+; X64-NEXT:    # kill: def $edi killed $edi def $rdi
+; X64-NEXT:    movl {{[0-9]+}}(%rsp), %eax
+; X64-NEXT:    movl {{[0-9]+}}(%rsp), %r10d
+; X64-NEXT:    movl {{[0-9]+}}(%rsp), %r11d
+; X64-NEXT:    movl {{[0-9]+}}(%rsp), %ebx
+; X64-NEXT:    addl %edi, %esi
+; X64-NEXT:    addl %edx, %ecx
+; X64-NEXT:    addl %esi, %ecx
+; X64-NEXT:    addl {{[0-9]+}}(%rsp), %ebx
+; X64-NEXT:    addl %r8d, %ebx
+; X64-NEXT:    addl %r9d, %ebx
+; X64-NEXT:    addl %ecx, %ebx
+; X64-NEXT:    addl {{[0-9]+}}(%rsp), %r11d
+; X64-NEXT:    addl {{[0-9]+}}(%rsp), %r10d
+; X64-NEXT:    addl %r11d, %r10d
+; X64-NEXT:    addl {{[0-9]+}}(%rsp), %eax
+; X64-NEXT:    addl %r10d, %eax
+; X64-NEXT:    addl %ebx, %eax
+; X64-NEXT:    popq %rbx
+; X64-NEXT:    retq
+;
+; EGPR-LABEL: arg14_i32:
+; EGPR:       # %bb.0:
+; EGPR-NEXT:    # kill: def $r23d killed $r23d def $r23
+; EGPR-NEXT:    # kill: def $r22d killed $r22d def $r22
+; EGPR-NEXT:    # kill: def $r21d killed $r21d def $r21
+; EGPR-NEXT:    # kill: def $r20d killed $r20d def $r20
+; EGPR-NEXT:    # kill: def $r19d killed $r19d def $r19
+; EGPR-NEXT:    # kill: def $r18d killed $r18d def $r18
+; EGPR-NEXT:    # kill: def $r17d killed $r17d def $r17
+; EGPR-NEXT:    # kill: def $r16d killed $r16d def $r16
+; EGPR-NEXT:    # kill: def $r9d killed $r9d def $r9
+; EGPR-NEXT:    # kill: def $r8d killed $r8d def $r8
+; EGPR-NEXT:    # kill: def $ecx killed $ecx def $rcx
+; EGPR-NEXT:    # kill: def $edx killed $edx def $rdx
+; EGPR-NEXT:    # kill: def $esi killed $esi def $rsi
+; EGPR-NEXT:    # kill: def $edi killed $edi def $rdi
+; EGPR-NEXT:    leal (%rdx,%rcx), %ecx
+; EGPR-NEXT:    addl %edi, %ecx
+; EGPR-NEXT:    addl %esi, %ecx
+; EGPR-NEXT:    leal (%r16,%r17), %edx
+; EGPR-NEXT:    addl %r8d, %edx
+; EGPR-NEXT:    addl %r9d, %edx
+; EGPR-NEXT:    addl %ecx, %edx
+; EGPR-NEXT:    leal (%r20,%r21), %eax
+; EGPR-NEXT:    addl %r18d, %eax
+; EGPR-NEXT:    addl %r19d, %eax
+; EGPR-NEXT:    addl %r22d, %eax
+; EGPR-NEXT:    addl %r23d, %eax
+; EGPR-NEXT:    addl %edx, %eax
+; EGPR-NEXT:    retq
+  %a1 = add i32 %a, %b
+  %a2 = add i32 %c, %d
+  %a3 = add i32 %e, %f
+  %a4 = add i32 %g, %h
+  %a5 = add i32 %i, %j
+  %a6 = add i32 %k, %l
+  %a7 = add i32 %m, %n
+  %b1 = add i32 %a1, %a2
+  %b2 = add i32 %a3, %a4
+  %b3 = add i32 %a5, %a6
+  %c1 = add i32 %b1, %b2
+  %c2 = add i32 %c1, %b3
+  %c3 = add i32 %c2, %a7
+  ret i32 %c3
+}
+
+define fastcc i64 @arg15_i64(i64 %a, i64 %b, ...
[truncated]
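To connect the CHECK lines above with the .td change, here is a small sketch (illustrative Python, not LLVM code) of the CC_X86_64_Fast cascade: small integer types are promoted to i32, the first 22 integer arguments go to registers, and the rest go to the stack. Without EGPR there are only 6 integer argument registers, which is why the 7th argument of arg7_i16 is loaded from the stack in the X64 checks but from r16d in the EGPR checks.

```python
# Sketch of integer-argument assignment for fastcc, with and without
# the APX EGPRs. Byval and i128 handling are omitted for brevity.

LEGACY = ["RDI", "RSI", "RDX", "RCX", "R8", "R9"]
EGPR = [f"R{n}" for n in range(16, 32)]

def sub32(reg):
    # 32-bit alias of a 64-bit GPR: RDI -> EDI, R9 -> R9D, R16 -> R16D.
    return reg + "D" if reg[1:].isdigit() else "E" + reg[1:]

def fastcc_assign(arg_types, has_egpr):
    regs = LEGACY + (EGPR if has_egpr else [])
    locs = []
    for i, ty in enumerate(arg_types):
        ty = "i32" if ty in ("i1", "i8", "i16") else ty  # promotion step
        if i < len(regs):
            locs.append(sub32(regs[i]) if ty == "i32" else regs[i])
        else:
            locs.append("stack")
    return locs

args = ["i16"] * 7  # the signature of arg7_i16
print(fastcc_assign(args, has_egpr=False)[6])  # stack
print(fastcc_assign(args, has_egpr=True)[6])   # R16D
```

The same model predicts the arg9_i64 checks: the 9th i64 argument lands in R18 under EGPR, matching the `addq %r18, %rax` line in the test.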

@KanRobert
Contributor

Hi @phoebewang @jyknight

We had this discussion in #76868:

KanRobert: I won't expect this from the name preserve_none. R16-R31 should be used to pass arguments when they're available.

jyknight: They should not be used to pass arguments conditionally based on the current subtarget, since that creates two incompatible calling conventions.

There's no reason "preserve none" has to be read to imply "uses all possible registers to pass arguments.", so I don't see an issue with leaving it like it is.

I wonder whether we have the same issue here? preserve_none is also a new calling convention added by that PR.

@topperc
Collaborator

topperc commented Oct 23, 2025

The caller and callee both need to have EGPR enabled for this to work. Right?

@KanRobert
Contributor

The caller and callee both need to have EGPR enabled for this to work. Right?

I believe so; both the caller and the callee need to know where the arguments are.

@phoebewang
Contributor Author

I think the proposal here is different from preserve_none and other dedicated calling conventions. This patch only applies to internal function optimizations: https://godbolt.org/z/3xT94Wvjv. So we don't have a subtarget mismatch issue.

In the example at https://godbolt.org/z/fq99T6h63, I showed that Clang already warns about the use of fastcc on the C side. We may consider relaxing it to allow use with APX only, so we wouldn't have a subtarget mismatch issue even if we want to extend it.

@topperc
Collaborator

topperc commented Oct 23, 2025

I think the proposal here is different from preserve_none and other dedicated calling conventions. This patch only applies to internal function optimizations: https://godbolt.org/z/3xT94Wvjv. So we don't have a subtarget mismatch issue.

In the example at https://godbolt.org/z/fq99T6h63, I showed that Clang already warns about the use of fastcc on the C side. We may consider relaxing it to allow use with APX only, so we wouldn't have a subtarget mismatch issue even if we want to extend it.

internal linkage doesn't guarantee the subtarget is the same. Can't a target attribute or LTO create a subtarget mismatch?

@phoebewang
Contributor Author

I think the proposal here is different from preserve_none and other dedicated calling conventions. This patch only applies to internal function optimizations: https://godbolt.org/z/3xT94Wvjv. So we don't have a subtarget mismatch issue.
In the example at https://godbolt.org/z/fq99T6h63, I showed that Clang already warns about the use of fastcc on the C side. We may consider relaxing it to allow use with APX only, so we wouldn't have a subtarget mismatch issue even if we want to extend it.

internal linkage doesn't guarantee the subtarget is the same. Can't a target attribute or LTO create a subtarget mismatch?

Good point! I think it is easy to solve: we can add a TTI interface like useFastCCForInternalCall, similar to useColdCCForColdCall, to solve the problem.
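As a sketch of that idea (useFastCCForInternalCall is a hypothetical name modeled on the existing useColdCCForColdCall hook, and this is illustrative Python rather than the eventual C++ TTI code), the gate would only pick the extended convention when both sides of an internal call agree on EGPR:

```python
# Hypothetical model of a useFastCCForInternalCall-style gate: the
# extended FASTCC may only be used when caller and callee both have
# EGPR (APX), so they agree on where arguments live. A per-function
# target attribute or LTO mixing subtargets would otherwise mismatch.

def use_extended_fastcc(caller_features, callee_features):
    """Return True only when both sides have the 'egpr' feature."""
    return "egpr" in caller_features and "egpr" in callee_features

print(use_extended_fastcc({"egpr"}, {"egpr"}))  # both sides agree
print(use_extended_fastcc({"egpr"}, set()))     # callee lacks EGPR
```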

@zuban32
Contributor

zuban32 commented Oct 23, 2025

FASTCC is not supported on X64 from C code, so we don't need to worry about compatibility issues; see https://godbolt.org/z/fq99T6h63

I think you're mixing up two orthogonal things here. IIRC, fastcall is the x86_fastcall CC, and fastcall calls still have to adhere to its ABI, while fastcc doesn't follow any fixed ABI at all.

But that doesn't make your change unneeded; just my 2 cents.

@phoebewang
Contributor Author

FASTCC is not supported on X64 from C code, so we don't need to worry about compatibility issues; see https://godbolt.org/z/fq99T6h63

I think you're mixing up two orthogonal things here. IIRC, fastcall is the x86_fastcall CC, and fastcall calls still have to adhere to its ABI, while fastcc doesn't follow any fixed ABI at all.

But that doesn't make your change unneeded, just my 2 cents

I see, thanks! It has nothing to do with C's fastcall then.

phoebewang added a commit to phoebewang/llvm-project that referenced this pull request Oct 23, 2025
Background: the X86 APX feature adds 16 registers within the same
64-bit mode. PR llvm#164638 is trying to use these registers for
FASTCC. However, a blocker issue is that the calling convention cannot
change depending on whether a feature is enabled.

The solution is to disable FASTCC if APX is not available. This is an
NFC change for the final code generation, because X86 doesn't define an
alternative ABI for FASTCC in 64-bit mode. We can solve the potential
compatibility issue of llvm#164638 with this patch.
