
[a64] Implement an ARM64 backend #2259

Draft: wants to merge 144 commits into base `master`

Commits on Apr 27, 2024

  1. [Build] Add Windows ARM64 support

    Separates the `Windows` platform into `Windows-x86_64` and
    `Windows-ARM64`. Adds `--arch` argument to `build`.
    Removes x64 backend on non-x64 targets.
    Wunkolo committed Apr 27, 2024 (1746177)

Commits on Apr 28, 2024

  1. Commit a6d9113
  2. Commit 1874f0c
  3. [ImGui] Stub ARM64 host debug text

    Marked as TODO for now
    Wunkolo committed Apr 28, 2024 (b48ec84)
  4. Commit f254848
  5. Commit fe9c98e
  6. Commit 045441a

Commits on Apr 29, 2024

  1. [CPU] Stub ARM64 to Null CPU backend

    Adding the `a64` backend will be a different PR. For now it's stubbed to
    the null backend to allow the main executable to open without failing
    initialization.
    Wunkolo committed Apr 29, 2024 (f2b05ea)
  2. [UI] Fix divide-by-zero hazard

    This value currently returns `0` on ARM machines, which causes a divide-by-zero exception.
    Wunkolo committed Apr 29, 2024 (aa4a3e0)

Commits on Jun 23, 2024

  1. [Build] Link SDL2 to xenia-app

    Addresses a build issue that seems to occur now that xenia-app no longer
    gets SDL2 through one of its submodules.
    Wunkolo committed Jun 23, 2024 (a0f6cd7)
  2. [CPU] Add ARM64 backend build target

    Adds the new `xenia-cpu-backend-a64` build-target with linkage following the x64 backend.
    Wunkolo committed Jun 23, 2024 (ffc966c)
  3. [a64] Integrate oaknut submodule

    Header-only library for emitting arm64v8 instructions.
    
    Enables C++20 only for the a64 backend for now
    Wunkolo committed Jun 23, 2024 (59bc265)
  4. [Base] Add ARM64 utility functions

    Mostly element-accessors
    Wunkolo committed Jun 23, 2024 (2284ed4)
  5. [CPU] Implement ARM64 CPU backend

    First pass framework that gets emitted ARM code executing.
    
    Based on the x64 backend, implements an ARM64 JIT backend.
    Wunkolo committed Jun 23, 2024 (9960ef9)
  6. [a64] Fix BYTE_SWAP_V128

    This should only reverse the bytes within each 32-bit element, not the whole vector.
    Wunkolo committed Jun 23, 2024 (39429aa)
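    A portable sketch of the per-element semantics described above, with the 128-bit vector modelled as a 16-byte array. The `Vec128` alias and the function name are illustrative, not the backend's. On AArch64 the NEON `REV32 Vd.16B, Vn.16B` instruction performs exactly this per-word reversal; the commit does not say which instruction the backend now emits.

    ```cpp
    #include <array>
    #include <cassert>
    #include <cstddef>
    #include <cstdint>

    using Vec128 = std::array<std::uint8_t, 16>;  // hypothetical 128-bit vector type

    // Reverse the byte order inside each 32-bit element; element order is unchanged.
    Vec128 ByteSwapV128(const Vec128& v) {
      Vec128 out{};
      for (std::size_t lane = 0; lane < 16; lane += 4) {
        for (std::size_t i = 0; i < 4; ++i) {
          out[lane + i] = v[lane + 3 - i];
        }
      }
      return out;
    }
    ```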
  7. Commit b9571cf
  8. [a64] Implement OPCODE_SPLAT

    Wunkolo committed Jun 23, 2024 (652b7a1)
  9. [a64] Implement OPCODE_INSERT

    Wunkolo committed Jun 23, 2024 (10310d7)
  10. Commit 61feb6a
  11. Commit 1b574be
  12. Commit 72380bf
  13. Commit 6770682
  14. Commit 10cba8e
  15. [a64] Fix StackLayout

    Wrong register index and vector-register size
    Wunkolo committed Jun 23, 2024 (defb68e)
  16. [a64] Fix Guest-To-Host native calls

    These calls need to preserve and restore the `lr` register.
    
    Unit tests all run now!
    Wunkolo committed Jun 23, 2024 (124f684)
  17. Commit 8aa4b93
  18. Commit 6a0e6a9
  19. [a64] Fix overwriting of return-value registers

    These were stomping over X0 and Q0, which caused input-argument registers to be returned as return values.
    Fixes some guest-to-host calls.
    Wunkolo committed Jun 23, 2024 (3d345d7)
  20. [a64] Implement OPCODE_VECTOR_SHL

    Vector registers are passed as pointers rather than directly in the `Qn` registers. So these functions should be taking pointer-type arguments rather than vector-register types directly.
    
    Fixes `OPCODE_VECTOR_SHL` and passes unit tests.
    Wunkolo committed Jun 23, 2024 (07a4df8)
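    A minimal sketch of a host helper written with pointer-type arguments, as the commit describes: under AAPCS64 the three pointers arrive in `X0`, `X1` and `X2` instead of the vectors arriving in `Q0`, `Q1`, and so on. The helper name, the output-pointer convention, and using only the low 3 bits of each shift lane (the AltiVec `vslb` rule) are assumptions for illustration, not the backend's actual code.

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Hypothetical host helper called from JIT code: all three vectors are
    // passed by address (X0, X1, X2 under AAPCS64) instead of in Q registers.
    void EmulateVectorShlI8(std::uint8_t* out, const std::uint8_t* values,
                            const std::uint8_t* shifts) {
      for (int i = 0; i < 16; ++i) {
        // Only the low 3 bits of each shift lane are used for 8-bit elements.
        out[i] = static_cast<std::uint8_t>(values[i] << (shifts[i] & 0x7));
      }
    }
    ```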
  21. [a64] Remove volatile storing of X0/Q0

    We don't load it back, so there is no need to store it.
    Wunkolo committed Jun 23, 2024 (88ed113)
  22. [a64] Implement OPCODE_VECTOR_{SHR,SHA}

    Passes all unit tests
    Wunkolo committed Jun 23, 2024 (7feea4c)
  23. [a64] Implement OPCODE_VECTOR_ROTATE_LEFT

    Uses the emulated fallback for now. Will have to come back to this later. Passes unit tests.
    Wunkolo committed Jun 23, 2024 (3ac5121)
  24. [a64] Implement OPCODE_VECTOR_MIN

    Passes unit tests
    Wunkolo committed Jun 23, 2024 (ebd1f84)
  25. [a64] Implement OPCODE_VECTOR_MAX

    Passes unit tests
    Wunkolo committed Jun 23, 2024 (584c34c)
  26. [a64] Implement OPCODE_VECTOR_ADD

    There is quite literally an instruction for each and every one of these cases.
    
    Passes unit tests
    Wunkolo committed Jun 23, 2024 (35e8a80)
  27. [a64] Fix native vector calls

    Arguments need to be pointers stored in X0, X1, X2, ... rather than being passed directly in Q0, Q1, etc.
    
    There are no unit tests for these functions in particular.
    Wunkolo committed Jun 23, 2024 (e62f3f3)
  28. [a64] Implement OPCODE_PACK(FLOAT16)

    Fails the unit tests due to subtle rounding errors
    Wunkolo committed Jun 23, 2024 (3b2612b)
  29. [a64] Implement OPCODE_PACK(SHORT)

    Fails unit tests due to subtle rounding errors
    
    `SHORT_4` unit-test is missing but implementation is the same as `SHORT_4`
    Wunkolo committed Jun 23, 2024 (e5fd3d3)
  30. [a64] Implement HIR Branch labeling

    Adds support for HIR labels to create actual oaknut labels
    Wunkolo committed Jun 23, 2024 (8257740)
  31. [a64] Implement control sequences

    Implements control sequences such as conditional branching, breaking, and trapping
    Wunkolo committed Jun 23, 2024 (725ea3d)
  32. [a64] Fix ResolveFunction thunk

    Register was getting stomped over
    Wunkolo committed Jun 23, 2024 (5b8ac36)
  33. [a64] Fix resetting of labels during Emplace

    On the x64 side, this is the same as the `reset()` function resetting the label-manager
    Wunkolo committed Jun 23, 2024 (65288d5)
  34. [a64] Fix ResolveFunctionThunk call

    Resolving the function puts its address into X0, and that address should be called immediately after.
    
    We were just calling ResolveFunction on ResolveFunction recursively
    Wunkolo committed Jun 23, 2024 (dfa5bdb)
  35. Commit a1741bf
  36. [a64] Draft Windows-ARM64 stack unwinding data

    Things still get weird at the thunks, but this allows for callstacks between-to-guest calls
    Wunkolo committed Jun 23, 2024 (9b70ea0)
  37. Commit 17987ca
  38. Commit 9ec4b68
  39. Commit c428d79
  40. Commit 6a5f461
  41. Commit 5bff71f
  42. [a64] Refactor XSP to SP

    Wunkolo committed Jun 23, 2024 (b5d55e1)
  43. Commit 018e484
  44. [a64] Remove redundant zero-extension during address computation

    Also changes the register to X3 by default
    Wunkolo committed Jun 23, 2024 (8b4b713)
  45. [a64] Fix CallIndirect return address

    Should be `GUEST_RET_ADDR` not `GUEST_CALL_RET_ADDR`.
    Wunkolo committed Jun 23, 2024 (2b3147b)
  46. [a64] Refactor REV{32,64} to REV

    Let the register type determine the reverse-size
    
    REV32 was also the wrong instruction to use.
    Wunkolo committed Jun 23, 2024 (4f5c640)
  47. [a64] Implement OPCODE_MEMSET

    Wunkolo committed Jun 23, 2024 (8836eb2)
  48. Commit 8a1e343
  49. Commit d656c5b
  50. Commit cf6c2c2
  51. Commit 647d26c
  52. [a64] Fix ComputeMemoryAddress{Offset} register stomp

    `W1` is a possible HIR register allocation and using W1 here was stomping over it. Don't use W1, use the provided "scratch" register.
    Wunkolo committed Jun 23, 2024 (52b2593)
  53. [a64] Refactor REV{16,32} to REV

    Derive the reversal-size from the register-size.
    REV32 is also the wrong instruction to use here, since it reverses the bytes within the upper and lower 32-bit words separately.
    Wunkolo committed Jun 23, 2024 (0f9769b)
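    To make the difference concrete, a portable sketch of both behaviours on a 64-bit value. The helper names are hypothetical, and the loops stand in for the single `REV Xd, Xn` and `REV32 Xd, Xn` instructions.

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Byte-reverse the full 64-bit value (what `REV Xd, Xn` does).
    std::uint64_t Rev64(std::uint64_t x) {
      std::uint64_t r = 0;
      for (int i = 0; i < 8; ++i) {
        r = (r << 8) | ((x >> (8 * i)) & 0xFF);
      }
      return r;
    }

    // Byte-reverse each 32-bit half independently (what `REV32 Xd, Xn` does).
    std::uint64_t Rev32Halves(std::uint64_t x) {
      const std::uint64_t lo = Rev64(x & 0xFFFFFFFFu) >> 32;
      const std::uint64_t hi = Rev64(x >> 32) >> 32;
      return (hi << 32) | lo;
    }
    ```

    For `0x0102030405060708`, `Rev64` gives `0x0807060504030201`, while `Rev32Halves` gives `0x0403020108070605`, which is why `REV32` silently produces the wrong result for a full-width swap.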
  54. [a64] Reorganize guest register allocation

    Uses a calling convention somewhat similar to the ARM64 one.
    Wunkolo committed Jun 23, 2024 (49f9edb)
  55. [a64] Remove standard prolog/epilog from thunks

    Fixes callstacks!!!!
    Wunkolo committed Jun 23, 2024 (906d0c6)
  56. [a64] Fix EmitGetCurrentThreadId type

    16-bit word rather than 8-bit
    Wunkolo committed Jun 23, 2024 (540344f)
  57. [a64] Fix immediates being too large

    These instructions need to use an extra register to generate their constants if they are too large
    Wunkolo committed Jun 23, 2024 (ba924fe)
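    The commit does not list which instructions were affected. As one well-known example of the constraint, AArch64 `ADD`/`SUB` (immediate) can only encode a 12-bit unsigned value, optionally shifted left by 12. A hypothetical encodability check might look like this; any value that fails it has to be materialized in a scratch register and used with the register form of the instruction.

    ```cpp
    #include <cassert>
    #include <cstdint>

    // AArch64 ADD/SUB (immediate) encode a 12-bit unsigned value, optionally
    // shifted left by 12. Anything else must first be built in a scratch
    // register (e.g. with MOVZ/MOVK) and use the register form.
    bool FitsAddSubImmediate(std::uint64_t value) {
      if (value < (1u << 12)) return true;                        // imm12, LSL #0
      return (value & 0xFFF) == 0 && (value >> 12) < (1u << 12);  // imm12, LSL #12
    }
    ```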
  58. Commit e4d3b2a
  59. [a64] Fix external function call arguments

    `x0` was loading the thunk rather than using `xip`
    
    Fixes lots of init bugs!
    Wunkolo committed Jun 23, 2024 (c6a7270)
  60. Commit b18f2ff
  61. [a64] Compute memory offsets as 32-bit registers

    Additionally fixes some instruction forms to use the more general `STR` instruction with an offset
    Wunkolo committed Jun 23, 2024 (47665fd)
  62. Commit 2d093ae
  63. [a64] Fix 32-bit store

    You wouldn't believe how much time this bug cost me
    Wunkolo committed Jun 23, 2024 (fd32c0e)
  64. [a64] Update guest calling conventions

    Guest-function calls will use W17 for indirect calls
    Wunkolo committed Jun 23, 2024 (dc6666d)
  65. [a64] Fix instruction constant generation

    Fixes some offset generation as well
    Wunkolo committed Jun 23, 2024 (6e83e2a)
  66. Commit fbc306f
  67. Commit c495fe7
  68. Commit 31b2ccd
  69. [a64] Preserve X0 when resolving functions

    Fixes indirect branches
    Wunkolo committed Jun 23, 2024 (6f0ff9e)
  70. Commit 1bdc243
  71. [a64] Fix signed MUL_HI

    Wunkolo committed Jun 23, 2024 (866ce97)
  72. [a64] Fix non-const MUL_I32

    Was picking up `W0` rather than src1
    Wunkolo committed Jun 23, 2024 (50d7ad5)
  73. Commit b532ab5
  74. [a64] Implement PERMUTE_I32

    Wunkolo committed Jun 23, 2024 (c4b2638)
  75. Commit f73c8fe
  76. Commit 046e8ed
  77. Commit f5e14d6
  78. Commit 737f2b5
  79. Commit 3adb86c
  80. Commit 87cca91
  81. [a64] Fix AND_NOT_V128

    Operand order is wrong.
    Wunkolo committed Jun 23, 2024 (2e2f47f)
  82. Commit 207e2c1
  83. [a64] Fix OPCODE_SPLAT

    Writing to the wrong register!
    Wunkolo committed Jun 23, 2024 (de040f0)
  84. [a64] Fix SELECT_V128_V128

    Potential input-register stomping and operand order is seemingly wrong.
    
    Passes generated unit tests.
    Wunkolo committed Jun 23, 2024 (1ad0d7e)
  85. [a64] Implement OPCODE_VECTOR_AVERAGE

    Passes generated unit tests
    Wunkolo committed Jun 23, 2024 (edfd2f2)
  86. Commit 6b4ff8b
  87. Commit 42d41a5
  88. [a64] Refactor OPCODE_ATOMIC_COMPARE_EXCHANGE

    Much more explicit arguments while trying to debug a deadlock
    Wunkolo committed Jun 23, 2024 (be0c793)
  89. [a64] Fix OPCODE_MAX

    Was not handling constant arguments properly
    Wunkolo committed Jun 23, 2024 (28b629e)
  90. [a64] Fix MUL_HI_I32 operands

    Wunkolo committed Jun 23, 2024 (41eeae1)
  91. [a64] Fix OPCODE_VECTOR_SHA(constant)

    Values should be modulo-element-size
    Wunkolo committed Jun 23, 2024 (e2d141e)
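    A scalar sketch of the fix for 16-bit lanes, assuming "modulo-element-size" means masking the constant with `element_bits - 1`. The function name and array types are illustrative. Each lane is promoted to `int` before the shift, and `>>` on a negative `int` is arithmetic (guaranteed since C++20, and the behaviour of mainstream compilers before that).

    ```cpp
    #include <array>
    #include <cassert>
    #include <cstddef>
    #include <cstdint>

    // Arithmetic right shift of each 16-bit lane by a constant, with the shift
    // amount reduced modulo the element width (16 bits).
    std::array<std::int16_t, 8> VectorShaI16(const std::array<std::int16_t, 8>& v,
                                             unsigned shift) {
      const unsigned amount = shift & 15;  // modulo element size
      std::array<std::int16_t, 8> out{};
      for (std::size_t i = 0; i < v.size(); ++i) {
        out[i] = static_cast<std::int16_t>(v[i] >> amount);
      }
      return out;
    }
    ```

    With this masking, a constant of 17 behaves like 1 and a constant of 16 leaves the lanes unchanged.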
  92. Commit 0e2f756
  93. [a64] Fix OPCODE_VECTOR_CONVERT_{I2F,F2I}

    😳
    Wunkolo committed Jun 23, 2024 (1919dda)
  94. Commit d3d3ea3
  95. [a64] Fix VECTOR_CONVERT_F2I rounding

    ```
    4.2.2.4 Floating-Point Rounding and Conversion Instructions
    ...
    Floating-point conversions to integers (vctuxs, vctsxs) use round-toward-zero (truncate).
    ...
    ```
    
    This passes all of the `vctuxs` and `vctsxs` unit tests
    Wunkolo committed Jun 23, 2024 (7eca228)
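    A scalar sketch of the signed (`vctsxs`) case: round toward zero, then saturate to the `int32_t` range. AArch64 `FCVTZS` has the same truncating, saturating behaviour (and returns 0 for NaN), which makes it a natural match; mapping NaN to 0 on the guest side is an assumption here, and the function name is illustrative.

    ```cpp
    #include <cassert>
    #include <cmath>
    #include <cstdint>
    #include <limits>

    // Float -> int32 with round-toward-zero, saturation, and NaN -> 0.
    std::int32_t ConvertF2ISigned(float f) {
      if (std::isnan(f)) return 0;
      if (f >= 2147483648.0f) return std::numeric_limits<std::int32_t>::max();
      if (f < -2147483648.0f) return std::numeric_limits<std::int32_t>::min();
      return static_cast<std::int32_t>(f);  // C++ float->int conversion truncates
    }
    ```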
  96. [a64] Implement PERMUTE_V128(int16)

    Passes `vmrghh` and `vmrglh` unit-tests
    Wunkolo committed Jun 23, 2024 (684904c)
  97. [a64] Optimize OPCODE_MUL_ADD

    Use `FMADD` and `FMLA`
    Tests are the same, though now it should run a bit faster.
    The tests that fail are primarily denormals and other subtle precision
    issues it seems.
    
    Ex:
    ```
    i> 00002358   - vmaddfp_7298_GEN
    !> 00002358 Register v4 assert failed:
    !> 00002358   Expected: v4 == [00000000, 00000000, 00000000, 00000000]
    !> 00002358     Actual: v4 == [000D000E, 00138014, 000E4CDC, 0018B34D]
    !> 00002358     TEST FAILED
    ```
    
    Host-To-Guest and Guest-To-Host thunks should probably restore/preserve
    the FPCR to maintain these roundings.
    Wunkolo committed Jun 23, 2024 (b9d0752)
  98. [a64] Fix OPCODE_CNTLZ

    8- and 16-bit CNTLZ need their bit counts adjusted to the original element type.
    Wunkolo committed Jun 23, 2024 (bec248c)
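    One way to do that fix-up, assuming the narrow value is zero-extended into a 32-bit register before `CLZ`: count leading zeros at 32 bits, then subtract the extra high bits. The helper names are hypothetical, and `Clz32` is a portable stand-in for the `CLZ` instruction.

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Portable count-leading-zeros on a 32-bit value; returns 32 for zero,
    // matching the AArch64 CLZ instruction.
    unsigned Clz32(std::uint32_t v) {
      unsigned n = 0;
      for (std::uint32_t mask = 0x80000000u; mask != 0 && !(v & mask); mask >>= 1) {
        ++n;
      }
      return n;
    }

    // CLZ only operates on 32/64-bit registers, so narrow results must discount
    // the extra zero bits above the original element.
    unsigned Clz8(std::uint8_t v) { return Clz32(v) - 24; }
    unsigned Clz16(std::uint16_t v) { return Clz32(v) - 16; }
    ```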
  99. [a64] Implement kDebugInfoTraceFunctions and `kDebugInfoTraceFunctionCoverage`
    
    Relies on armv8.1-a atomic features
    Wunkolo committed Jun 23, 2024 (c33f543)
  100. [a64] Fix ATOMIC_COMPARE_EXCHANGE_I32 comparison type

    This fixes 32-bit atomic-compare-exchanges.
    The upper-half of the input register _must_ be clipped off.
    
    This fixes a deadlock in some games.
    Wunkolo committed Jun 23, 2024 (f1235be)
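    A C++ model of the fix, with hypothetical names: the expected value arrives in a 64-bit register, but only its low 32 bits may take part in the comparison. If stale upper bits were compared as well, the exchange could never succeed, which would plausibly explain a guest spin-lock deadlocking.

    ```cpp
    #include <atomic>
    #include <cassert>
    #include <cstdint>

    // A 32-bit compare-exchange whose "expected" value arrives in a 64-bit
    // register. Only the low 32 bits are meaningful; the upper half must be
    // discarded before comparing, or the exchange can never succeed.
    bool CompareExchangeI32(std::atomic<std::uint32_t>& target,
                            std::uint64_t expected_reg, std::uint32_t desired) {
      auto expected = static_cast<std::uint32_t>(expected_reg);  // clip upper half
      return target.compare_exchange_strong(expected, desired);
    }
    ```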
  101. Commit a542265
  102. [a64] Reduce function prolog/epilog to 16 bytes

    Just need to store `fp` and `lr`
    Wunkolo committed Jun 23, 2024 (eb0736e)
  103. Commit f7bd0c8
  104. [a64] Implement instruction stepping.

    Uses `0x0000'dead` as an instruction-stepping sentinel value.
    Supports basic jumping instructions like `b`, `bl`, `br`, and `blr`.
    Wunkolo committed Jun 23, 2024 (c3efaaa)
  105. Commit a7ae117
  106. [a64] Optimize vector-constant generation

    Uses MOVI to optimize some cases of constants rather than EOR.
    MOVI is a register-renaming idiom on many architectures.
    Wunkolo committed Jun 23, 2024 (e2d1e5d)
  107. [a64] Optimize memory-address calculation

    The LSL can be embedded into the ADD to remove an additional instruction.
    What was `cset`+`lsl`+`add` should now just be `cset`+`add ... LSL 12`
    Wunkolo committed Jun 23, 2024 (6e2910b)
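    A scalar model of the `cset` + `add ..., LSL #12` pattern. The commit does not say which condition `cset` materializes, so the `threshold` parameter is an assumption for illustration; the point is that the 0-or-1 flag becomes a 4 KiB offset inside the same `ADD` instead of needing a separate `LSL`.

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Scalar model of `cset` + `add ..., LSL #12`: a comparison result (0 or 1)
    // is shifted into a 4 KiB offset and folded into the address in one ADD.
    // `threshold` is a hypothetical parameter standing in for the real condition.
    std::uint64_t ComputeHostAddress(std::uint64_t membase, std::uint32_t guest,
                                     std::uint32_t threshold) {
      const std::uint64_t flag = guest >= threshold ? 1 : 0;  // cset
      return membase + guest + (flag << 12);                  // add ..., lsl #12
    }
    ```

    For example, with an assumed threshold of `0xE0000000`, guest address `0x1000` maps to `membase + 0x1000`, while `0xE0000000` maps to `membase + 0xE0001000`.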
  108. [a64] Optimize OPCODE_MEMSET

    Use pair-stores rather than single stores to write 32 bytes of data at a time.
    Wunkolo committed Jun 23, 2024 (9b5a690)
  109. [a64] Implement OPCODE_LOAD_CLOCK clock_source_raw

    Uses the `CNTVCT_EL0`-register and applies frequency scaling
    Wunkolo committed Jun 23, 2024 (7c094dc)
  110. Commit 40d908b
  111. [a64] Fix OPCODE_PACK saturation edge-cases

    Passes cpu-ppc-tests
    Wunkolo committed Jun 23, 2024 (6478623)
  112. [a64] Implement OPCODE_UNPACK

    This is a very literal translation from the x64 code into ARM and may not be very optimized. Passes unit tests save for a couple of off-by-one errors.
    Wunkolo committed Jun 23, 2024 (96d444d)
  113. [a64] Implement LSE and FP16C detection

    Adds two new flags for allowing the use of LSE and FP16C
    Wunkolo committed Jun 23, 2024 (06daedf)
  114. Commit 2d72b40
  115. [a64] Fix OPCODE_PACK(short)

    Narrow-saturation instructions cause off-by-one rounding errors.
    Using the min+max+shuffle passes more unit tests
    Wunkolo committed Jun 23, 2024 (4ff43ae)
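    A simplified sketch of the clamp-then-narrow idea, operating on already-converted signed 32-bit lanes. The opcode's real input format and its float-to-integer step are not described in the commit, so this covers only the saturation part; names are illustrative.

    ```cpp
    #include <algorithm>
    #include <array>
    #include <cassert>
    #include <cstddef>
    #include <cstdint>

    // Saturating narrow of signed 32-bit lanes to 16 bits: clamp with explicit
    // min/max first, then truncate each lane (the "shuffle" step).
    std::array<std::int16_t, 4> PackSaturateI16(const std::array<std::int32_t, 4>& v) {
      std::array<std::int16_t, 4> out{};
      for (std::size_t i = 0; i < v.size(); ++i) {
        const std::int32_t clamped = std::clamp<std::int32_t>(v[i], INT16_MIN, INT16_MAX);
        out[i] = static_cast<std::int16_t>(clamped);
      }
      return out;
    }
    ```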
  116. [a64] Optimize bulk VConst access with relative addressing

    Load the pointer to the VConst table once, and use offsets from this base address from the underlying enum value.
    Reduces the amount of instructions for each VConst memory load.
    Wunkolo committed Jun 23, 2024 (fc1a13d)
  117. [a64] Optimize constant vector byte-splats

    Detect when all bytes are repeating and use `MOVI` when applicable
    Wunkolo committed Jun 23, 2024 (bf12583)
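    A sketch of the detection step, with hypothetical names: if all 16 bytes of the constant match, a single `MOVI Vd.16B, #imm8` can build it without a memory load.

    ```cpp
    #include <array>
    #include <cassert>
    #include <cstdint>
    #include <optional>

    // If every byte of a 128-bit constant is the same, return that byte: such a
    // constant can be built with one `MOVI Vd.16B, #imm8` instead of a load.
    std::optional<std::uint8_t> GetByteSplat(const std::array<std::uint8_t, 16>& bytes) {
      for (std::uint8_t b : bytes) {
        if (b != bytes[0]) return std::nullopt;
      }
      return bytes[0];
    }
    ```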
  118. [a64] Fix OPCODE_SWIZZLE register-aliasing

    Indices and non-const tables were using the same scratch-register
    Wunkolo committed Jun 23, 2024 (63f31d5)
  119. [a64] Implement raw clock source

    Uses `CNTFRQ` and `CNTVCT` system-registers as a raw clock source.
    
    On my ThinkPad x13s, the raw clock source returns a tick-frequency of
    19,200,000 while the platform clock source(QueryPerformanceFrequency)
    returns 10,000,000: almost double the resolution of the platform clock!
    Wunkolo committed Jun 23, 2024 (3b1a696)
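    Converting a tick count between two rates like these needs care to avoid 64-bit overflow. A minimal sketch, assuming the raw `CNTVCT_EL0` count and `CNTFRQ_EL0` frequency have already been read (reading them requires `mrs`, which is left out to keep the example portable). The helper name and the seconds/remainder split are illustrative choices, not the backend's code.

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Rescale a tick count from one frequency to another without overflowing
    // 64 bits in the common case: whole seconds and the sub-second remainder
    // are scaled separately.
    std::uint64_t ScaleTicks(std::uint64_t ticks, std::uint64_t from_hz,
                             std::uint64_t to_hz) {
      const std::uint64_t seconds = ticks / from_hz;
      const std::uint64_t remainder = ticks % from_hz;
      return seconds * to_hz + remainder * to_hz / from_hz;
    }
    ```

    For example, `ScaleTicks(19200000, 19200000, 10000000)` yields `10000000`: one second of counter ticks at 19.2 MHz expressed at 10 MHz.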
  120. Commit cba92a2
  121. [a64] Add arch-agnostic documentation configurations

    Missed some during the first pass. The config files now mention a64 differences.
    Wunkolo committed Jun 23, 2024 (7b9f791)
  122. [a64] Optimize zero MovMem64

    Read directly from the ZR (zero register) in the case that we are just storing a 64- or 32-bit zero
    Wunkolo committed Jun 23, 2024 (818a773)
  123. [a64] Implement OPCODE_DID_SATURATE

    This directly maps to the QC bit in the FPSR. Just have to make sure
    that the saturated instruction is the very last instruction (which is
    currently the case for stuff like VECTOR_ADD and such).
    Wunkolo committed Jun 23, 2024
    f830f79
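Reading the flag is a single bit test on the FPSR, where QC is bit 27. Because QC is cumulative, it stays set until software writes a zero to it. A small sketch with hypothetical helper names; the bit helpers are portable, while the live `mrs` read only compiles with GCC/Clang on AArch64.

```cpp
#include <cstdint>

// FPSR.QC, the cumulative saturation flag, is bit 27 of the FPSR.
constexpr uint64_t kFpsrQcBit = uint64_t{1} << 27;

// True if a saturating instruction has set QC since it was last cleared.
constexpr bool DidSaturate(uint64_t fpsr) { return (fpsr & kFpsrQcBit) != 0; }

// QC is sticky: it stays set until software writes 0 to it.
constexpr uint64_t ClearSaturation(uint64_t fpsr) { return fpsr & ~kFpsrQcBit; }

#if defined(__aarch64__) && defined(__GNUC__)
// Reads the host's live FPSR (AArch64 with GCC/Clang only).
inline uint64_t ReadFpsr() {
  uint64_t fpsr;
  asm volatile("mrs %0, fpsr" : "=r"(fpsr));
  return fpsr;
}
#endif
```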
  124. [a64] Detect MOVI utilizations for vector-element splats(u8,u16,u32)

    The 64-bit case uses a different replicated 8-bit immediate form (each
    byte is either `0x00` or `0xFF`), so something else will have to handle
    that. This catches a lot of cases without having to touch memory. It
    does not catch cases such as `1.0` (`0x3f800000`).
    Wunkolo committed Jun 23, 2024
    8f6c0ad
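A sketch of the encodability checks, assuming the standard A64 vector `MOVI` immediate forms: any byte for 8-bit elements (always encodable, so no check is shown), an 8-bit value shifted left by 0 or 8 for 16-bit elements, an 8-bit value shifted left by 0, 8, 16 or 24 (plus the `MSL` "shifting ones" forms) for 32-bit elements, and a per-byte `0x00`/`0xFF` mask for 64-bit elements. The function names are hypothetical, and the inverted `MVNI` forms are not considered.

```cpp
#include <cstdint>

// 16-bit elements: imm8, optionally LSL #8. (A value with two equal bytes
// could also use the 8-bit form; this only checks the 16-bit form.)
inline bool IsMoviEncodableU16(uint16_t v) {
  return (v & 0xFF00u) == 0 || (v & 0x00FFu) == 0;
}

// 32-bit elements: imm8 LSL #0/8/16/24, or imm8 MSL #8/#16 (ones shifted in).
inline bool IsMoviEncodableU32(uint32_t v) {
  for (unsigned shift = 0; shift < 32; shift += 8) {
    if ((v & ~(0xFFu << shift)) == 0) return true;  // only one byte set
  }
  if ((v & 0xFFFF00FFu) == 0x000000FFu) return true;  // (imm8 << 8) | 0xFF
  if ((v & 0xFF00FFFFu) == 0x0000FFFFu) return true;  // (imm8 << 16) | 0xFFFF
  return false;
}

// 64-bit elements: every byte must be 0x00 or 0xFF (one imm8 bit per byte).
inline bool IsMoviEncodableU64(uint64_t v) {
  for (int i = 0; i < 8; ++i) {
    const auto byte = static_cast<uint8_t>(v >> (i * 8));
    if (byte != 0x00 && byte != 0xFF) return false;
  }
  return true;
}
```

For example, `1.0f` (`0x3f800000`) has two non-zero bytes (`0x3f` and `0x80`), so none of the 32-bit forms match, which is consistent with the limitation noted above.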
  125. [a64] Optimize constant-loads with FMOV

    `FMOV` encodes an 8-bit floating-point immediate that can be used to
    accelerate the loading of certain floating-point constants with
    magnitudes between 0.125 and 31.0. A lot of common immediates such as
    -1.0, 1.0 and 0.5 are encodable, and this code gets lots of hits in my
    testing. This is much cheaper than loading a 32/64-bit value into W0/X0
    and moving it into an FP register.
    Wunkolo committed Jun 23, 2024
    4655bc1
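The encodable values are exactly ±(16 + n)/16 × 2^e with n in 0..15 and e in -3..4, which gives magnitudes from 0.125 up to 31.0 (zero itself is not encodable). The sketch below, with hypothetical function names, expands every possible `imm8` into its single-precision bit pattern and compares against the requested constant; a brute-force search over 256 candidates keeps it obviously correct, while a real emitter could test the bit pattern directly.

```cpp
#include <cstdint>
#include <cstring>
#include <optional>

// Expands an FMOV imm8 (a:b:cd:efgh) into a float32 bit pattern:
// a : NOT(b) : Replicate(b, 5) : cd : efgh : Zeros(19).
constexpr uint32_t ExpandFmovImm8(uint8_t imm8) {
  const uint32_t a = (imm8 >> 7) & 1u;
  const uint32_t b = (imm8 >> 6) & 1u;
  const uint32_t cd = (imm8 >> 4) & 3u;
  const uint32_t efgh = imm8 & 0xFu;
  return (a << 31) | ((b ^ 1u) << 30) | ((b ? 0x1Fu : 0u) << 25) |
         (cd << 23) | (efgh << 19);
}

// Returns the imm8 encoding of `value` if a single FMOV can materialize it.
inline std::optional<uint8_t> FmovImm8(float value) {
  uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  for (unsigned imm8 = 0; imm8 < 256; ++imm8) {
    if (ExpandFmovImm8(static_cast<uint8_t>(imm8)) == bits) {
      return static_cast<uint8_t>(imm8);
    }
  }
  return std::nullopt;  // needs a GPR load or a constant-pool load instead
}
```

For instance, `1.0f` maps to `0x70`, while `32.0f`, `0.1f` and `0.0f` have no encoding.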
  126. [a64] Implement armv8.0 atomic operations

    Uses LSE when available, but provides an armv8.0 baseline implementation.
    Wunkolo committed Jun 23, 2024
    151700d
  127. [a64] Remove x64 reference implementations

    Removes all comments relating to x64 implementation details
    Wunkolo committed Jun 23, 2024
    164f1e4
  128. [a64] Implement OPCODE_CACHE_CONTROL

    `dc civac` raises an illegal-instruction exception on Windows on ARM.
    This is likely a security measure against cache attacks. On Linux this
    instruction is trapped into an EL1 kernel function. Windows does not
    seem to expose any user-mode cache maintenance for the data cache (only
    for the instruction cache, via `FlushInstructionCache`).

    The closest thing we can do for now is a full data synchronization
    barrier with `dsb ish`.

    Prefetches are implemented using `prfm pldl1keep, ...`.
    Wunkolo committed Jun 23, 2024
    1127fd9
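The two lowerings can be sketched as C++ helpers, assuming GCC/Clang inline assembly on AArch64. `DataCacheBarrier` and `PrefetchForRead` are hypothetical names. Note that `dsb ish` only waits for outstanding memory accesses to complete; it does not clean or invalidate any cache lines, which is why it is only the closest available approximation. The non-AArch64 branches (`std::atomic_thread_fence` and `__builtin_prefetch`) are stand-ins so the sketch builds elsewhere, not equivalents.

```cpp
#include <atomic>

// Fallback for the data-cache flush: a full data synchronization barrier.
inline void DataCacheBarrier() {
#if defined(__aarch64__) && defined(__GNUC__)
  asm volatile("dsb ish" : : : "memory");
#else
  std::atomic_thread_fence(std::memory_order_seq_cst);  // stand-in only
#endif
}

// Prefetch for a load, keeping the line in L1 (PLDL1KEEP).
inline void PrefetchForRead(const void* address) {
#if defined(__aarch64__) && defined(__GNUC__)
  asm volatile("prfm pldl1keep, [%0]" : : "r"(address));
#elif defined(__GNUC__)
  __builtin_prefetch(address, 0 /* read */, 3 /* high locality */);
#else
  (void)address;
#endif
}
```

Both helpers are hints or ordering points only, so they never change the values a program observes.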
  129. [a64] Fix out-of-bounds OPCODE_VECTOR_SHL(all-same) case

    Out-of-bounds shift values are now handled modulo the element size.
    Wunkolo committed Jun 23, 2024
    02edbd2
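A scalar reference model of that behavior, assuming the shift amount is reduced with a mask (equivalent to modulo because element widths are powers of two). `SHL` (vector, immediate) can only encode shifts from 0 to the element width minus 1, so the reduced amount always fits. `VectorShlAllSame` is a hypothetical name.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Shifts every lane left by the same amount, taken modulo the lane width.
template <typename T, std::size_t N>
std::array<T, N> VectorShlAllSame(std::array<T, N> v, unsigned shift) {
  static_assert(std::is_unsigned<T>::value, "lanes are modeled as unsigned");
  constexpr unsigned kBits = sizeof(T) * 8;
  const unsigned s = shift & (kBits - 1);  // == shift % kBits
  for (T& lane : v) lane = static_cast<T>(lane << s);  // truncate to lane width
  return v;
}
```

With 8-bit lanes, a shift of 9 therefore behaves like a shift of 1, and a 32-bit shift of 32 leaves the lanes unchanged.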
  130. [a64] Use VectorCodeGenerator rather than CodeBlock+CodeGenerator

    The emitter doesn't actually hold onto executable code; it just
    generates the assembly data into a buffer for the currently-resolving
    function before placing it into a code cache. When code gets pushed into
    the code cache, it can just be copied from an `std::vector`, which is
    then reset. The code cache itself maintains the actual executable memory
    and stack-unwinding code and such.

    This also fixes a bunch of erroneous relative-addressing glitches where
    relative addresses were calculated based on the address of the unused
    CodeBlock rather than being position-independent. `MOVP2R` in particular
    was generating different instructions depending on its distance from the
    code block, when it should always just use `MOV` and not do any
    relative-address calculations, since we can't predict where the
    instruction will actually end up (we cannot predict what the program
    counter will be). Oaknut probably needs a "position-independent" policy
    or mode so that it avoids PC-relative instructions.
    Wunkolo committed Jun 23, 2024
    2953e2e
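A position-independent way to materialize a 64-bit pointer is a `MOVZ` followed by up to three `MOVK` instructions, one per non-zero 16-bit chunk. The sketch below emits raw A64 encodings into a `std::vector<uint32_t>`, so the buffer can later be copied anywhere without fixups. It illustrates the idea under those assumptions and is not oaknut's `MOVP2R` implementation; `EncodeMovz64`, `EncodeMovk64` and `EmitMovAbs64` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// 64-bit "move wide" encodings: hw selects which 16-bit chunk imm16 lands in.
constexpr uint32_t EncodeMovz64(unsigned rd, uint16_t imm16, unsigned hw) {
  return 0xD2800000u | (hw << 21) | (static_cast<uint32_t>(imm16) << 5) | rd;
}
constexpr uint32_t EncodeMovk64(unsigned rd, uint16_t imm16, unsigned hw) {
  return 0xF2800000u | (hw << 21) | (static_cast<uint32_t>(imm16) << 5) | rd;
}

// Appends a PC-independent sequence that loads `value` into X<rd>.
inline void EmitMovAbs64(std::vector<uint32_t>& code, unsigned rd,
                         uint64_t value) {
  assert(rd < 31 && "X0-X30 only");
  if (value == 0) {  // MOVZ with a zero immediate clears the whole register
    code.push_back(EncodeMovz64(rd, 0, 0));
    return;
  }
  bool first = true;
  for (unsigned hw = 0; hw < 4; ++hw) {
    const auto chunk = static_cast<uint16_t>(value >> (hw * 16));
    if (chunk == 0) continue;  // MOVZ already zeroed this chunk
    // The first MOVZ zeroes the other chunks; later MOVKs keep them.
    code.push_back(first ? EncodeMovz64(rd, chunk, hw)
                         : EncodeMovk64(rd, chunk, hw));
    first = false;
  }
}
```

Skipping zero chunks is a simple greedy choice; a real emitter might also consider `MOVN` or logical-immediate forms to shorten some sequences.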
  131. [a64] Replace instances of MOV+DUP-splats with MOVI

    These `MOV`->`DUP` splats can just be a single `MOVI` instruction.
    Wunkolo committed Jun 23, 2024
    3acd0a3
  132. [a64] Optimize OPCODE_SPLAT byte-constants

    Byte-sized constants can utilize the `MOVI` instructions. This makes
    many cases, such as zero-splats, much faster, since they encode as just
    a register rename (similar to `xor` on x64).
    Wunkolo committed Jun 23, 2024
    539a03d
  133. [a64] Optimize OPCODE_SPLAT with MOVI/FMOV

    Moves the `FMOV` constant functions into `a64_util` so they are available to other translation units. Optimizes constant-splats with conditional use of `MOVI` and `FMOV`.
    Wunkolo committed Jun 23, 2024
    9c8b067
  134. [a64] Remove redundant OPCODE_DOT_PRODUCT_{3,4} lane-isolation

    The last `FADDP` writes into an `S` register, which automatically masks all the other lanes to zero.
    Wunkolo committed Jun 23, 2024
    9c572c3
  135. [a64] Implement support for large stack sizes

    The `SUB` (immediate) instruction can only encode a 12-bit unsigned
    immediate, optionally shifted left by 12 bits (`0x000`-`0xFFF`, or
    `0x1000`-`0xFFF000` in steps of `0x1000`). If the stack size is greater
    than `0xFFF`, align it up to `0x1000` to keep the bottom 12 bits clear.
    Wunkolo committed Jun 23, 2024
    a8b9cd8
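The encodability rule and the alignment step fit in a few lines. This is a sketch with hypothetical function names; frames larger than `0xFFF000` would still need more than one instruction, which the sketch only asserts against rather than handles.

```cpp
#include <cassert>
#include <cstdint>

// ADD/SUB (immediate): a 12-bit unsigned value, optionally LSL #12.
constexpr bool IsAddSubImmEncodable(uint64_t imm) {
  return imm <= 0xFFF || ((imm & 0xFFF) == 0 && imm <= 0xFFF000);
}

// Rounds large frames up to a multiple of 0x1000 so one SUB can encode them.
inline uint64_t AlignStackSize(uint64_t size) {
  if (size > 0xFFF) size = (size + 0xFFF) & ~uint64_t{0xFFF};
  assert(IsAddSubImmEncodable(size) && "frame too large for a single SUB");
  return size;
}
```

Rounding up wastes at most `0xFFF` bytes of stack per frame, in exchange for a single `SUB`/`ADD` pair in the prologue and epilogue.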