[a64] Implement an ARM64 backend #2259
base: master
Commits on Apr 27, 2024
[Build] Add Windows ARM64 support
Separates the `Windows` platform into `Windows-x86_64` and `Windows-ARM64`. Adds `--arch` argument to `build`. Removes x64 backend on non-x64 targets.
Commit 1746177
Commits on Apr 28, 2024
Commit a6d9113
Commit 1874f0c
Commit b48ec84
Commit f254848
Commit fe9c98e
[Base] Add Windows-ARM64 `bit_count` implementation
Uses intrinsics from https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170
Commit 045441a
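For readers unfamiliar with what a `bit_count` helper computes, here is a minimal portable sketch. It is not the commit's code: the commit relies on MSVC ARM64 intrinsics, while this version uses a plain loop, and the name `bit_count_sketch` is hypothetical.

```cpp
#include <cstdint>

// Hypothetical stand-in for a bit_count helper: counts the set bits in a value.
// Uses Kernighan's trick (clear the lowest set bit each iteration) instead of
// the compiler intrinsics that the commit refers to.
inline uint32_t bit_count_sketch(uint64_t value) {
  uint32_t count = 0;
  while (value != 0) {
    value &= value - 1;  // clears the lowest set bit
    ++count;
  }
  return count;
}
```

An intrinsic-based version would typically compile down to a handful of instructions; the loop above only illustrates the expected result.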
Commits on Apr 29, 2024
[CPU] Stub ARM64 to Null CPU backend
Adding the `a64` backend will be a different PR. For now it's stubbed to the null backend to allow the main executable to open without failing initialization.
Commit f2b05ea
[UI] Fix divide-by-zero hazard
This value currently returns `0` on ARM machines, and dividing by it throws an exception.
Commit aa4a3e0
Commits on Jun 23, 2024
[Build] Link SDL2 to xenia-app
Addresses a build issue that seems to occur now that xenia-app is not getting SDL2 through one of its submodules.
Commit a0f6cd7
[CPU] Add ARM64 backend build target
Adds the new `xenia-cpu-backend-a64` build-target with linkage following the x64 backend.
Commit ffc966c
[a64] Integrate `oaknut` submodule
Header-only library for emitting arm64v8 instructions. Enables C++20 only for the a64 backend for now.
Commit 59bc265
Commit 2284ed4
[CPU] Implement ARM64 CPU backend
First pass framework that gets emitted ARM code executing. Based on the x64 backend, implements an ARM64 JIT backend.
Commit 9960ef9
This only reverses the bytes within each 32-bit value; it does not reverse the whole vector.
Commit 39429aa
Commit b9571cf
Commit 652b7a1
Commit 10310d7
Commit 61feb6a
Commit 1b574be
Commit 72380bf
Commit 6770682
Commit 10cba8e
Commit defb68e
[a64] Fix Guest-To-Host native calls
These calls need to preserve and restore the `lr` register. Unit tests all run now!
Commit 124f684
Commit 8aa4b93
Commit 6a0e6a9
[a64] Fix overwriting of return-value registers
These were stomping over X0 and Q0, which returned input-argument registers as return values. Fixes some guest-to-host calls.
Commit 3d345d7
[a64] Implement `OPCODE_VECTOR_SHL`
Vector registers are passed as pointers rather than directly in the `Qn` registers. So these functions should be taking pointer-type arguments rather than vector-register types directly. Fixes `OPCODE_VECTOR_SHL` and passes unit tests.
Commit 07a4df8
[a64] Remove volatile storing of X0/Q0
We don't load it back, so there is no need to store it.
Commit 88ed113
Commit 7feea4c
[a64] Implement `OPCODE_VECTOR_ROTATE_LEFT`
Uses the emulated fallback for now. Will have to come back to this later. Passes unit tests.
Commit 3ac5121
Commit ebd1f84
Commit 584c34c
[a64] Implement `OPCODE_VECTOR_ADD`
There is quite literally an instruction for each and every one of these cases. Passes unit tests.
Commit 35e8a80
Arguments need to be pointers stored in X0, X1, X2, ... rather than passed directly in Q0, Q1, etc. There are no unit tests for these functions in particular.
Commit e62f3f3
[a64] Implement `OPCODE_PACK`(FLOAT16)
Fails the unit tests due to subtle rounding errors.
Commit 3b2612b
[a64] Implement `OPCODE_PACK`(SHORT)
Fails unit tests due to subtle rounding errors. `SHORT_4` unit-test is missing but implementation is the same as `SHORT_4`.
Commit e5fd3d3
[a64] Implement HIR Branch labeling
Adds support for HIR labels to create actual oaknut labels
Commit 8257740
[a64] Implement control sequences
Implements control sequences such as conditional branching, breaking, and trapping
Commit 725ea3d
Commit 5b8ac36
[a64] Fix resetting of labels during Emplace
On the x64 side, this is the same as the `reset()` function resetting the label-manager
Commit 65288d5
[a64] Fix ResolveFunctionThunk call
Resolving the function puts its address into X0, and it should be called immediately after. We were just calling ResolveFunction on ResolveFunction recursively.
Commit dfa5bdb
Commit a1741bf
[a64] Draft Windows-ARM64 stack unwinding data
Things still get weird at the thunks, but this allows for callstacks between-to-guest calls
Commit 9b70ea0
Commit 17987ca
Commit 9ec4b68
Commit c428d79
Commit 6a5f461
Commit 5bff71f
Commit b5d55e1
Commit 018e484
[a64] Remove redundant zero-extension during address computation
Also changes the register to X3 by default
Commit 8b4b713
[a64] Fix `CallIndirect` return address
Should be `GUEST_RET_ADDR` not `GUEST_CALL_RET_ADDR`.
Commit 2b3147b
[a64] Refactor `REV{32,64}` to `REV`
Let the register type determine the reverse-size. REV32 was also the wrong instruction to use.
Commit 4f5c640
Commit 8836eb2
Commit 8a1e343
Commit d656c5b
Commit cf6c2c2
Commit 647d26c
[a64] Fix `ComputeMemoryAddress{Offset}` register stomp
`W1` is a possible HIR register allocation and using W1 here was stomping over it. Don't use W1, use the provided "scratch" register.
Commit 52b2593
[a64] Refactor `REV{16,32}` to `REV`
Derive the reversal-size from the register-size. REV32 is also the wrong one to be using here since it will reverse the bytes of upper and lower 32-bit words.
Commit 0f9769b
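The distinction drawn in this commit can be illustrated with a small emulation in plain C++. This is only a sketch of the instruction semantics, not the backend's emitter code; the function names `rev_w`, `rev_x`, and `rev32_x` are hypothetical.

```cpp
#include <cstdint>

// Emulates REV Wd, Wn: reverse the four bytes of a 32-bit value.
inline uint32_t rev_w(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) |
         (v << 24);
}

// Emulates REV Xd, Xn: reverse all eight bytes of a 64-bit value.
inline uint64_t rev_x(uint64_t v) {
  return (static_cast<uint64_t>(rev_w(static_cast<uint32_t>(v))) << 32) |
         rev_w(static_cast<uint32_t>(v >> 32));
}

// Emulates REV32 Xd, Xn: reverse the bytes inside each 32-bit half
// independently, leaving the halves in place.
inline uint64_t rev32_x(uint64_t v) {
  return (static_cast<uint64_t>(rev_w(static_cast<uint32_t>(v >> 32))) << 32) |
         rev_w(static_cast<uint32_t>(v));
}
```

For the input `0x0102030405060708`, `rev_x` yields `0x0807060504030201` while `rev32_x` yields `0x0403020108070605`, which is why `REV32` is the wrong choice for a full 64-bit byte swap.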
[a64] Reorganize guest register allocation
Shares a calling convention somewhat similar to ARM64's.
Commit 49f9edb
Commit 906d0c6
Commit 540344f
[a64] Fix immediates being too large
These instructions need to use an extra register to generate their constants if they are too large
Commit ba924fe
Commit e4d3b2a
[a64] Fix external function call arguments
`x0` was loading the thunk rather than using `xip`. Fixes lots of init bugs!
Commit c6a7270
Commit b18f2ff
[a64] Compute memory offsets as 32-bit registers
Additionally fixes some instruction forms to use the more general `STR` instruction with an offset
Commit 47665fd
Commit 2d093ae
You wouldn't believe how much time this bug cost me.
Commit fd32c0e
[a64] Update guest calling conventions
Guest-function calls will use W17 for indirect calls
Commit dc6666d
[a64] Fix instruction constant generation
Fixes some offset generation as well
Commit 6e83e2a
Commit fbc306f
Commit c495fe7
Commit 31b2ccd
Commit 6f0ff9e
Commit 1bdc243
Commit 866ce97
Commit 50d7ad5
Commit b532ab5
Commit c4b2638
Commit f73c8fe
Commit 046e8ed
Commit f5e14d6
Commit 737f2b5
Commit 3adb86c
Commit 87cca91
Commit 2e2f47f
Commit 207e2c1
Commit de040f0
Potential input-register stomping and operand order is seemingly wrong. Passes generated unit tests.
Commit 1ad0d7e
Commit edfd2f2
Commit 6b4ff8b
Commit 42d41a5
[a64] Refactor `OPCODE_ATOMIC_COMPARE_EXCHANGE`
Much more explicit arguments while trying to debug a deadlock.
Commit be0c793
Commit 28b629e
Commit 41eeae1
[a64] Fix `OPCODE_VECTOR_SHA`(constant)
Values should be modulo-element-size.
Commit e2d141e
Commit 0e2f756
Commit 1919dda
Commit d3d3ea3
[a64] Fix `VECTOR_CONVERT_F2I` rounding
```
4.2.2.4 Floating-Point Rounding and Conversion Instructions
...
Floating-point conversions to integers (vctuxs, vctsxs) use round-toward-zero (truncate).
...
```
This passes all of the `vctuxs` and `vctsxs` unit tests.
Commit 7eca228
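The round-toward-zero behavior quoted above can be sketched in scalar C++. The saturation and NaN handling below are assumptions modeled on how AArch64 `FCVTZS` behaves, not something stated in the commit, and `convert_f2i_truncate` is a hypothetical name.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Signed float -> int32 conversion that rounds toward zero (truncates).
// Assumption: out-of-range inputs saturate and NaN maps to 0.
inline int32_t convert_f2i_truncate(float v) {
  if (std::isnan(v)) return 0;
  if (v >= 2147483648.0f) return std::numeric_limits<int32_t>::max();
  if (v <= -2147483648.0f) return std::numeric_limits<int32_t>::min();
  // In-range float -> int conversion in C++ discards the fractional part.
  return static_cast<int32_t>(v);
}
```

Note that `2.7f` becomes `2` and `-2.7f` becomes `-2`; a round-to-nearest conversion would give `3` and `-3` instead.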
[a64] Implement `PERMUTE_V128`(int16)
Passes `vmrghh` and `vmrglh` unit-tests.
Commit 684904c
Use `FMADD` and `FMLA`
Tests are the same, though now it should run a bit faster. The tests that fail are primarily denormals and other subtle precision issues it seems. Ex:
```
i> 00002358 - vmaddfp_7298_GEN
!> 00002358 Register v4 assert failed:
!> 00002358 Expected: v4 == [00000000, 00000000, 00000000, 00000000]
!> 00002358 Actual: v4 == [000D000E, 00138014, 000E4CDC, 0018B34D]
!> 00002358 TEST FAILED
```
Host-To-Guest and Guest-To-Host thunks should probably restore/preserve the FPCR to maintain these roundings.
Commit b9d0752
8- and 16-bit CNTLZ need their bit-counts fixed up to the original element type.
Commit bec248c
[a64] Implement `kDebugInfoTraceFunctions` and `kDebugInfoTraceFunctionCoverage`
Relies on armv8.1-a atomic features.
Commit c33f543
[a64] Fix `ATOMIC_COMPARE_EXCHANGE_I32` comparison type
This fixes 32-bit atomic-compare-exchanges. The upper-half of the input register _must_ be clipped off. This fixes a deadlock in some games.
Commit f1235be
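One plausible way to picture this bug is a lock loop whose expected value lives in a 64-bit host register with stale upper bits: a full-width comparison against a 32-bit memory value can then never succeed. The sketch below models that with `std::atomic`; the function and parameter names are hypothetical and this is not the emitted ARM64 code.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical model: `expected_reg` and `desired_reg` are 64-bit host
// registers carrying 32-bit guest values, so their upper halves may hold
// stale data and must not take part in the comparison.
inline bool compare_exchange_i32(std::atomic<uint32_t>& memory,
                                 uint64_t expected_reg, uint64_t desired_reg) {
  uint32_t expected = static_cast<uint32_t>(expected_reg);  // clip upper half
  uint32_t desired = static_cast<uint32_t>(desired_reg);
  return memory.compare_exchange_strong(expected, desired);
}
```

With memory holding `5`, an expected register of `0xDEADBEEF00000005` still matches once the upper half is clipped, so the exchange succeeds instead of spinning forever.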
Commit a542265
[a64] Reduce function prolog/epilog to 16 bytes
Just need to store `fp` and `lr`
Commit eb0736e
Commit f7bd0c8
[a64] Implement instruction stepping
Uses `0x0000'dead` as an instruction-stepping sentinel value. Supports basic jumping instructions like `b`, `bl`, `br`, and `blr`.
Commit c3efaaa
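A stepper has to know where a branch will land before it can plant a sentinel at the next instruction. The following is a minimal sketch of decoding the four branch forms named above from their A64 encodings; the types and the function `decode_branch` are hypothetical and not taken from the PR.

```cpp
#include <cstdint>

enum class BranchKind { None, B, BL, BR, BLR };

struct DecodedBranch {
  BranchKind kind;
  int64_t offset;  // byte offset relative to the instruction (B/BL only)
  uint32_t reg;    // index of the target register Xn (BR/BLR only)
};

inline DecodedBranch decode_branch(uint32_t insn) {
  // B and BL: bits [30:26] are 0b00101, bit 31 selects BL, and imm26 is a
  // signed offset counted in 4-byte instructions.
  if ((insn & 0x7C000000u) == 0x14000000u) {
    int64_t imm26 = insn & 0x03FFFFFFu;
    if (imm26 & 0x02000000) imm26 -= 0x04000000;  // sign-extend 26 bits
    return {(insn >> 31) ? BranchKind::BL : BranchKind::B, imm26 * 4, 0};
  }
  // BR Xn is 0xD61F0000 | (n << 5); BLR Xn is 0xD63F0000 | (n << 5).
  if ((insn & 0xFFFFFC1Fu) == 0xD61F0000u) {
    return {BranchKind::BR, 0, (insn >> 5) & 31u};
  }
  if ((insn & 0xFFFFFC1Fu) == 0xD63F0000u) {
    return {BranchKind::BLR, 0, (insn >> 5) & 31u};
  }
  return {BranchKind::None, 0, 0};
}
```

For `b`/`bl` the next program counter is the instruction address plus `offset`; for `br`/`blr` it is whatever the named register holds in the captured thread context.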
Commit a7ae117
[a64] Optimize vector-constant generation
Uses MOVI to optimize some cases of constants rather than EOR. MOVI is a register-renaming idiom on many architectures.
Commit e2d1e5d
[a64] Optimize memory-address calculation
The LSL can be embedded into the ADD to remove an additional instruction. What was `cset`+`lsl`+`add` should now just be `cset`+`add ... LSL 12`
Commit 6e2910b
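In C++ terms, the folded form amounts to adding a 0-or-1 flag shifted left by 12 in a single step. The sketch below assumes the flag comes from comparing the guest address against a threshold; that threshold, its value, and the names used here are illustrative assumptions, not details given in the commit.

```cpp
#include <cstdint>

// Assumption for illustration only: guest addresses at or above this value
// need an extra 0x1000 added during translation.
constexpr uint32_t kHypotheticalThreshold = 0xE0000000u;

inline uint64_t compute_host_address(uint64_t membase, uint32_t guest_address) {
  // cset: materialize the comparison result as 0 or 1.
  const uint64_t flag = guest_address >= kHypotheticalThreshold ? 1 : 0;
  // add ..., LSL 12: the shift is folded into the add, so no separate lsl.
  const uint64_t adjusted = static_cast<uint64_t>(guest_address) + (flag << 12);
  return membase + adjusted;
}
```

The separate `lsl` disappears because the A64 `ADD` (shifted register) form can shift its second operand as part of the same instruction.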
Use pair-stores rather than singular-stores to write 32 bytes of data at a time.
Commit 9b5a690
[a64] Implement `OPCODE_LOAD_CLOCK` `clock_source_raw`
Uses the `CNTVCT_EL0` register and applies frequency scaling.
Commit 7c094dc
Commit 40d908b
Commit 6478623
This is a very literal translation from the x64 code into ARM and may not be very optimized. Passes unit tests save for a couple of off-by-one errors.
Commit 96d444d
[a64] Implement `LSE` and `FP16C` detection
Adds two new flags for allowing the use of LSE and FP16C.
Commit 06daedf
Commit 2d72b40
Narrow-saturation instructions cause off-by-one rounding errors. Using min+max+shuffle passes more unit tests.
Commit 4ff43ae
[a64] Optimize bulk VConst access with relative addressing
Load the pointer to the VConst table once, and address each constant as an offset from this base, derived from the underlying enum value. Reduces the number of instructions for each VConst memory load.
Commit fc1a13d
[a64] Optimize constant vector byte-splats
Detect when all bytes are repeating and use `MOVI` when applicable
Commit bf12583
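The detection step can be sketched as a simple scan over the 16 bytes of a vector constant. The function name and signature below are hypothetical; the actual backend works on its own 128-bit constant type.

```cpp
#include <cstddef>
#include <cstdint>

// Returns true when all 16 bytes of a 128-bit constant are identical, and
// writes the repeated byte to `splat_out`. Such a constant is a candidate
// for a single byte-wise MOVI instead of a load from memory.
inline bool is_byte_splat(const uint8_t (&bytes)[16], uint8_t* splat_out) {
  for (size_t i = 1; i < 16; ++i) {
    if (bytes[i] != bytes[0]) return false;
  }
  *splat_out = bytes[0];
  return true;
}
```

A constant that fails this check would fall back to whatever the general constant-loading path does.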
[a64] Fix `OPCODE_SWIZZLE` register-aliasing
Indices and non-const tables were using the same scratch-register.
Commit 63f31d5
[a64] Implement raw clock source
Uses `CNTFRQ` and `CNTVCT` system-registers as a raw clock source. On my ThinkPad x13s, the raw clock source returns a tick-frequency of 19,200,000 while the platform clock source (`QueryPerformanceFrequency`) returns 10,000,000. Almost double the resolution of the platform clock!
Commit 3b1a696
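Converting a tick count between two clock frequencies, such as the 19,200,000 Hz and 10,000,000 Hz figures above, is a rescaling by `to_hz / from_hz`. The sketch below shows one overflow-conscious way to do that with integers; `scale_ticks` is a hypothetical helper and not the PR's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Rescales `ticks` counted at `from_hz` into ticks at `to_hz`.
// Splitting into whole seconds and a remainder keeps the intermediate
// products small; this assumes from_hz * to_hz fits in 64 bits, which holds
// for MHz-range counters.
inline uint64_t scale_ticks(uint64_t ticks, uint64_t from_hz, uint64_t to_hz) {
  assert(from_hz != 0);
  const uint64_t whole_seconds = ticks / from_hz;
  const uint64_t remainder = ticks % from_hz;
  return whole_seconds * to_hz + (remainder * to_hz) / from_hz;
}
```

As a worked example, 192 ticks at 19.2 MHz span 10 microseconds, which is 100 ticks at 10 MHz. The ratio 19,200,000 / 10,000,000 = 1.92 is where the "almost double" figure comes from.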
Commit cba92a2
[a64] Add arch-agnostic documentation configurations
Missed some during the first pass. Now the config files will mention a64 differences.
Commit 7b9f791
Read directly from the ZR in the case that we are just storing a 64- or 32-bit zero.
Commit 818a773
[a64] Implement `OPCODE_DID_SATURATE`
This directly maps to the QC bit in the FPSR. Just have to make sure that the saturated instruction is the very last instruction (which is currently the case for stuff like VECTOR_ADD and such).
Commit f830f79
[a64] Detect `MOVI` utilizations for vector-element splats (u8, u16, u32)
The 64-bit case uses a particular replicated 8-bit immediate, so something else will have to handle that. This catches a lot of cases without having to touch memory. Does not catch cases of `1.0` (0x3f800000).
Commit 8f6c0ad
[a64] Optimize constant-loads with `FMOV`
`FMOV` encodes an 8-bit floating-point immediate that can be used to accelerate the loading of certain constant floating-point values with magnitudes between 0.125 and 31.0. A lot of immediates such as -1.0, 1.0, 0.5, etc. fall within this range and this code gets lots of hits in my testing. This is much more efficient than loading a 32/64-bit value into W0/X0 and moving it into an FP register.
Commit 4655bc1
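Whether a given single-precision constant fits the 8-bit `FMOV` immediate can be checked from its bit pattern: the low 19 fraction bits must be zero and exponent bits [30:25] must be either `100000` or `011111`. The sketch below applies that rule; `fmov_encodable_f32` is a hypothetical name and this is not the helper the commit added.

```cpp
#include <cstdint>
#include <cstring>

// Returns true if `value` can be encoded as an A64 FMOV (scalar, immediate)
// operand, and writes the 8-bit encoding (sign, 3 exponent bits, 4 fraction
// bits) to `imm8_out`.
inline bool fmov_encodable_f32(float value, uint8_t* imm8_out) {
  uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  if (bits & 0x7FFFFu) return false;  // only 4 fraction bits are available
  const uint32_t exp_hi = (bits >> 25) & 0x3Fu;  // bits [30:25]
  if (exp_hi != 0x20u && exp_hi != 0x1Fu) return false;
  *imm8_out =
      static_cast<uint8_t>(((bits >> 24) & 0x80u) | ((bits >> 19) & 0x7Fu));
  return true;
}
```

With this rule `1.0f` encodes as `0x70` and `0.5f` as `0x60`, while `0.0f`, `32.0f`, and `0.1f` are rejected and need another loading strategy.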
[a64] Implement armv8.0 atomic operations
Uses LSE when available, but provides an armv8.0 baseline implementation.
Commit 151700d
[a64] Remove x64 reference implementations
Removes all comments relating to x64 implementation details
Commit 164f1e4
[a64] Implement `OPCODE_CACHE_CONTROL`
`dc civac` causes an illegal-instruction exception on Windows-ARM. This is likely a security measure against cache attacks. On Linux this instruction is trapped into an EL1 kernel function. Windows does not seem to have any user-mode cache-maintenance instructions available for the data cache (only the instruction cache, via `FlushInstructionCache`). The closest thing we can do for now is a full data memory-barrier with `dsb ish`. Prefetches are implemented using `prfm pldl1keep, ...`.
Commit 1127fd9
[a64] Fix out-of-bounds `OPCODE_VECTOR_SHL` (all-same) case
Out-of-bound shift-values are handled as modulo-element-size.
Commit 02edbd2
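The modulo-element-size rule can be shown with a scalar emulation over 16-bit lanes. This is a sketch only: the lane width is chosen for illustration and `vector_shl_i16_all_same` is a hypothetical name.

```cpp
#include <array>
#include <cstdint>

// Shifts every 16-bit lane left by the same amount. The amount is taken
// modulo the element size in bits (16), so a shift of 17 behaves like 1.
inline std::array<uint16_t, 8> vector_shl_i16_all_same(
    std::array<uint16_t, 8> lanes, uint32_t shift) {
  const uint32_t amount = shift & 15u;  // shift modulo 16
  for (auto& lane : lanes) {
    lane = static_cast<uint16_t>(lane << amount);
  }
  return lanes;
}
```

Without the mask, a shift amount of 16 or more would clear every lane instead of wrapping around.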
[a64] Use VectorCodeGenerator rather than CodeBlock+CodeGenerator
The emitter doesn't actually hold onto executable code, but just generates the assembly-data into a buffer for the currently-resolving function before placing it into a code-cache. When code gets pushed into the code-cache, it can just be copied from an `std::vector` and reset. The code-cache itself maintains the actual executable memory and stack-unwinding code and such. This also fixes a bunch of erroneous relative-addressing glitches where relative addresses were calculated based on the address of the unused CodeBlock rather than being position-independent. `MOVP2R` in particular was generating different instructions depending on its distance from the code block when it should always just use `MOV` and not do any relative-address calculations, since we can't predict where the actual instruction's offset will be (we cannot predict what the program counter will be). Oaknut probably needs a "position independent" policy or mode or something so that it avoids PC-relative instructions.
Commit 2953e2e
[a64] Replace instances of `MOV`+`DUP`-splats with `MOVI`
These `MOV`->`DUP` splats can just be a singular `MOVI` instruction.
Commit 3acd0a3
[a64] Optimize `OPCODE_SPLAT` byte-constants
Byte-sized constants can utilize the `MOVI` instructions. This makes many cases such as zero-splats much faster since this encodes as just a register-rename (similar to `xor` on x64).
Commit 539a03d
[a64] Optimize `OPCODE_SPLAT` with `MOVI`/`FMOV`
Moves the `FMOV` constant functions into `a64_util` so they are available to other translation units. Optimizes constant-splats with conditional use of `MOVI` and `FMOV`.
Commit 9c8b067
[a64] Remove redundant `OPCODE_DOT_PRODUCT_{3,4}` lane-isolation
The last `FADDP` writes into an `S` register, which automatically masks all the other lanes to zero.
Commit 9c572c3
[a64] Implement support for large stack sizes
The `SUB` instruction can only encode immediates in the form of `0xFFF` or `0xFFF000`. In the case that the stack size is greater than `0xFFF`, then just align the stack-size by `0x1000` to keep the bottom 12 bits clear.
Commit a8b9cd8
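The encoding constraint and the alignment workaround can be sketched as two small helpers. Rounding up (rather than down) is an assumption here, and both function names are hypothetical.

```cpp
#include <cstdint>

// True if `imm` fits the A64 ADD/SUB (immediate) form: a 12-bit value,
// optionally shifted left by 12 bits.
inline bool is_addsub_immediate(uint64_t imm) {
  return (imm & ~0xFFFull) == 0 || (imm & ~0xFFF000ull) == 0;
}

// Assumed policy: stack sizes above 0xFFF are rounded up to a multiple of
// 0x1000 so the low 12 bits are clear and the shifted form can be used.
inline uint64_t encodable_stack_size(uint64_t stack_size) {
  if (stack_size <= 0xFFF) return stack_size;
  return (stack_size + 0xFFF) & ~0xFFFull;
}
```

For example, a requested size of `0x1234` becomes `0x2000`, which encodes as `0x2` shifted left by 12. Sizes above `0xFFF000` would still not fit a single `SUB` and are outside what this sketch covers.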