forked from llvm/llvm-project
-
Notifications
You must be signed in to change notification settings - Fork 77
merge main into amd-staging #805
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
z1-cciauto
merged 64 commits into
amd-staging
from
amd/merge/upstream_merge_20251209195003
Dec 10, 2025
Merged
merge main into amd-staging #805
z1-cciauto
merged 64 commits into
amd-staging
from
amd/merge/upstream_merge_20251209195003
Dec 10, 2025
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…IMD (llvm#170163) This fixes a bug where firstprivate was ignored when the same variable had both firstprivate and lastprivate clauses in a do simd construct. What was broken: ``` integer :: a a = 10 !$omp do simd firstprivate(a) lastprivate(a) do i = 1, 1 print *, a ! Should print 10, but printed garbage/0 a = 20 end do !$omp end do simd print *, a ! Correctly prints 20 ``` Inside the loop, [a] wasn't being initialized from the firstprivate clause—it just had whatever uninitialized value was there. The fix: In genCompositeDoSimd(), we were using simdItemDSP to handle privatization for the whole loop nest. This only looked at SIMD clauses and missed the firstprivate from the DO part. Changed it to use wsloopItemDSP instead, which handles both DO clauses (firstprivate, lastprivate) correctly. One line change in OpenMP.cpp Tests added: Lowering test to check MLIR generation Runtime test to verify the actual values are correct <img width="740" height="440" alt="image" src="https://github.com/user-attachments/assets/fa911ea8-2024-4edf-b710-52c10659742e" /> Fixes llvm#168306 --------- Co-authored-by: Krish Gupta <[email protected]>
Explicitly create the high bit mask using getBitsSetFrom() instead of inverting an integer. This avoids relying on implicit truncation.
upgrade macOS version to latest stable version in github action. We run into a problem that timed `os_sync` API only becomes available in 14.4+.
NEEDS_TLSGD_TO_IE is only ever set when the symbol is preeptible, in which case addTpOffsetGotEntry will just add the symbol to the GOT and emit a symbolic tlsGotRel anyway, so there is no need to give it its own special case. As well as simplifying the code upstream, this is useful downstream for Morello, which doesn't really have a proper GD/IE-to-LE relaxation, and so for GD-to-IE can benefit from being able to use the optimisations addTpOffsetGotEntry has for non-preemptible symbols, rather than having to reimplement them here.
The test case build a binary from C++, and checks for the number of functions the PointerAuthCFIFixup pass runs on. This can change based on the platform. To account for this, the patch changes the number to a regex. The test failed when running on RHEL 9.
…ing on iOS (llvm#170816) iOS doesn't provide a libstdc++ dylib anymore, so we can remove the compatiblity check whether we can load the dylib.
ElaboratedType is no longer a thing.
Fix a mistake introduced in llvm#163979: We should stick with the deprecated LLVMGetGlobalContext() API in this file, as getGlobalContextForCAPI() is a C++ API that is not available here.
…lvm#169748) Resolves llvm#169701. This PR extends the existing InstCombine operation which folds `tbl1` intrinsics to `shufflevector` if the mask operand is constant. Before this change, it only handled 64-bit `tbl1` intrinsics with no out-of-bounds indices. I've extended it to support both 64-bit and 128-bit vectors, and it now handles the full range of `tbl1`-`tbl4` and `tbx1`-`tbx4`, as long as at most two of the input operands are actually indexed into. For the purposes of `tbl`, we need a dummy vector of zeroes if there are any out-of-bounds indices, and for the purposes of `tbx`, we use the "fallback" operand. Both of those take up an operand for the purposes of `shufflevector`. This works a lot like llvm#169110, with some added complexity because we need to handle multiple operands. I raised a couple questions in that PR that still need to be answered: - Is it correct to check `IsA<UndefValue>` for each mask index, and set the output mask index to -1 if so? This is later folded to a poison value, and I'm not sure about the subtle differences between poison and undef and when you can substitute one for the other. As I mentioned in llvm#169110, the existing x86 pass (`simplifyX86vpermilvar`) already behaves this way when it comes to undef. - How can I write an Alive2 proof for this? It's very hard to find good documentation or tutorials about Alive2. As with llvm#169110, most of the regression test cases were generated using Claude. Everything else was written by me.
This fixes the buildbot failures from llvm#150267. I could not reproduce them locally but my intuition suggests that the -O3 option on the RUN line behaves incosistently on different hosts judging from the error logs. My intention was to run an integration test which will use llvm's globalopt pass, but there's no need actually. We have unittests in place for it.
…s. (llvm#170347) Extend the logic add in llvm#168771 to also allow sinking stores past stores in the same noalias set by checking if we can prove no-alias via the distance between accesses, checked via SCEV. PR: llvm#170347
…vm#171436) "All tests passed" is too easily interpreted as every possible test was run and was fine. A lot of the time it means all the tests that didn't fail to build ran and were fine. Maybe the wording is still too subtle but at least it hints to the idea that the tests run might be fewer than if the build had no compilation errors.
This is still failing on some of the bots. Try bumping the limit again to see if this fixes things.
… result (llvm#170985) Fixes a crash in `ReorderCastOpsOnBroadcast` by ensuring the cast result is a `VectorType` before applying the pattern. A regression test has been added to mlir/test/Dialect/Vector/vector-sink.mlir. Fixes: llvm#126371
…port broadcast with low rank and scalar source input (llvm#170409) This PR extends XeGPU layout propagation and distribution for vector.broadcast operation. It relaxes the restriction of layout propagation to allow low-rank and scalar source input, and adds a pattern in sg-to-wi distribution to support the lowering.
…lvm#171213) We're considering modifying the ObjC runtime's class_rw_t structure to remove the firstSubclass and nextSiblingClass fields in some cases. LLDB is currently reading those but not actually using them. Stop doing that to avoid issues if they are removed by the runtime. rdar://166084122
I plan to use this for inline assembly "vd" contraints with mask types in a follow up patch. Due to the test changes I wanted to post this separately.
…th mask type. (llvm#171235) The inline assembly handling in SelectionDAG uses the first type for the register class as the type at the input/output of the inlineassembly. If this isn't the type for the surrounding DAG, it needs to be converted. nxv8i8 is the first type for the VR and VRNoV0 register classes. So we currently generate insert/extract_subvector and bitcasts to convert to/from nxv8i8. I believe some of the special casing we have for this in splitValueIntoRegisterParts and joinRegisterPartsIntoValue is causing us to also generate incorrect code for arguments with nxv16i4 types that should be any extended to nxv16i8. Instead we widen them to nxv32i4 and bitcast to nxv16i8. This patch uses VM and VMNoV0 for masks which has nxv64i1 as their first type. This means we will only emit an insert/extract_subvector without any bitcasts. This will allow me to fix splitValueIntoRegisterParts and joinRegisterPartsIntoValue to fix the nxv16i4 argument issue without breaking inline assembly. I may need to add more register classes to cover fractional LMULs, but I'm not sure yet.
This moves a couple of statement emitters that were incorrectly implemented in the middle of a switch statement where all cases in the final group are intended to fall through to a handler that emits an NYI error message. The placement of these implementations was causing some statement types that should have emitted the NYI error to instead go to a handler for a different statement type.
…171222) This adds stubs that issue NYI errors for any visitor that is present in the ClangIR incubator but missing in the upstream implementation. This will make it easier to find to correct locations to implement missing functionality.
Runs the `std::shared/unique_ptr` tests with PDB with two changes: - PDB uses the "full" name, so `std::string` is `std::basic_string<char, std::char_traits<char>, std::allocator<char>>` - The type of the pointer inside the shared/unique_ptr isn't the `element_type` typedef
…154735) This change introduces a new IR pass in the llc pipeline for NVPTX that transforms sequences of FMUL followed by FADD or FSUB into a single FMA instruction. Currently, all FMA folding for NVPTX occurs at the DAGCombine stage, which is too late for any IR-level passes that might want to optimize or analyze FMAs. By moving this transformation earlier into the IR phase, we enable more opportunities for FMA folding, including across basic blocks. Additionally, this new pass relies on the contract instruction level fast-math flag to perform these transformations, rather than depending on the -fp-contract=fast or -enable-unsafe-fp-math options passed to llc.
Fixed the argument types of the following intrinsics to match with the ISA: - vpdpwssd_128, vpdpwssd_256, vpdpwssd_512, - vpdpwssds_128, vpdpwssds_256, vpdpwssds_512 - vpdpwsud_128, vpdpwsud_256, vpdowsud_512 - vpdpwsuds_128, vpdpwsuds_256, vpdpwsuds_512 - vpdpwusd_128, vpdpwusd_256, vpdpwusd_512 - vpdpwusds_128, vpdpwusds_256, vpdpwusds_512 - vpdpwuud_128, vpdpwuud_256, vpdpwuud_512 - vpdpwuuds_128, vpdpwuuds_256, vpdpwuuds_512 Fixes llvm#97271. Note that this is the last PR for the issue.
LLVM has pretty thorough support for `int128`, and it has started seeing some use. Even thouth we already have support for the `SPV_ALTERA_arbitrary_precision_integers` extension, the BE was oddly capping integer width to 64-bits. This patch adds partial support for lowering 128-bit integers to `OpTypeInt 128`. Some work remains to be done around legalisation support and validating constant uses (e.g. cases that get lowered to `OpSpecConstantOp`).
…and for OpenCL (llvm#167652) For extended imges insts amdgcn_image_sample_*_/gather4_* builtins, using 'x' in the builtin def so that it will take _Float16 for both HIP/C++ and OpenCL.
Added masked compress builtin in CIR. Note: This is my first PR to llvm. Looking forward to corrections --------- Co-authored-by: bhuvan1527 <[email protected]>
Removes the legacy HTML backend and replaces it with the Mustache backend.
… for spawning symbolizer (llvm#170809) Due to a legacy incompatibility with `atos`, we were allocating a pty whenever we spawned the symbolizer. This is no longer necessary and we can use a regular ol' pipe. This PR is split into two commits: - The first removes the pty allocation and replaces it with a pipe. This relocates the `CreateTwoHighNumberedPipes` call to be common to the `posix_spawn` and `StartSubprocess` path. - The second commit adds the `child_stdin_fd_` field to `SymbolizerProcess`, storing the read end of the stdin pipe. By holding on to this fd for the lifetime of the symbolizer, we are able to avoid getting SIGPIPE (which would occur when we write to a pipe whose read-end had been closed due to the death of the symbolizer). This will be very close to solving llvm#120915, but this PR is intentionally not touching the non-posix_spawn path. rdar://165894284
Test case for the mis-compile mentioned in llvm#166247 (comment) The issue is that we don't generate a runtime check even though it is required to vectorize.
Expose the HVXV81 abs, conversion, comparison, log2, negate and mixed subtract intrinsics so Clang can emit the new instructions.
… +1 out argument as a leak (llvm#161633) Make RetainPtrCtorAdoptChecker recognize an assignment to an +1 out argument so that it won't emit a memory leak warning.
Treat a weak Objective-C property, ivar, member variable, and local variable as safe.
…pattern (llvm#161019) Generalize the check for recognizing [[Obj alloc] init] to also recognize [allocObj() init]. We do this by utilizing isAllocInit function in RetainPtrCtorAdoptChecker.
When GeneratedRTChecks::create bails out due to exceeding the cost threshold, no runtime checks are generated and we must not proceed assuming checks have been generated. Mark the checks as never succeeding, to make sure we don't try to vectorize assuming the runtime checks hold. This fixes a case where we previously incorrectly vectorized assuming runtime checks had been generated when forcing vectorization via metadate. Fixes the mis-compile mentioned in llvm#166247 (comment)
This PR is very similar to llvm#167235, but applied to `trn` rather than `zip`. There are two further differences: - The `@combine_v8i16_8first` and `@combine_v8i16_8firstundef` test cases in `arm64-zip.ll` didn't have equivalents in `arm64-trn.ll`, so this PR adds new test cases `@vtrni8_8first`, `@vtrni8_9first`, `@vtrni8_89first_undef`. - `AArch64TTIImpl::getShuffleCost` calls `isZIPMask`, but not `isTRNMask`. It relies on `Kind == TTI::SK_Transpose` instead (which in turn is based on `ShuffleVectorInst::isTransposeMask` through `improveShuffleKindFromMask`). Therefore, this PR does not itself influence the slp-vectorizer. In a follow-up PR, I intend to override `AArch64TTIImpl::improveShuffleKindFromMask` to ensure we get `ShuffleKind::SK_Transpose` based on the new `isTRNMask`. In fact, that follow-up change is the actual motivation for this PR, as it will result in ```C++ int8x16_t g(int8_t x) { return (int8x16_t) { 0, x, 1, x, 2, x, 3, x, 4, x, 5, x, 6, x, 7, x }; } ``` from llvm#137447 being optimised by the slp-vectorizer.
This patch removed some source files that were explicitly enumerated in the bazel files. Remove them so that the build passes.
…lvm#170877) Handle xsave/xrstor family of X86 builtins in ClangIR Part of llvm#167752 --------- Signed-off-by: Medha Tiwari <[email protected]>
This adds the minimum support for C++ data member pointer variables.
uint64_t and size_t are not the same across all platforms. This was
causing build failures when building this file for wasm:
llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:1323:19: error:
out-of-line definition of 'resolveEntry' does not match any declaration
in '(anonymous namespace)::AttrTypeReader'
1323 | T AttrTypeReader::resolveEntry(SmallVectorImpl<Entry<T>>
&entries, size_t index,
| ^~~~~~~~~~~~
third_party/llvm/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:851:7:
note: AttrTypeReader defined here
851 | class AttrTypeReader {
| ^~~~~~~~~~~~~~
1 error generated.
Use uint64_t everywhere to ensure portability.
…path (llvm#171508) llvm#170809 added the child_stdin_fd_ field on SymbolizerProcess to allow the parent process to hold on to the read in of the child's stdin pipe. This was to avoid SIGPIPE. However, the `StartSubprocess` path still closes the stdin fd in the parent here: https://github.com/llvm/llvm-project/blob/7f5ed91684c808444ede24eb01ad9af73b5806e5/compiler-rt/lib/sanitizer_common/sanitizer_posix_libcdep.cpp#L525-L535 This could cause a double-close of this fd (problematic in the case of fd reuse). This moves the `child_stdin_fd_` field to only be initialized on the posix_spawn path. This should ensure llvm#170809 only truly affects Darwin.
Instead of getting a lock and then checking/modifying the Initialization variable, make it an atomic. Doing this, we can remove one of the mutexes in shared TSDs and avoid any potential lock contention in both shared TSDs and exclusive TSDs if multiple threads do allocation operations at the same time. Add two new tests that make sure no crashes occur if multiple threads try and do allocations at the same time.
This reverts commit 8a115b6. This broke premerge. https://lab.llvm.org/staging/#/builders/192/builds/13326 /home/gha/llvm-project/clang/test/Frontend/optimization-remark-options.c:10:11: remark: loop not vectorized: cannot prove it is safe to reorder floating-point operations; allow reordering by specifying '#pragma clang loop vectorize(enable)' before the loop or by providing the compiler option '-ffast-math'
Collaborator
dpalermo
approved these changes
Dec 10, 2025
… in DO SIMD (llvm#170163)" This reverts commit 748e7af.
Collaborator
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.