merge main into amd-staging #805

ronlieb · 2025-12-10T02:20:10Z

No description provided.

…IMD (llvm#170163) This fixes a bug where firstprivate was ignored when the same variable had both firstprivate and lastprivate clauses in a do simd construct.

What was broken:

```
integer :: a
a = 10
!$omp do simd firstprivate(a) lastprivate(a)
do i = 1, 1
  print *, a  ! Should print 10, but printed garbage/0
  a = 20
end do
!$omp end do simd
print *, a  ! Correctly prints 20
```

Inside the loop, [a] wasn't being initialized from the firstprivate clause; it just had whatever uninitialized value was there.

The fix: In genCompositeDoSimd(), we were using simdItemDSP to handle privatization for the whole loop nest. This only looked at SIMD clauses and missed the firstprivate from the DO part. Changed it to use wsloopItemDSP instead, which handles both DO clauses (firstprivate, lastprivate) correctly. One line change in OpenMP.cpp

Tests added:

- Lowering test to check MLIR generation
- Runtime test to verify the actual values are correct

Fixes llvm#168306

---------

Co-authored-by: Krish Gupta <[email protected]>

Explicitly create the high bit mask using getBitsSetFrom() instead of inverting an integer. This avoids relying on implicit truncation.

Upgrade the macOS version to the latest stable version in the GitHub Action. We ran into a problem: the timed `os_sync` API only becomes available in 14.4+.

NEEDS_TLSGD_TO_IE is only ever set when the symbol is preemptible, in which case addTpOffsetGotEntry will just add the symbol to the GOT and emit a symbolic tlsGotRel anyway, so there is no need to give it its own special case. As well as simplifying the code upstream, this is useful downstream for Morello, which doesn't really have a proper GD/IE-to-LE relaxation, and so for GD-to-IE can benefit from being able to use the optimisations addTpOffsetGotEntry has for non-preemptible symbols, rather than having to reimplement them here.

The test case builds a binary from C++ and checks the number of functions the PointerAuthCFIFixup pass runs on. This number can change based on the platform; the test failed when running on RHEL 9. To account for this, the patch changes the number to a regex.

…ing on iOS (llvm#170816) iOS doesn't provide a libstdc++ dylib anymore, so we can remove the compatibility check for whether we can load the dylib.

ElaboratedType is no longer a thing.

Added by llvm#170772.

Fix a mistake introduced in llvm#163979: We should stick with the deprecated LLVMGetGlobalContext() API in this file, as getGlobalContextForCAPI() is a C++ API that is not available here.

…lvm#169748) Resolves llvm#169701.

This PR extends the existing InstCombine operation which folds `tbl1` intrinsics to `shufflevector` if the mask operand is constant. Before this change, it only handled 64-bit `tbl1` intrinsics with no out-of-bounds indices. I've extended it to support both 64-bit and 128-bit vectors, and it now handles the full range of `tbl1`-`tbl4` and `tbx1`-`tbx4`, as long as at most two of the input operands are actually indexed into.

For the purposes of `tbl`, we need a dummy vector of zeroes if there are any out-of-bounds indices, and for the purposes of `tbx`, we use the "fallback" operand. Both of those take up an operand for the purposes of `shufflevector`.

This works a lot like llvm#169110, with some added complexity because we need to handle multiple operands. I raised a couple questions in that PR that still need to be answered:

- Is it correct to check `IsA<UndefValue>` for each mask index, and set the output mask index to -1 if so? This is later folded to a poison value, and I'm not sure about the subtle differences between poison and undef and when you can substitute one for the other. As I mentioned in llvm#169110, the existing x86 pass (`simplifyX86vpermilvar`) already behaves this way when it comes to undef.
- How can I write an Alive2 proof for this? It's very hard to find good documentation or tutorials about Alive2.

As with llvm#169110, most of the regression test cases were generated using Claude. Everything else was written by me.

This fixes the buildbot failures from llvm#150267. I could not reproduce them locally, but judging from the error logs, my intuition is that the -O3 option on the RUN line behaves inconsistently on different hosts. My intention was to run an integration test that uses llvm's globalopt pass, but there is actually no need: we have unittests in place for it.

…s. (llvm#170347) Extend the logic added in llvm#168771 to also allow sinking stores past stores in the same noalias set by checking if we can prove no-alias via the distance between accesses, checked via SCEV. PR: llvm#170347

…vm#171436) "All tests passed" is too easily interpreted as every possible test was run and was fine. A lot of the time it means all the tests that didn't fail to build ran and were fine. Maybe the wording is still too subtle but at least it hints to the idea that the tests run might be fewer than if the build had no compilation errors.

This is still failing on some of the bots. Try bumping the limit again to see if this fixes things.

… result (llvm#170985) Fixes a crash in `ReorderCastOpsOnBroadcast` by ensuring the cast result is a `VectorType` before applying the pattern. A regression test has been added to mlir/test/Dialect/Vector/vector-sink.mlir. Fixes: llvm#126371

…port broadcast with low rank and scalar source input (llvm#170409) This PR extends XeGPU layout propagation and distribution for vector.broadcast operation. It relaxes the restriction of layout propagation to allow low-rank and scalar source input, and adds a pattern in sg-to-wi distribution to support the lowering.

…lvm#171213) We're considering modifying the ObjC runtime's class_rw_t structure to remove the firstSubclass and nextSiblingClass fields in some cases. LLDB is currently reading those but not actually using them. Stop doing that to avoid issues if they are removed by the runtime. rdar://166084122

I plan to use this for inline assembly "vd" constraints with mask types in a follow-up patch. Due to the test changes I wanted to post this separately.

…th mask type. (llvm#171235) The inline assembly handling in SelectionDAG uses the first type for the register class as the type at the input/output of the inline assembly. If this isn't the type for the surrounding DAG, it needs to be converted. nxv8i8 is the first type for the VR and VRNoV0 register classes. So we currently generate insert/extract_subvector and bitcasts to convert to/from nxv8i8. I believe some of the special casing we have for this in splitValueIntoRegisterParts and joinRegisterPartsIntoValue is causing us to also generate incorrect code for arguments with nxv16i4 types that should be any extended to nxv16i8. Instead we widen them to nxv32i4 and bitcast to nxv16i8. This patch uses VM and VMNoV0 for masks, which have nxv64i1 as their first type. This means we will only emit an insert/extract_subvector without any bitcasts. This will allow me to fix splitValueIntoRegisterParts and joinRegisterPartsIntoValue to fix the nxv16i4 argument issue without breaking inline assembly. I may need to add more register classes to cover fractional LMULs, but I'm not sure yet.

This moves a couple of statement emitters that were incorrectly implemented in the middle of a switch statement where all cases in the final group are intended to fall through to a handler that emits an NYI error message. The placement of these implementations was causing some statement types that should have emitted the NYI error to instead go to a handler for a different statement type.

…171222) This adds stubs that issue NYI errors for any visitor that is present in the ClangIR incubator but missing in the upstream implementation. This will make it easier to find the correct locations to implement missing functionality.

Runs the `std::shared/unique_ptr` tests with PDB with two changes:

- PDB uses the "full" name, so `std::string` is `std::basic_string<char, std::char_traits<char>, std::allocator<char>>`
- The type of the pointer inside the shared/unique_ptr isn't the `element_type` typedef

…154735) This change introduces a new IR pass in the llc pipeline for NVPTX that transforms sequences of FMUL followed by FADD or FSUB into a single FMA instruction. Currently, all FMA folding for NVPTX occurs at the DAGCombine stage, which is too late for any IR-level passes that might want to optimize or analyze FMAs. By moving this transformation earlier into the IR phase, we enable more opportunities for FMA folding, including across basic blocks. Additionally, this new pass relies on the contract instruction level fast-math flag to perform these transformations, rather than depending on the -fp-contract=fast or -enable-unsafe-fp-math options passed to llc.
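As a rough illustration of why fusing a multiply and an add needs explicit permission such as the contract flag, the sketch below shows that a fused multiply-add rounds once, so its result can differ from a separately rounded multiply followed by an add. This is plain C++ using `std::fma`, not the NVPTX pass itself; the function name `fusedDifferenceOfSquares` and the chosen inputs are assumptions made for the example.

```cpp
#include <cmath>

// Hypothetical example, not code from the NVPTX pass.
// With a = 1 + 2^-27 and b = 1 - 2^-27, the exact product is 1 - 2^-54.
// A fused multiply-add keeps that exact product before adding -1.0, so the
// tiny term survives. Under IEEE round-to-nearest without contraction, a
// separately rounded a * b would round to 1.0 and the sum would be 0.0.
double fusedDifferenceOfSquares() {
  const double a = 1.0 + std::ldexp(1.0, -27);
  const double b = 1.0 - std::ldexp(1.0, -27);
  return std::fma(a, b, -1.0); // single rounding step
}
```

Because the two evaluation orders can give different bits, a compiler may only substitute one for the other when the source or the instruction flags allow contraction.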

Fixed the argument types of the following intrinsics to match with the ISA:

- vpdpwssd_128, vpdpwssd_256, vpdpwssd_512
- vpdpwssds_128, vpdpwssds_256, vpdpwssds_512
- vpdpwsud_128, vpdpwsud_256, vpdpwsud_512
- vpdpwsuds_128, vpdpwsuds_256, vpdpwsuds_512
- vpdpwusd_128, vpdpwusd_256, vpdpwusd_512
- vpdpwusds_128, vpdpwusds_256, vpdpwusds_512
- vpdpwuud_128, vpdpwuud_256, vpdpwuud_512
- vpdpwuuds_128, vpdpwuuds_256, vpdpwuuds_512

Fixes llvm#97271. Note that this is the last PR for the issue.

LLVM has pretty thorough support for `int128`, and it has started seeing some use. Even though we already have support for the `SPV_ALTERA_arbitrary_precision_integers` extension, the BE was oddly capping integer width to 64 bits. This patch adds partial support for lowering 128-bit integers to `OpTypeInt 128`. Some work remains to be done around legalisation support and validating constant uses (e.g. cases that get lowered to `OpSpecConstantOp`).

…and for OpenCL (llvm#167652) For the extended image insts amdgcn_image_sample_*_/gather4_* builtins, use 'x' in the builtin def so that it will take _Float16 for both HIP/C++ and OpenCL.

Added masked compress builtin in CIR. Note: This is my first PR to llvm. Looking forward to corrections --------- Co-authored-by: bhuvan1527 <[email protected]>

Removes the legacy HTML backend and replaces it with the Mustache backend.

… for spawning symbolizer (llvm#170809) Due to a legacy incompatibility with `atos`, we were allocating a pty whenever we spawned the symbolizer. This is no longer necessary and we can use a regular ol' pipe.

This PR is split into two commits:

- The first removes the pty allocation and replaces it with a pipe. This relocates the `CreateTwoHighNumberedPipes` call to be common to the `posix_spawn` and `StartSubprocess` path.
- The second commit adds the `child_stdin_fd_` field to `SymbolizerProcess`, storing the read end of the stdin pipe. By holding on to this fd for the lifetime of the symbolizer, we are able to avoid getting SIGPIPE (which would occur when we write to a pipe whose read-end had been closed due to the death of the symbolizer).

This will be very close to solving llvm#120915, but this PR is intentionally not touching the non-posix_spawn path.

rdar://165894284
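The SIGPIPE behaviour described above can be reproduced with a small POSIX sketch. It is not the sanitizer code: `writeErrnoToPipe` is a hypothetical helper that writes one byte into a fresh pipe, optionally after closing the read end, and reports the resulting `errno`. SIGPIPE is ignored for the duration of the call so the failure shows up as `EPIPE` instead of terminating the process.

```cpp
#include <cerrno>
#include <csignal>
#include <unistd.h>

// Hypothetical helper for illustration only.
// Returns 0 if a one-byte write to a new pipe succeeds, otherwise errno.
int writeErrnoToPipe(bool closeReadEndFirst) {
  int fds[2];
  if (pipe(fds) != 0)
    return errno;

  // Temporarily ignore SIGPIPE so a failed write reports EPIPE.
  struct sigaction ignoreAction {};
  struct sigaction oldAction {};
  ignoreAction.sa_handler = SIG_IGN;
  sigaction(SIGPIPE, &ignoreAction, &oldAction);

  if (closeReadEndFirst)
    close(fds[0]); // no reader left: writes now fail

  const char byte = 'x';
  const int err = (write(fds[1], &byte, 1) == 1) ? 0 : errno;

  sigaction(SIGPIPE, &oldAction, nullptr); // restore previous disposition
  if (!closeReadEndFirst)
    close(fds[0]);
  close(fds[1]);
  return err;
}
```

As long as some process still holds the read end open, the write succeeds; once every read end is closed, the writer gets SIGPIPE (or `EPIPE` when the signal is ignored), which is the situation the parent avoids by keeping its own copy of the fd.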

Test case for the mis-compile mentioned in llvm#166247 (comment) The issue is that we don't generate a runtime check even though it is required to vectorize.

Expose the HVXV81 abs, conversion, comparison, log2, negate and mixed subtract intrinsics so Clang can emit the new instructions.

… +1 out argument as a leak (llvm#161633) Make RetainPtrCtorAdoptChecker recognize an assignment to an +1 out argument so that it won't emit a memory leak warning.

Treat a weak Objective-C property, ivar, member variable, and local variable as safe.

…pattern (llvm#161019) Generalize the check for recognizing [[Obj alloc] init] to also recognize [allocObj() init]. We do this by utilizing isAllocInit function in RetainPtrCtorAdoptChecker.

When GeneratedRTChecks::create bails out due to exceeding the cost threshold, no runtime checks are generated and we must not proceed assuming checks have been generated. Mark the checks as never succeeding, to make sure we don't try to vectorize assuming the runtime checks hold. This fixes a case where we previously incorrectly vectorized assuming runtime checks had been generated when forcing vectorization via metadata. Fixes the mis-compile mentioned in llvm#166247 (comment)

This PR is very similar to llvm#167235, but applied to `trn` rather than `zip`. There are two further differences:

- The `@combine_v8i16_8first` and `@combine_v8i16_8firstundef` test cases in `arm64-zip.ll` didn't have equivalents in `arm64-trn.ll`, so this PR adds new test cases `@vtrni8_8first`, `@vtrni8_9first`, `@vtrni8_89first_undef`.
- `AArch64TTIImpl::getShuffleCost` calls `isZIPMask`, but not `isTRNMask`. It relies on `Kind == TTI::SK_Transpose` instead (which in turn is based on `ShuffleVectorInst::isTransposeMask` through `improveShuffleKindFromMask`). Therefore, this PR does not itself influence the slp-vectorizer.

In a follow-up PR, I intend to override `AArch64TTIImpl::improveShuffleKindFromMask` to ensure we get `ShuffleKind::SK_Transpose` based on the new `isTRNMask`. In fact, that follow-up change is the actual motivation for this PR, as it will result in

```C++
int8x16_t g(int8_t x) {
  return (int8x16_t) { 0, x, 1, x, 2, x, 3, x, 4, x, 5, x, 6, x, 7, x };
}
```

from llvm#137447 being optimised by the slp-vectorizer.

This patch removed some source files that were explicitly enumerated in the bazel files. Remove them so that the build passes.

…lvm#170877) Handle xsave/xrstor family of X86 builtins in ClangIR Part of llvm#167752 --------- Signed-off-by: Medha Tiwari <[email protected]>

This adds the minimum support for C++ data member pointer variables.

uint64_t and size_t are not the same across all platforms. This was causing build failures when building this file for wasm:

llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:1323:19: error: out-of-line definition of 'resolveEntry' does not match any declaration in '(anonymous namespace)::AttrTypeReader'
1323 | T AttrTypeReader::resolveEntry(SmallVectorImpl<Entry<T>> &entries, size_t index,
| ^~~~~~~~~~~~
third_party/llvm/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:851:7: note: AttrTypeReader defined here
851 | class AttrTypeReader {
| ^~~~~~~~~~~~~~
1 error generated.

Use uint64_t everywhere to ensure portability.
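The portability rule can be sketched with a small stand-in class. `EntryTable` and its members are hypothetical names, not the MLIR `AttrTypeReader`; the point is only that the in-class declaration and the out-of-line definition spell the index type identically, so they match even on targets where `size_t` and `uint64_t` are different types.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a reader that resolves entries by index.
class EntryTable {
public:
  explicit EntryTable(std::vector<int> values) : entries(std::move(values)) {}

  // Declared with std::uint64_t ...
  int resolveEntry(std::uint64_t index) const;

private:
  std::vector<int> entries;
};

// ... and defined with std::uint64_t as well. Writing size_t here would only
// compile on platforms where the two names happen to denote the same type.
int EntryTable::resolveEntry(std::uint64_t index) const {
  return index < entries.size() ? entries[index] : -1; // -1 marks "not found"
}
```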

…path (llvm#171508) llvm#170809 added the child_stdin_fd_ field on SymbolizerProcess to allow the parent process to hold on to the read end of the child's stdin pipe. This was to avoid SIGPIPE. However, the `StartSubprocess` path still closes the stdin fd in the parent here: https://github.com/llvm/llvm-project/blob/7f5ed91684c808444ede24eb01ad9af73b5806e5/compiler-rt/lib/sanitizer_common/sanitizer_posix_libcdep.cpp#L525-L535 This could cause a double-close of this fd (problematic in the case of fd reuse). This moves the `child_stdin_fd_` field to only be initialized on the posix_spawn path. This should ensure llvm#170809 only truly affects Darwin.

Instead of getting a lock and then checking/modifying the Initialization variable, make it an atomic. Doing this, we can remove one of the mutexes in shared TSDs and avoid any potential lock contention in both shared TSDs and exclusive TSDs if multiple threads do allocation operations at the same time. Add two new tests that make sure no crashes occur if multiple threads try and do allocations at the same time.
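A minimal sketch of the general idea, assuming a three-state atomic flag rather than the actual scudo implementation: `LazyInitFlag` and its members are hypothetical names. The fast path is a single atomic load, and exactly one caller wins the compare-exchange and performs the initialization, so no mutex is needed just to inspect the state.

```cpp
#include <atomic>

// Hypothetical illustration; not the scudo TSD registry code.
enum class InitState { Uninitialized, Initializing, Initialized };

struct LazyInitFlag {
  std::atomic<InitState> state{InitState::Uninitialized};
  int initCount = 0; // written only by the thread that wins the exchange

  // Returns true if this call performed the initialization.
  bool ensureInitialized() {
    // Fast path: already initialized, no lock and no write.
    if (state.load(std::memory_order_acquire) == InitState::Initialized)
      return false;

    InitState expected = InitState::Uninitialized;
    if (state.compare_exchange_strong(expected, InitState::Initializing,
                                      std::memory_order_acq_rel)) {
      ++initCount; // stands in for the real one-time setup
      state.store(InitState::Initialized, std::memory_order_release);
      return true;
    }

    // Another thread is initializing: wait until it publishes the result.
    while (state.load(std::memory_order_acquire) != InitState::Initialized) {
    }
    return false;
  }
};
```

The busy-wait keeps the sketch short; a production version would likely yield or block instead of spinning.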

This reverts commit 8a115b6. This broke premerge. https://lab.llvm.org/staging/#/builders/192/builds/13326 /home/gha/llvm-project/clang/test/Frontend/optimization-remark-options.c:10:11: remark: loop not vectorized: cannot prove it is safe to reorder floating-point operations; allow reordering by specifying '#pragma clang loop vectorize(enable)' before the loop or by providing the compiler option '-ffast-math'

) They cannot be consolidated, as WidenPHI is not a header PHI, while ActiveLaneMaskPHI is.

z1-cciauto · 2025-12-10T02:21:26Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/3209

… in DO SIMD (llvm#170163)" This reverts commit 748e7af.

z1-cciauto · 2025-12-10T08:17:10Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/3214

KrxGu and others added 30 commits December 9, 2025 15:10

[AtomicExpand] Use getSigned() for negative value

b0bd8bd

[Hexagon] Use getSigned() for signed value

80fc9bc

[BypassSlowDivision] Explicitly create bit mask

cf9ba40

Explicitly create the high bit mask using getBitsSetFrom() instead of inverting an integer. This avoids relying on implicit truncation.
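To illustrate the difference, here is a standalone sketch of building a high-bit mask from bit positions. It mirrors the semantics of `APInt::getBitsSetFrom(numBits, loBit)` on a plain 64-bit integer, but `highBitsMask` is a hypothetical helper and not LLVM code. Inverting a wide integer such as `~0xFFFFFFFFULL` yields the right bits only if the upper bits are later truncated to the target width; constructing the mask from the width makes that dependency disappear.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns a mask for a value of width numBits (1..64)
// with bits [loBit, numBits) set and every other bit clear.
std::uint64_t highBitsMask(unsigned numBits, unsigned loBit) {
  assert(numBits >= 1 && numBits <= 64 && loBit <= numBits);
  if (loBit == numBits)
    return 0; // empty range, also avoids shifting by the full width

  // All bits that exist in a value of this width.
  const std::uint64_t all =
      numBits == 64 ? ~std::uint64_t{0} : ((std::uint64_t{1} << numBits) - 1);
  // Bits strictly below loBit.
  const std::uint64_t low =
      loBit == 0 ? 0 : ((std::uint64_t{1} << loBit) - 1);
  return all & ~low;
}
```

For a 32-bit value, `highBitsMask(32, 16)` sets only bits 16 to 31, with nothing above bit 31 that would need to be truncated away.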

[libc][CI] update macOS version in workflow configuration (llvm#171228)

005ef5c

Upgrade the macOS version to the latest stable version in the GitHub Action. We ran into a problem: the timed `os_sync` API only becomes available in 14.4+.

[LSR] Use getSigned() for negated immediate

6960b63

[libc++] Don't try to be compatible with libstdc++ in __libcpp_refstr…

b2ddb90

…ing on iOS (llvm#170816) iOS doesn't provide a libstdc++ dylib anymore, so we can remove the compatibility check for whether we can load the dylib.

Update the NATVIS file

6b58449

ElaboratedType is no longer a thing.

[llvm][docs] Add a release note for LLDB "version -v"

b3a5870

Added by llvm#170772.

[OCaml] Fix build

c66eb25

Fix a mistake introduced in llvm#163979: We should stick with the deprecated LLVMGetGlobalContext() API in this file, as getGlobalContextForCAPI() is a C++ API that is not available here.

[compiler-rt] Try bumping soft_rss_limit again (llvm#171469)

a033183

This is still failing on some of the bots. Try bumping the limit again to see if this fixes things.

[RISCV] Add VMNoV0 register class with only the VMaskVTs. (llvm#171231)

2e16f24

I plan to use this for inline assembly "vd" constraints with mask types in a follow-up patch. Due to the test changes I wanted to post this separately.

[AMDGPU] Modifies builtin def to take _Float16('x') for both HIP/C++ …

04a5ee6

…and for OpenCL (llvm#167652) For the extended image insts amdgcn_image_sample_*_/gather4_* builtins, use 'x' in the builtin def so that it will take _Float16 for both HIP/C++ and OpenCL.

[CIR][CIRGen][Builtin][X86] Masked compress Intrinsics (llvm#169582)

fa60765

Added masked compress builtin in CIR. Note: This is my first PR to llvm. Looking forward to corrections --------- Co-authored-by: bhuvan1527 <[email protected]>

evelez7 and others added 20 commits December 9, 2025 11:50

[clang-doc] Replace HTML generation with Mustache backend (llvm#170199)

24117f7

Removes the legacy HTML backend and replaces it with the Mustache backend.

[gn build] Port 24117f7

4bff9fd

[LV] Add test with threshold=0 and metadata forcing vectorization.

7a5e2c9

Test case for the mis-compile mentioned in llvm#166247 (comment) The issue is that we don't generate a runtime check even though it is required to vectorize.

[Hexagon] Add HVX V81 builtins (llvm#170680)

b3d05e6

Expose the HVXV81 abs, conversion, comparison, log2, negate and mixed subtract intrinsics so Clang can emit the new instructions.

[alpha.webkit.RetainPtrCtorAdoptChecker] Don't treat assignment to an…

0eb00ef

… +1 out argument as a leak (llvm#161633) Make RetainPtrCtorAdoptChecker recognize an assignment to an +1 out argument so that it won't emit a memory leak warning.

[WebKit checkers] Treat a weak property / variable as safe (llvm#163689)

f9326ff

Treat a weak Objective-C property, ivar, member variable, and local variable as safe.

[alpha.webkit.UnretainedCallArgsChecker] Recognize [allocObj() init] …

06f0758

…pattern (llvm#161019) Generalize the check for recognizing [[Obj alloc] init] to also recognize [allocObj() init]. We do this by utilizing isAllocInit function in RetainPtrCtorAdoptChecker.

[bazel] Port 24117f7 (llvm#171497)

0895163

This patch removed some source files that were explicitly enumerated in the bazel files. Remove them so that the build passes.

[CIR][X86] Implement xsave/xrstor builtins Fixes part of llvm#167752 (l…

019a294

…lvm#170877) Handle xsave/xrstor family of X86 builtins in ClangIR Part of llvm#167752 --------- Signed-off-by: Medha Tiwari <[email protected]>

[CIR] Add basic support for data member pointers (llvm#170939)

87bf5ee

This adds the minimum support for C++ data member pointer variables.

[clang-doc] Do not serialize empty text comments (llvm#169087)

d86bc19

[VPlan] Strip TODO to consolidate (ActiveLaneMask|Widen)PHI (llvm#171392

3310c0b

) They cannot be consolidated, as WidenPHI is not a header PHI, while ActiveLaneMaskPHI is.

merge main into amd-staging

c99c3bc

ronlieb requested review from a team and dpalermo December 10, 2025 02:20

ronlieb requested a review from nicolasvasilache as a code owner December 10, 2025 02:20

ronlieb removed the request for review from nicolasvasilache December 10, 2025 02:20

dpalermo approved these changes Dec 10, 2025


Revert "[flang][OpenMP] Fix firstprivate not working with lastprivate…

0f318d9

… in DO SIMD (llvm#170163)" This reverts commit 748e7af.

z1-cciauto merged commit 704a42a into amd-staging Dec 10, 2025
20 checks passed

z1-cciauto deleted the amd/merge/upstream_merge_20251209195003 branch December 10, 2025 11:17

20 participants
