@ronlieb ronlieb commented Dec 10, 2025

No description provided.

KrxGu and others added 30 commits December 9, 2025 15:10
…IMD (llvm#170163)

This fixes a bug where firstprivate was ignored when the same variable
had both firstprivate and lastprivate clauses in a do simd construct.

What was broken:
```
integer :: a
a = 10
!$omp do simd firstprivate(a) lastprivate(a)
do i = 1, 1
   print *, a  ! Should print 10, but printed garbage/0
   a = 20
end do
!$omp end do simd
print *, a  ! Correctly prints 20
```

Inside the loop, `a` wasn't being initialized from the firstprivate
clause; it just held whatever uninitialized value was there.

The fix:

In genCompositeDoSimd(), we were using simdItemDSP to handle
privatization for the whole loop nest. This only looked at SIMD clauses
and missed the firstprivate from the DO part. Changed it to use
wsloopItemDSP instead, which handles both DO clauses (firstprivate,
lastprivate) correctly.

One-line change in OpenMP.cpp.

Tests added:

- Lowering test to check MLIR generation
- Runtime test to verify the actual values are correct


Fixes llvm#168306

---------

Co-authored-by: Krish Gupta <[email protected]>
Explicitly create the high bit mask using getBitsSetFrom() instead
of inverting an integer. This avoids relying on implicit
truncation.
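
The difference between the two approaches can be illustrated without LLVM. The sketch below is plain C++ and only mirrors the idea behind the `getBitsSetFrom()` call named above; the helper names `highBitsFrom` and `highBitsByTruncation` and the fixed 64-bit storage are assumptions made for illustration, not code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns a mask for a value that is `width` bits wide
// (1..64) with every bit from position `loBit` upward set.
inline std::uint64_t highBitsFrom(unsigned width, unsigned loBit) {
  assert(width >= 1 && width <= 64 && loBit <= width);
  // Shifting a 64-bit value by 64 is undefined, so handle that case apart.
  const std::uint64_t all =
      width == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << width) - 1;
  const std::uint64_t low =
      loBit == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << loBit) - 1;
  return all & ~low;
}

// The pattern being avoided: invert a wide integer and rely on the
// narrowing conversion to silently drop the unwanted high bits.
inline std::uint8_t highBitsByTruncation(unsigned loBit) {
  const std::uint64_t low = (std::uint64_t{1} << loBit) - 1;
  return static_cast<std::uint8_t>(~low);
}
```

Both helpers agree for an 8-bit value, for example `highBitsFrom(8, 4)` and `highBitsByTruncation(4)` both give `0xF0`, but only the first states the intended width explicitly.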
Upgrade the macOS version to the latest stable version in the GitHub action. We
ran into a problem where the timed `os_sync` API only becomes available in 14.4+.
NEEDS_TLSGD_TO_IE is only ever set when the symbol is preemptible, in
which case addTpOffsetGotEntry will just add the symbol to the GOT and
emit a symbolic tlsGotRel anyway, so there is no need to give it its own
special case.

As well as simplifying the code upstream, this is useful downstream for
Morello, which doesn't really have a proper GD/IE-to-LE relaxation, and
so for GD-to-IE can benefit from being able to use the optimisations
addTpOffsetGotEntry has for non-preemptible symbols, rather than having
to reimplement them here.
The test case builds a binary from C++, and checks for the number of
functions the PointerAuthCFIFixup pass runs on.
This can change based on the platform. To account for this, the patch
changes the number to a regex.

The test failed when running on RHEL 9.
…ing on iOS (llvm#170816)

iOS doesn't provide a libstdc++ dylib anymore, so we can remove the
compatibility check for whether we can load the dylib.
ElaboratedType is no longer a thing.
Fix a mistake introduced in llvm#163979:

We should stick with the deprecated LLVMGetGlobalContext() API
in this file, as getGlobalContextForCAPI() is a C++ API that is
not available here.
…lvm#169748)

Resolves llvm#169701.

This PR extends the existing InstCombine operation which folds `tbl1`
intrinsics to `shufflevector` if the mask operand is constant. Before
this change, it only handled 64-bit `tbl1` intrinsics with no
out-of-bounds indices. I've extended it to support both 64-bit and
128-bit vectors, and it now handles the full range of `tbl1`-`tbl4` and
`tbx1`-`tbx4`, as long as at most two of the input operands are actually
indexed into.

For the purposes of `tbl`, we need a dummy vector of zeroes if there are
any out-of-bounds indices, and for the purposes of `tbx`, we use the
"fallback" operand. Both of those take up an operand for the purposes of
`shufflevector`.
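
A minimal sketch of the simplest case may make the operand accounting clearer. It models a single 16-byte table lookup in plain C++ and assumes the usual semantics that an out-of-range index yields zero; the names `tbl1`, `shuffle`, and `tblMaskToShuffleMask` are hypothetical stand-ins, not the InstCombine code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec16 = std::array<std::uint8_t, 16>;

// Assumed reference semantics for a one-register table lookup:
// an index below 16 selects that table byte, anything else yields 0.
inline Vec16 tbl1(const Vec16 &table, const Vec16 &idx) {
  Vec16 out{};
  for (int i = 0; i < 16; ++i)
    out[i] = idx[i] < 16 ? table[idx[i]] : 0;
  return out;
}

// Two-operand shuffle: mask values 0..15 pick from `a`, 16..31 pick from `b`.
inline Vec16 shuffle(const Vec16 &a, const Vec16 &b,
                     const std::array<int, 16> &mask) {
  Vec16 out{};
  for (int i = 0; i < 16; ++i)
    out[i] = mask[i] < 16 ? a[mask[i]] : b[mask[i] - 16];
  return out;
}

// In-range indices are kept; out-of-range indices are redirected to lane 0
// of the second operand, which the caller supplies as an all-zero vector.
inline std::array<int, 16> tblMaskToShuffleMask(const Vec16 &idx) {
  std::array<int, 16> mask{};
  for (int i = 0; i < 16; ++i)
    mask[i] = idx[i] < 16 ? idx[i] : 16;
  return mask;
}

// Checks that the shuffle form reproduces the lookup for the given inputs.
inline bool foldMatchesTbl1(const Vec16 &table, const Vec16 &idx) {
  const Vec16 zeros{};
  return tbl1(table, idx) == shuffle(table, zeros, tblMaskToShuffleMask(idx));
}
```

The zero vector occupies the second shuffle operand here, which is why a lookup that also has out-of-bounds indices can read from at most one real table register in this simplified model.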

This works a lot like llvm#169110,
with some added complexity because we need to handle multiple operands.
I raised a couple questions in that PR that still need to be answered:
- Is it correct to check `IsA<UndefValue>` for each mask index, and set
the output mask index to -1 if so? This is later folded to a poison
value, and I'm not sure about the subtle differences between poison and
undef and when you can substitute one for the other. As I mentioned in
llvm#169110, the existing x86 pass (`simplifyX86vpermilvar`) already behaves
this way when it comes to undef.
- How can I write an Alive2 proof for this? It's very hard to find good
documentation or tutorials about Alive2.

As with llvm#169110, most of the regression test cases were generated using
Claude. Everything else was written by me.
This fixes the buildbot failures from
llvm#150267.

I could not reproduce them locally but my intuition suggests that the
-O3 option on the RUN line behaves inconsistently on different hosts
judging from the error logs.

My intention was to run an integration test which will use llvm's
globalopt pass, but there's no need actually. We have unittests in place
for it.
…s. (llvm#170347)

Extend the logic added in llvm#168771
to also allow sinking stores past stores in the same noalias set by
checking if we can prove no-alias via the distance between accesses,
checked via SCEV.
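
The distance argument can be sketched in plain C++. The version below assumes both accesses share a base pointer and have constant byte offsets, which is far simpler than what SCEV can reason about; the `Access` struct and `provablyDisjoint` are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a memory access relative to a common base.
struct Access {
  std::int64_t offset; // byte offset from the shared base pointer
  std::uint64_t size;  // number of bytes accessed
};

// Two accesses cannot overlap if the distance between their start offsets
// is at least the size of the access that starts first.
inline bool provablyDisjoint(const Access &a, const Access &b) {
  const Access &lo = a.offset <= b.offset ? a : b;
  const Access &hi = a.offset <= b.offset ? b : a;
  // Subtract in unsigned arithmetic to avoid signed overflow.
  const std::uint64_t distance = static_cast<std::uint64_t>(hi.offset) -
                                 static_cast<std::uint64_t>(lo.offset);
  return distance >= lo.size;
}
```

For instance, a 4-byte store at offset 0 and a 4-byte store at offset 4 are disjoint, while an 8-byte store at offset 0 overlaps a 4-byte store at offset 4.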

PR: llvm#170347
…vm#171436)

"All tests passed" is too easily interpreted as every possible test was
run and was fine. A lot of the time it means all the tests that didn't
fail to build ran and were fine.

Maybe the wording is still too subtle but at least it hints at the idea
that the tests run might be fewer than if the build had no compilation
errors.
This is still failing on some of the bots. Try bumping the limit again
to see if this fixes things.
… result (llvm#170985)

Fixes a crash in `ReorderCastOpsOnBroadcast` by ensuring the cast result
is a `VectorType` before applying the pattern.
A regression test has been added to
mlir/test/Dialect/Vector/vector-sink.mlir.

Fixes: llvm#126371
…port broadcast with low rank and scalar source input (llvm#170409)

This PR extends XeGPU layout propagation and distribution for
vector.broadcast operation.
It relaxes the restriction of layout propagation to allow low-rank and
scalar source input, and adds a pattern in sg-to-wi distribution to
support the lowering.
…lvm#171213)

We're considering modifying the ObjC runtime's class_rw_t structure to
remove the firstSubclass and nextSiblingClass fields in some cases. LLDB
is currently reading those but not actually using them. Stop doing that
to avoid issues if they are removed by the runtime.

rdar://166084122
I plan to use this for inline assembly "vd" constraints with mask types
in a follow-up patch. Due to the test changes I wanted to post this
separately.
…th mask type. (llvm#171235)

The inline assembly handling in SelectionDAG uses the first type
for the register class as the type at the input/output of the
inlineassembly. If this isn't the type for the surrounding DAG,
it needs to be converted.

nxv8i8 is the first type for the VR and VRNoV0 register classes.
So we currently generate insert/extract_subvector and bitcasts to
convert to/from nxv8i8.

I believe some of the special casing we have for this in
splitValueIntoRegisterParts and joinRegisterPartsIntoValue is causing
us to also generate incorrect code for arguments with nxv16i4 types
that should be any extended to nxv16i8. Instead we widen them to nxv32i4
and bitcast to nxv16i8.
    
This patch uses VM and VMNoV0 for masks, which have nxv64i1 as their
first type. This means we will only emit an insert/extract_subvector
without any bitcasts. This will allow me to fix
splitValueIntoRegisterParts and joinRegisterPartsIntoValue to fix the
nxv16i4 argument issue without breaking inline assembly.
    
I may need to add more register classes to cover fractional LMULs,
but I'm not sure yet.
This moves a couple of statement emitters that were incorrectly
implemented in the middle of a switch statement where all cases in the
final group are intended to fall through to a handler that emits an NYI
error message. The placement of these implementations was causing some
statement types that should have emitted the NYI error to instead go to
a handler for a different statement type.
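
The failure mode is easy to reproduce in a small switch. The sketch below uses a made-up `StmtKind` enum and string results in place of real emitters, so every name in it is an assumption for illustration rather than ClangIR code.

```cpp
#include <cassert>
#include <string>

// Hypothetical statement kinds; only Return and Label are implemented.
enum class StmtKind { Return, Goto, Label, Asm };

// Buggy layout: the Label implementation sits inside the group of cases
// that is meant to share the NYI handler, so Goto falls through into it.
inline std::string emitBuggy(StmtKind kind) {
  switch (kind) {
  case StmtKind::Return:
    return "emit return";
  case StmtKind::Goto: // intended to reach the NYI handler below
  case StmtKind::Label:
    return "emit label";
  case StmtKind::Asm:
    return "NYI";
  }
  return "unknown";
}

// Fixed layout: implemented kinds come first, and the final group contains
// only kinds that should report NYI.
inline std::string emitFixed(StmtKind kind) {
  switch (kind) {
  case StmtKind::Return:
    return "emit return";
  case StmtKind::Label:
    return "emit label";
  case StmtKind::Goto:
  case StmtKind::Asm:
    return "NYI";
  }
  return "unknown";
}
```

With the buggy layout, `emitBuggy(StmtKind::Goto)` is handled as a label; after the reordering, `emitFixed(StmtKind::Goto)` reports NYI as intended.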
…171222)

This adds stubs that issue NYI errors for any visitor that is present in
the ClangIR incubator but missing in the upstream implementation. This
will make it easier to find the correct locations to implement missing
functionality.
Runs the `std::shared/unique_ptr` tests with PDB with two changes:

- PDB uses the "full" name, so `std::string` is `std::basic_string<char,
std::char_traits<char>, std::allocator<char>>`
- The type of the pointer inside the shared/unique_ptr isn't the
`element_type` typedef
…154735)

This change introduces a new IR pass in the llc pipeline for NVPTX that
transforms sequences of FMUL followed by FADD or FSUB into a single FMA
instruction.

Currently, all FMA folding for NVPTX occurs at the DAGCombine stage,
which is too late for any IR-level passes that might want to optimize or
analyze FMAs. By moving this transformation earlier into the IR phase,
we enable more opportunities for FMA folding, including across basic
blocks.

Additionally, this new pass relies on the contract instruction level
fast-math flag to perform these transformations, rather than depending
on the -fp-contract=fast or -enable-unsafe-fp-math options passed to
llc.
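
As a rough sketch of the flag-driven decision, the toy model below represents an instruction by its opcode and a `contract` bit. The `Inst` struct, `canFoldToFma`, and `evalFused` are hypothetical; requiring the flag on both instructions is an assumption of this sketch, and only the `a * b - c` form of FSUB is modelled.

```cpp
#include <cassert>
#include <cmath>

enum class Op { FMul, FAdd, FSub };

// Hypothetical, heavily simplified instruction: an opcode plus the
// per-instruction `contract` fast-math flag.
struct Inst {
  Op op;
  bool contract;
};

// Assumption: the fold is allowed only when both instructions carry the flag.
inline bool canFoldToFma(const Inst &mul, const Inst &addOrSub) {
  return mul.op == Op::FMul &&
         (addOrSub.op == Op::FAdd || addOrSub.op == Op::FSub) &&
         mul.contract && addOrSub.contract;
}

// Fused evaluation: a * b + c, or a * b - c expressed as fma(a, b, -c).
inline double evalFused(double a, double b, double c, Op op) {
  return std::fma(a, b, op == Op::FSub ? -c : c);
}
```

The point of keying on the instruction-level flag is that the decision travels with the IR, so it does not depend on which options were passed to the tool that later compiles it.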
Fixed the argument types of the following intrinsics to match with the
ISA:
 - vpdpwssd_128, vpdpwssd_256, vpdpwssd_512,
 - vpdpwssds_128, vpdpwssds_256, vpdpwssds_512
 - vpdpwsud_128, vpdpwsud_256, vpdpwsud_512
 - vpdpwsuds_128, vpdpwsuds_256, vpdpwsuds_512
 - vpdpwusd_128, vpdpwusd_256, vpdpwusd_512
 - vpdpwusds_128, vpdpwusds_256, vpdpwusds_512
 - vpdpwuud_128, vpdpwuud_256, vpdpwuud_512
 - vpdpwuuds_128, vpdpwuuds_256, vpdpwuuds_512

Fixes llvm#97271. Note that this is the last PR for the issue.
LLVM has pretty thorough support for `int128`, and it has started seeing
some use. Even though we already have support for the
`SPV_ALTERA_arbitrary_precision_integers` extension, the BE was oddly
capping integer width to 64 bits. This patch adds partial support for
lowering 128-bit integers to `OpTypeInt 128`. Some work remains to be
done around legalisation support and validating constant uses (e.g.
cases that get lowered to `OpSpecConstantOp`).
…and for OpenCL (llvm#167652)

For the extended image instruction builtins amdgcn_image_sample_*_/gather4_*,
use 'x' in the builtin def so that they take _Float16 for both
HIP/C++ and OpenCL.
Added masked compress builtin in CIR.
Note: This is my first PR to llvm. Looking forward to corrections

---------

Co-authored-by: bhuvan1527 <[email protected]>
evelez7 and others added 20 commits December 9, 2025 11:50
Removes the legacy HTML backend and replaces it with the Mustache
backend.
… for spawning symbolizer (llvm#170809)

Due to a legacy incompatibility with `atos`, we were allocating a pty
whenever we spawned the symbolizer. This is no longer necessary and we
can use a regular ol' pipe.

This PR is split into two commits:
- The first removes the pty allocation and replaces it with a pipe. This
relocates the `CreateTwoHighNumberedPipes` call to be common to the
`posix_spawn` and `StartSubprocess` path.
- The second commit adds the `child_stdin_fd_` field to
`SymbolizerProcess`, storing the read end of the stdin pipe. By holding
on to this fd for the lifetime of the symbolizer, we are able to avoid
getting SIGPIPE (which would occur when we write to a pipe whose
read-end had been closed due to the death of the symbolizer). This will
be very close to solving llvm#120915, but this PR is intentionally not
touching the non-posix_spawn path.
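
The effect of holding on to the read end can be demonstrated with a bare POSIX pipe, without any symbolizer. The sketch below is not the sanitizer code: `writeToPipe` is a hypothetical helper, and it ignores SIGPIPE so that the failure shows up as `EPIPE` instead of terminating the process.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <unistd.h>

// Writes one byte to a fresh pipe. When `keepReadEnd` is false, the read end
// is closed first, which mimics the only reader having gone away.
// Returns 0 on success, or the errno value of the failure.
inline int writeToPipe(bool keepReadEnd) {
  // Ignore SIGPIPE so a write to a reader-less pipe fails with EPIPE
  // rather than killing this process.
  signal(SIGPIPE, SIG_IGN);

  int fds[2];
  if (pipe(fds) != 0)
    return errno;
  if (!keepReadEnd)
    close(fds[0]);

  const char byte = 'x';
  const ssize_t written = write(fds[1], &byte, 1);
  const int err = written == 1 ? 0 : errno;

  if (keepReadEnd)
    close(fds[0]);
  close(fds[1]);
  return err;
}
```

While any descriptor for the read end is still open, the write succeeds; once the last one is closed, the same write fails with `EPIPE` (or raises SIGPIPE if the signal is not ignored).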

rdar://165894284
Test case for the mis-compile mentioned in
llvm#166247 (comment)

The issue is that we don't generate a runtime check even though it is
required to vectorize.
Expose the HVXV81 abs, conversion, comparison, log2, negate and mixed
subtract intrinsics so Clang can emit the new instructions.
… +1 out argument as a leak (llvm#161633)

Make RetainPtrCtorAdoptChecker recognize an assignment to a +1 out
argument so that it won't emit a memory leak warning.
Treat a weak Objective-C property, ivar, member variable, and local
variable as safe.
…pattern (llvm#161019)

Generalize the check for recognizing [[Obj alloc] init] to also
recognize [allocObj() init]. We do this by utilizing isAllocInit
function in RetainPtrCtorAdoptChecker.
When GeneratedRTChecks::create bails out due to exceeding the cost
threshold, no runtime checks are generated and we must not proceed
assuming checks have been generated.

Mark the checks as never succeeding, to make sure we don't try to
vectorize assuming the runtime checks hold. This fixes a case where we
previously incorrectly vectorized assuming runtime checks had been
generated when forcing vectorization via metadata.

Fixes the mis-compile mentioned in
llvm#166247 (comment)
This PR is very similar to llvm#167235, but applied to `trn` rather than
`zip`. There are two further differences:
- The `@combine_v8i16_8first` and `@combine_v8i16_8firstundef` test
  cases in `arm64-zip.ll` didn't have equivalents in `arm64-trn.ll`, so 
  this PR adds new test cases `@vtrni8_8first`, `@vtrni8_9first`, 
  `@vtrni8_89first_undef`.
- `AArch64TTIImpl::getShuffleCost` calls `isZIPMask`, but not
  `isTRNMask`. It relies on `Kind == TTI::SK_Transpose` instead (which 
  in turn is based on `ShuffleVectorInst::isTransposeMask` through
  `improveShuffleKindFromMask`).
Therefore, this PR does not itself influence the slp-vectorizer. In a 
follow-up PR, I intend to override 
`AArch64TTIImpl::improveShuffleKindFromMask` to ensure we get
`ShuffleKind::SK_Transpose` based on the new `isTRNMask`. In fact, that
follow-up change is the actual motivation for this PR, as it will result
in
  ```C++
  int8x16_t g(int8_t x)
  {
    return (int8x16_t) { 0, x, 1, x, 2, x, 3, x,
                         4, x, 5, x, 6, x, 7, x };
  }
  ```
  from llvm#137447 being optimised by the slp-vectorizer.
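
For readers unfamiliar with transpose masks, the basic shape can be sketched as below. This is a simplified, standalone classifier written for illustration; `classifyTrnMask` is a hypothetical name and it is not the logic of `isTRNMask` in this PR. It assumes that for `n` lanes a `trn1` mask reads `0, n, 2, n+2, ...`, a `trn2` mask reads `1, n+1, 3, n+3, ...`, and negative entries stand for undef.

```cpp
#include <cassert>
#include <vector>

// Returns 0 if `mask` matches the trn1 pattern, 1 for trn2, and -1 otherwise.
// Negative mask entries are treated as undef and match any position.
inline int classifyTrnMask(const std::vector<int> &mask) {
  const int n = static_cast<int>(mask.size());
  if (n == 0 || n % 2 != 0)
    return -1;
  for (int which = 0; which < 2; ++which) {
    bool ok = true;
    for (int i = 0; i < n && ok; i += 2) {
      const int even = mask[i];
      const int odd = mask[i + 1];
      // Even lanes come from the first operand, odd lanes from the second.
      ok = (even < 0 || even == i + which) &&
           (odd < 0 || odd == i + n + which);
    }
    if (ok)
      return which;
  }
  return -1;
}
```

For a 4-lane shuffle, `{0, 4, 2, 6}` classifies as `trn1` and `{1, 5, 3, 7}` as `trn2`, while the identity mask `{0, 1, 2, 3}` matches neither.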
This patch removed some source files that were explicitly enumerated in
the bazel files. Remove them so that the build passes.
…lvm#170877)

Handle xsave/xrstor family of X86 builtins in ClangIR

Part of llvm#167752

---------

Signed-off-by: Medha Tiwari <[email protected]>
This adds the minimum support for C++ data member pointer variables.
uint64_t and size_t are not the same across all platforms. This was
causing build failures when building this file for wasm:

llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:1323:19: error:
out-of-line definition of 'resolveEntry' does not match any declaration
in '(anonymous namespace)::AttrTypeReader'
1323 | T AttrTypeReader::resolveEntry(SmallVectorImpl<Entry<T>>
&entries, size_t index,
      |                   ^~~~~~~~~~~~

third_party/llvm/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:851:7:
note: AttrTypeReader defined here
  851 | class AttrTypeReader {
      |       ^~~~~~~~~~~~~~
1 error generated.

Use uint64_t everywhere to ensure portability.
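
The mismatch can be reproduced in miniature. In the sketch below, `Reader` is a hypothetical stand-in for the real class and `kSizeTIsUint64` is an illustrative constant; the only point is that a declaration and its out-of-line definition must spell the same parameter type on every target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// True only on platforms where the two names denote the same type. On
// targets where it is false, a declaration using one and a definition using
// the other do not match.
constexpr bool kSizeTIsUint64 =
    std::is_same<std::size_t, std::uint64_t>::value;

class Reader { // hypothetical stand-in for the class in the error message
public:
  std::uint64_t resolveEntry(std::uint64_t index) const;
};

// The definition repeats exactly the declared parameter type. Spelling it
// std::size_t instead would compile only where kSizeTIsUint64 is true.
std::uint64_t Reader::resolveEntry(std::uint64_t index) const {
  return index + 1;
}
```

Using the fixed-width type in both places removes the dependence on how a given platform defines `size_t`.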
…path (llvm#171508)

llvm#170809 added the child_stdin_fd_ field on SymbolizerProcess to allow
the parent process to hold on to the read end of the child's stdin pipe.
This was to avoid SIGPIPE.

However, the `StartSubprocess` path still closes the stdin fd in the
parent here:

https://github.com/llvm/llvm-project/blob/7f5ed91684c808444ede24eb01ad9af73b5806e5/compiler-rt/lib/sanitizer_common/sanitizer_posix_libcdep.cpp#L525-L535

This could cause a double-close of this fd (problematic in the case of
fd reuse).

This moves the `child_stdin_fd_` field to only be initialized on the
posix_spawn path. This should ensure llvm#170809 only truly affects Darwin.
Instead of getting a lock and then checking/modifying the Initialization
variable, make it an atomic. Doing this, we can remove one of the
mutexes in shared TSDs and avoid any potential lock contention in both
shared TSDs and exclusive TSDs if multiple threads do allocation
operations at the same time.
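
A minimal sketch of the lock-free check, assuming a simple one-time initialization, is shown below. `LazyInit` and its members are hypothetical names and this is not the allocator's code; in particular, callers that lose the race simply return here, whereas a real allocator has to make sure they see a fully initialized state before proceeding.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical one-time initializer guarded by atomics instead of a mutex.
class LazyInit {
public:
  // Returns true for the single caller that performs the initialization.
  bool initOnce() {
    bool expected = false;
    // Exactly one caller flips `claimed_` from false to true.
    if (!claimed_.compare_exchange_strong(expected, true))
      return false;
    ++initCount_; // stands in for the real initialization work
    ready_.store(true, std::memory_order_release);
    return true;
  }

  // Fast path for later callers: a plain atomic load, no lock taken.
  bool ready() const { return ready_.load(std::memory_order_acquire); }

  int initCount() const { return initCount_; }

private:
  std::atomic<bool> claimed_{false};
  std::atomic<bool> ready_{false};
  int initCount_ = 0; // written only by the caller that won the exchange
};
```

Because the common case is a single atomic load, threads that allocate concurrently after initialization never contend on a lock for this check.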

Add two new tests that make sure no crashes occur if multiple threads
try and do allocations at the same time.
This reverts commit 8a115b6.

This broke premerge. https://lab.llvm.org/staging/#/builders/192/builds/13326

/home/gha/llvm-project/clang/test/Frontend/optimization-remark-options.c:10:11: remark: loop not vectorized: cannot prove it is safe to reorder floating-point operations; allow reordering by specifying '#pragma clang loop vectorize(enable)' before the loop or by providing the compiler option '-ffast-math'

They cannot be consolidated, as WidenPHI is not a header PHI, while
ActiveLaneMaskPHI is.
@ronlieb ronlieb requested review from a team and dpalermo December 10, 2025 02:20
@ronlieb ronlieb removed the request for review from nicolasvasilache December 10, 2025 02:20
@z1-cciauto z1-cciauto merged commit 704a42a into amd-staging Dec 10, 2025
20 checks passed
@z1-cciauto z1-cciauto deleted the amd/merge/upstream_merge_20251209195003 branch December 10, 2025 11:17