Implement grouped conv interface #80870

Closed
wants to merge 8,927 commits into from
This pull request is big! We’re only showing the most recent 250 commits.

Commits on Jun 7, 2024

  1. Test commit

    dsandersllvm authored and srcarroll committed Jun 7, 2024
    abd0bf1
  2. Revert "Test commit"

    This reverts commit 2ec122d.
    dsandersllvm authored and srcarroll committed Jun 7, 2024
    bf572a5
  3. 28e57cd
  4. [RuntimeDyld][ELF] Fix unwanted sign extension. (llvm#94482)

    Casting the result of `Section.getAddressWithOffset()` goes wrong if we
    are on a 32-bit platform whose addresses are regarded as signed; in that
    case, just doing
    ```
    (uint64_t)Section.getAddressWithOffset(...)
    ```
    or
    ```
    reinterpret_cast<uint64_t>(Section.getAddressWithOffset(...))
    ```
    will result in sign-extension.
    
    We use these expressions when constructing branch stubs, which is before
    we know the final load address, so we can just switch to the
    `Section.getLoadAddressWithOffset(...)` method instead.
    
    Doing that is also more consistent, since when calculating relative
    offsets for relocations, we use the load address anyway, so the code
    currently only works because `Section.Address` is equal to
    `Section.LoadAddress` at this point.
    
    Fixes llvm#94478.
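    
    To illustrate the failure mode described above, here is a minimal
    sketch. It models a 32-bit address as `int32_t`, an assumption standing
    in for a platform whose addresses convert as signed, and compares
    widening it directly with widening through `uint32_t`. The helper names
    are hypothetical and are not part of RuntimeDyld.
    
    ```cpp
    #include <cstdint>

    // Widen a 32-bit address that is treated as signed.
    // The conversion sign-extends: 0x80001000 becomes 0xFFFFFFFF80001000.
    inline uint64_t widenSigned(int32_t Addr) {
      return static_cast<uint64_t>(Addr);
    }

    // Widen the same bit pattern as an unsigned value.
    // The conversion zero-extends: 0x80001000 stays 0x0000000080001000.
    inline uint64_t widenUnsigned(int32_t Addr) {
      return static_cast<uint64_t>(static_cast<uint32_t>(Addr));
    }
    ```
    
    Any address with the top bit set shows the difference; addresses below
    0x80000000 widen identically either way.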
    al45tair authored and srcarroll committed Jun 7, 2024
    ea63530
  5. [LoongArch] Add a hook to sign extend i32 ConstantInt operands of phi…

    …s on LA64 (llvm#93813)
    
    Materializing constants on LoongArch is simpler if the constant is sign
    extended from i32. By default i32 constant operands of phis are zero
    extended.
        
    This patch adds a hook to allow LoongArch to override this for i32. We
    have an existing isSExtCheaperThanZExt, but it operates on EVT which we
    don't have at these places in the code.
    heiher authored and srcarroll committed Jun 7, 2024
    9fc29d3
  6. 5ddc841
  7. [libc][math][c23] Implement fmaxf16 and fminf16 function (llvm#94131)

    Implements fmaxf16 and fminf16, which are two missing functions listed
    here: llvm#93566
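    
    For reference, the C standard specifies that `fmax` and `fmin` return
    the non-NaN operand when exactly one operand is a NaN. The sketch below
    shows that behavior with `float` and the standard `std::fmax` /
    `std::fmin`, since `_Float16` is not available on every compiler. The
    wrapper names are hypothetical, and this is not the libc implementation.
    
    ```cpp
    #include <cmath>
    #include <limits>

    // Thin wrappers over the standard float overloads. When exactly one
    // operand is a quiet NaN, the other operand is returned.
    inline float maxIgnoringNaN(float A, float B) { return std::fmax(A, B); }
    inline float minIgnoringNaN(float A, float B) { return std::fmin(A, B); }
    ```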
    HendrikHuebner authored and srcarroll committed Jun 7, 2024
    95e1431
  8. [lldb] Fix inconsistencies in DWARFExpression errors (llvm#94554)

    This patch makes all errors start with a lowercase letter and removes
    trailing periods and newlines. This fixes inconsistencies between error
    messages and facilitates concatenating them.
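    
    The convention can be sketched as a small normalizer. This is only an
    illustration: the commit message describes changing the error strings
    themselves, and `normalizeErrorMessage` is a hypothetical helper that
    does not exist in lldb.
    
    ```cpp
    #include <cctype>
    #include <string>

    // Hypothetical helper: strip trailing periods, newlines, and spaces,
    // then lowercase the first character, so that messages can be joined
    // without stray punctuation.
    inline std::string normalizeErrorMessage(std::string Msg) {
      while (!Msg.empty() && (Msg.back() == '.' || Msg.back() == '\n' ||
                              Msg.back() == '\r' || Msg.back() == ' '))
        Msg.pop_back();
      if (!Msg.empty())
        Msg[0] = static_cast<char>(
            std::tolower(static_cast<unsigned char>(Msg[0])));
      return Msg;
    }
    ```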
    JDevlieghere authored and srcarroll committed Jun 7, 2024
    9e25be5
  9. f2165ae
  10. b22873d
  11. [lldb/crashlog] Always load Application Specific Backtrace Thread ima…

    …ges (llvm#94259)
    
    This patch changes the crashlog image loading default behaviour to not
    only load images from the crashed thread but also for the application
    specific backtrace thread.
    
    This patch also moves the Application Specific Backtrace / Last Exception
    Backtrace tag from the thread queue field to the thread name.
    
    rdar://128276576
    
    Signed-off-by: Med Ismail Bennani <[email protected]>
    medismailben authored and srcarroll committed Jun 7, 2024
    6a4e4f6
  12. [WebAssembly] Set IS_64 flag correctly on __indirect_function_table i…

    …n object files (llvm#94487)
    
    Follow up to llvm#92042
    sbc100 authored and srcarroll committed Jun 7, 2024
    725b792
  13. [serialization] no transitive decl change (llvm#92083)

    Follow-up to llvm#86912
    
    The motivation of the patch series is that, for a module interface unit
    `X`, when the modules that `X` depends on change and the changes are
    not relevant to `X`, we hope the BMI of `X` won't change. This specific
    patch targets the case where the changes are to irrelevant declarations.
    **However**, I found the patch itself is not very useful in practice,
    since adding or removing declarations will change the state of
    identifiers and types in most cases.
    
    That said, for the most simple example,
    
    ```
    // partA.cppm
    export module m:partA;
    
    // partA.v1.cppm
    export module m:partA;
    export void a() {}
    
    // partB.cppm
    export module m:partB;
    export void b() {}
    
    // m.cppm
    export module m;
    export import :partA;
    export import :partB;
    
    // onlyUseB.cppm
    export module onlyUseB;
    import m;
    export inline void onlyUseB() {
        b();
    }
    ```
    
    the BMI of `onlyUseB` will change after we change the implementation of
    `partA.cppm` to `partA.v1.cppm`, since `partA.v1.cppm` introduces new
    identifiers and types (the function prototype).
    
    So in this patch, we have to write the tests as:
    
    ```
    // partA.cppm
    export module m:partA;
    export int getA() { ... }
    export int getA2(int) { ... }
    
    // partA.v1.cppm
    export module m:partA;
    export int getA() { ... }
    export int getA(int) { ... }
    export int getA2(int) { ... }
    
    // partB.cppm
    export module m:partB;
    export void b() {}
    
    // m.cppm
    export module m;
    export import :partA;
    export import :partB;
    
    // onlyUseB.cppm
    export module onlyUseB;
    import m;
    export inline void onlyUseB() {
        b();
    }
    ```
    
    so that the newly introduced declaration `int getA(int)` doesn't introduce
    new identifiers and types, and the BMI of `onlyUseB` can stay
    unchanged.
    
    While this does not look great, the patch should be the base of the patch
    to erase the transitive change for identifiers and types, since I don't
    know how we can introduce new types and identifiers without introducing
    new declarations. Given how tight the relationship between
    declarations, types and identifiers is, I think we can only reach the ideal
    state after we have made the series for all three entities.
    
    The design of the patch is similar to
    llvm#86912, which extends the
    32-bit DeclID to 64-bit and uses the higher bits to store the module file
    index and the lower bits to store the Local Decl ID.
    
    A slight difference is that we only use 48 bits to store the new DeclID
    since we try to use the higher 16 bits to store the module ID in the
    prefix of Decl class. Previously, we used 32 bits to store the module ID
    and 32 bits to store the DeclID. I don't want to allocate additional
    space, so I tried to keep the total at 64 bits. A
    potentially interesting point here is the relationship between the
    module ID and the module file index. I feel we can get the module file
    index from the module ID, but I didn't prove or implement it, since I
    want to keep the patch itself as small as possible. We can do that in
    the future if we want.
    
    Another change in the patch is the new concept Decl Index, which means
    the index into the very big array `DeclsLoaded` in ASTReader. Previously,
    the index of a loaded declaration was simply the Decl ID minus
    PREDEFINED_DECL_NUMs, so there are some places where the two were used
    ambiguously. This patch tries to split these two concepts.
    
    As llvm#86912 did, the change will
    increase the on-disk PCM file sizes. As declaration IDs may be the
    most numerous IDs in the PCM file, this can have the biggest impact on the size.
    In my experiments, this change brings a 6.6% increase in the on-disk
    PCM size. No compile-time performance regression was observed. Given the
    benefits in the motivation example, I think the cost is worthwhile.
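    
    The packing idea described above (module file index in the higher bits,
    local declaration ID in the lower bits) can be sketched as follows. The
    32/32 split and every name here are assumptions chosen for illustration;
    the message itself mentions a 48-bit DeclID in the `Decl` prefix, and
    the actual layout in clang may differ.
    
    ```cpp
    #include <cstdint>

    // Assumed split for illustration only: the upper 32 bits hold the
    // module file index and the lower 32 bits hold the local decl ID.
    constexpr unsigned LocalBits = 32;
    constexpr uint64_t LocalMask = (uint64_t{1} << LocalBits) - 1;

    // Combine a module file index and a local ID into one 64-bit ID.
    constexpr uint64_t makeDeclID(uint32_t ModuleFileIndex, uint32_t LocalID) {
      return (uint64_t{ModuleFileIndex} << LocalBits) | LocalID;
    }

    // Recover the two halves.
    constexpr uint32_t moduleFileIndexOf(uint64_t ID) {
      return static_cast<uint32_t>(ID >> LocalBits);
    }
    constexpr uint32_t localIDOf(uint64_t ID) {
      return static_cast<uint32_t>(ID & LocalMask);
    }
    ```
    
    With such an encoding, a local ID is only meaningful relative to its
    own module file, which is what lets a change in one module file leave
    the IDs recorded for another untouched.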
    ChuanqiXu9 authored and srcarroll committed Jun 7, 2024
    07c0c26
  14. Revert "[RISCV] Support select/merge like ops for bf16 vectors when h…

    …ave Zvfbfmin" (llvm#94565)
    
    Reverts llvm#91936
    
    Premerge bots are broken.
    joker-eph authored and srcarroll committed Jun 7, 2024
    c719881
  15. [flang] Add GETCWD runtime and lowering intrinsics implementation (ll…

    …vm#92746)
    
    This patch adds support for the GNU extension intrinsic GETCWD
    llvm#84203. Some usage info and
    an example have been added to `flang/docs/Intrinsics.md`. The patch contains
    both the lowering and the runtime code and works on both Windows and
    Linux.
    
    
    | System  | Implementation |
    |---------|----------------|
    | Windows | _getcwd        |
    | Linux   | getcwd         |
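    
    The platform split in the table can be sketched in C++ as below. This
    is not the flang runtime code; `currentWorkingDirectory` and the fixed
    4096-byte buffer are assumptions made to keep the example short.
    
    ```cpp
    #include <string>
    #ifdef _WIN32
    #include <direct.h> // _getcwd
    #else
    #include <unistd.h> // getcwd
    #endif

    // Returns the current working directory, or an empty string on failure
    // (for example, if the path does not fit in the buffer).
    inline std::string currentWorkingDirectory() {
      char Buffer[4096];
    #ifdef _WIN32
      char *Result = _getcwd(Buffer, static_cast<int>(sizeof(Buffer)));
    #else
      char *Result = getcwd(Buffer, sizeof(Buffer));
    #endif
      return Result ? std::string(Result) : std::string();
    }
    ```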
    JumpMasterJJ authored and srcarroll committed Jun 7, 2024
    3052bcc
  16. [clang] Implement a __is_bitwise_cloneable builtin type trait. (llvm#…

    …86512)
    
    This patch implements a `__is_bitwise_cloneable` builtin in clang.
    
    The builtin is used as a guard to check that a type can be safely bitwise
    copied by memcpy. It's functionally similar to
    `__is_trivially_copyable`, but covers a wider range of types (e.g.
    classes with virtual functions). The compiler guarantees that after
    the copy, the destination object has the same object representation as the
    source object. It is up to the user to guarantee that program semantic
    constraints are satisfied.
    
    Context:
    https://discourse.llvm.org/t/extension-for-creating-objects-via-memcpy
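    
    A hedged usage sketch: the wrapper below calls the builtin when the
    compiler reports it and otherwise falls back to the stricter
    `std::is_trivially_copyable`. The fallback, the macro name, and the
    `isBitwiseCloneable` wrapper are assumptions added so the example
    compiles on compilers without the builtin.
    
    ```cpp
    #include <type_traits>

    // Detect the builtin without breaking compilers that lack __has_builtin.
    #if defined(__has_builtin)
    #if __has_builtin(__is_bitwise_cloneable)
    #define SKETCH_HAS_BITWISE_CLONEABLE 1
    #endif
    #endif

    // Hypothetical wrapper: true if T may be copied with memcpy according
    // to the builtin, or according to the stricter standard trait when the
    // builtin is unavailable.
    template <typename T> constexpr bool isBitwiseCloneable() {
    #ifdef SKETCH_HAS_BITWISE_CLONEABLE
      return __is_bitwise_cloneable(T);
    #else
      return std::is_trivially_copyable<T>::value;
    #endif
    }

    struct Plain { int X; };
    // Not trivially copyable because of the virtual destructor; per the
    // description above, the builtin is meant to accept types like this.
    struct WithVirtual { virtual ~WithVirtual() = default; int X; };
    ```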
    hokein authored and srcarroll committed Jun 7, 2024
    85fd90b
  17. [LoongArch] Adjust LA64 data layout by using n32:64 in layout string (l…

    …lvm#93814)
    
    Although i32 type is illegal in the backend, LA64 has pretty good
    support for i32 types by using W instructions.
    
    By adding n32 to the DataLayout string, middle end optimizations will
    consider i32 to be a native type. One known effect of this is enabling
    LoopStrengthReduce on loops with i32 induction variables. This can be
    beneficial because C/C++ code often has loops with i32 induction
    variables due to the use of `int` or `unsigned int`.
    
    If this patch exposes performance issues, those are better addressed by
    tuning LSR or other passes.
    heiher authored and srcarroll committed Jun 7, 2024
    16c3e1a
  18. [MLIR][LLVM] Improve module translation comment (NFC) (llvm#94577)

    This commit enhances the docstring of `translateModuleToLLVMIR` as a
    follow-up to llvm#94445
    Dinistro authored and srcarroll committed Jun 7, 2024
    8ffa33f
  19. b6c4da3
  20. ab331bb
  21. [InstCombine] Only require not-undef in select equiv fold

    As the comment already indicates, only replacement with undef
    is problematic, as it introduces an additional use of undef.
    Use the correct ValueTracking helper.
    nikic authored and srcarroll committed Jun 7, 2024
    d836ae8
  22. [ValueTracking] Make undef element check more precise

    If we're only checking for undef, then also only look for undef
    elements in the vector (rather than undef and poison).
    nikic authored and srcarroll committed Jun 7, 2024
    942e935
  23. [LoopUnroll] Consider convergence control tokens when unrolling (llvm…

    …#91715)
    
    - There is no restriction on a loop with controlled convergent
      operations when the relevant tokens are defined and used within the
      loop.
    
    - When a token defined outside a loop is used inside (also called a loop
      convergence heart), unrolling is allowed only in the absence of
      remainder or runtime checks.
    
    - When a token defined inside a loop is used outside, such a loop is
      said to be "extended". This loop can only be unrolled by also
      duplicating the extended part lying outside the loop. Such unrolling
      is disabled for now.
    
    - Clean up loop hearts: When unrolling a loop with a heart, duplicating
      the heart will introduce multiple static uses of a convergence control
      token in a cycle that does not contain its definition. This violates
      the static rules for tokens, and needs to be cleaned up into a single
      occurrence of the intrinsic.
    
    - Spell out the initializer for UnrollLoopOptions to improve
      readability.
    
    
    Original implementation [D85605] by Nicolai Haehnle
    <[email protected]>.
    ssahasra authored and srcarroll committed Jun 7, 2024
    fa7e78c
  24. [SDPatternMatch] Do not use std::forward and rvalue references (NFC) (l…

    …lvm#93806)
    
    The m_ZExtOrSelf() family of matchers currently incorrectly calls
    std::forward twice on the same value. However, just removing those causes
    other complications, because then template arguments get incorrectly
    inferred to const references instead of the underlying value types.
    Things become a mess.
    
    Instead, just completely remove the use of std::forward and rvalue
    references from SDPatternMatch. I don't think they really provide value
    in this context, especially as they're not used consistently in the
    first place.
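    
    The general hazard of calling `std::forward` twice on the same value
    can be shown with a small standalone sketch. It does not reproduce the
    SDPatternMatch code; `consume` and `forwardTwice` are hypothetical
    names, and `std::unique_ptr` is used because its moved-from state is
    guaranteed to be null.
    
    ```cpp
    #include <memory>
    #include <utility>

    // Takes ownership, so an rvalue argument is moved from.
    inline bool consume(std::unique_ptr<int> P) { return P != nullptr; }

    // Buggy pattern: the same forwarding reference is forwarded twice.
    // When called with an rvalue, the first call moves the pointer out and
    // the second call receives a null pointer.
    template <typename T> std::pair<bool, bool> forwardTwice(T &&Value) {
      bool First = consume(std::forward<T>(Value));
      bool Second = consume(std::forward<T>(Value));
      return {First, Second};
    }
    ```
    
    Calling `forwardTwice(std::unique_ptr<int>(new int(7)))` yields
    `{true, false}`: only the first consumer sees the value.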
    nikic authored and srcarroll committed Jun 7, 2024
    11675cb
  25. [InstCombine] Add transforms (icmp spred (and X, Y), X) if X or `…

    …Y` are known signed/unsigned
    
    Several transforms:
        1) If known `Y < 0`:
            - slt -> ult: https://alive2.llvm.org/ce/z/9zt2iK
            - sle -> ule: https://alive2.llvm.org/ce/z/SPoPNF
            - sgt -> ugt: https://alive2.llvm.org/ce/z/IGNxAk
            - sge -> uge: https://alive2.llvm.org/ce/z/joqTvR
        2) If known `Y >= 0`:
            - `(X & PosY) s> X --> X s< 0`
                - https://alive2.llvm.org/ce/z/7e-5BQ
            - `(X & PosY) s> X --> X s< 0`
                - https://alive2.llvm.org/ce/z/jvT4Gb
        3) If known `X < 0`:
            - `(NegX & Y) s> NegX --> Y s>= 0`
                - https://alive2.llvm.org/ce/z/ApkaEh
            - `(NegX & Y) s<= NegX --> Y s< 0`
                - https://alive2.llvm.org/ce/z/oRnfHp
    
    Closes llvm#94417
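    
    As a sanity check that is independent of the Alive2 proofs linked
    above, three of the listed folds can be verified exhaustively at 8 bits.
    This is a sketch with a hypothetical function name; it covers only
    `slt -> ult` from case 1, the `s>` fold from case 2, and the `s>` fold
    from case 3.
    
    ```cpp
    #include <cstdint>

    // Exhaustively checks three of the listed folds over all 8-bit values.
    // Returns true only if each fold holds for every qualifying (X, Y).
    inline bool checkFoldsOnI8() {
      for (int XI = -128; XI <= 127; ++XI) {
        for (int YI = -128; YI <= 127; ++YI) {
          const int8_t X = static_cast<int8_t>(XI);
          const int8_t Y = static_cast<int8_t>(YI);
          const int8_t And = static_cast<int8_t>(XI & YI);
          const uint8_t UX = static_cast<uint8_t>(X);
          const uint8_t UAnd = static_cast<uint8_t>(And);

          // 1) Y < 0: (X & Y) s< X is equivalent to (X & Y) u< X.
          if (Y < 0 && ((And < X) != (UAnd < UX)))
            return false;
          // 2) Y >= 0: (X & Y) s> X is equivalent to X s< 0.
          if (Y >= 0 && ((And > X) != (X < 0)))
            return false;
          // 3) X < 0: (X & Y) s> X is equivalent to Y s>= 0.
          if (X < 0 && ((And > X) != (Y >= 0)))
            return false;
        }
      }
      return true;
    }
    ```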
    goldsteinn authored and srcarroll committed Jun 7, 2024
    ae8398c
  26. [ARM] vabd.ll - regenerate test checks

    Cleanup for llvm#94504
    RKSimon authored and srcarroll committed Jun 7, 2024
    2b0061c
  27. [ARM] vaba.ll - regenerate test checks

    Cleanup for llvm#94504
    RKSimon authored and srcarroll committed Jun 7, 2024
    43a52d5
  28. 39027b5
  29. [DebugInfo][SelectionDAG] Fix position of salvaged 'dangling' DBG_VAL…

    …UEs (llvm#94458)
    
    `SelectionDAGBuilder::handleDebugValue` has a parameter `Order` which
    represents the insert-at position for the new DBG_VALUE. Prior to this patch
    `SelectionDAGBuilder::SDNodeOrder` is used instead of the `Order` parameter.
    
    The only code-paths where `Order != SDNodeOrder` are the two calls to
    `handleDebugValue` from `salvageUnresolvedDbgValue`.
    `salvageUnresolvedDbgValue` is called from `resolveOrClearDbgInfo` and
    `dropDanglingDebugInfo`. The former is called after SelectionDAG completes one
    block.
    
    Some dbg.values can't be lowered to DBG_VALUEs right away. These get recorded
    as 'dangling' - their order-number is saved - and get salvaged later through
    `dropDanglingDebugInfo`, or if we've still got dangling debug info once the
    whole block has been emitted, through `resolveOrClearDbgInfo`. Their saved
    order-number is passed to `handleDebugValue`.
    
    Prior to this patch, DBG_VALUEs inserted using these functions are inserted at
    the "current" `SDNodeOrder` rather than the intended position that is passed to
    the function.
    
    Fix and add test.
    OCHyams authored and srcarroll committed Jun 7, 2024
    5ed1246
  30. 94cc5f4
  31. [llvm-reduce] Remove DIGlobalVariableExpressions from DICompileUnit's…

    … globals (llvm#94497)
    
    The 'metadata' delta pass will remove !dbg attachments from globals (which are
    DIGlobalVariableExpression nodes). The DIGlobalVariableExpressions don't get
    eliminated from the IR however if they are still referenced by the globals
    field in DICompileUnit.
    
    Teach the 'di-metadata' pass to try removing global variable operands from
    metadata tuples as well as DINodes.
    OCHyams authored and srcarroll committed Jun 7, 2024
    15b6e55
  32. [flang][OpenMP] Fix privatization when critical is present (llvm#94441)

    When a critical construct is present inside another construct where
    privatizations may occur, such as a parallel construct, some
    privatizations are skipped if the corresponding symbols are defined
    inside the critical section only (see the example below).
    
    This happens because, while critical constructs have a "body", they
    don't have a separate scope (which makes sense, since no
    privatizations can occur in them). Because of this, in semantics
    phase, it's not possible to insert a new host association symbol,
    but instead the symbol from the enclosing context is used directly.
    
    This makes symbol collection in DataSharingProcessor consider the
    new symbol to be defined by the critical construct, instead of by
    the enclosing one, which causes the privatization to be skipped.
    
    Example:
    ```
    !$omp parallel default(firstprivate)
      !$omp critical
         i = 200
      !$omp end critical
    !$omp end parallel
    ```
    
    This patch fixes this by identifying constructs where
    privatizations may not happen and skipping them during the
    collection of nested symbols. Currently, this seems to happen only
    with critical constructs, but others can be easily added to the
    skip list, if needed.
    
    Fixes llvm#75767
    luporl authored and srcarroll committed Jun 7, 2024
    0f88be8
  33. 16f7316
  34. [PowerPC] Add test to show alignment of toc-data symbol is changed. NFC.

    After O3 opt pipeline, the alignment of toc-data symbol is changed which is
    unexpected.
    Kai Luo authored and srcarroll committed Jun 7, 2024
    376f0d5
  35. [lldb] Disable TestPtyServer API test when remote testing (llvm#94587)

    The local PTY is not available for the remotely executed lldb-server to
    pass the test. Also, in general, we cannot execute the local lldb-server
    instance because it could be compiled for a different system/cpu
    target.
    slydiman authored and srcarroll committed Jun 7, 2024
    65a7389
  36. [ARM] Add neon_vabd.ll based off aarch64 tests

    Test coverage for llvm#94504
    RKSimon authored and srcarroll committed Jun 7, 2024
    5791862
  37. [DAG] visitSUB - update the ABS matching code to use SDPatternMatch a…

    …nd hasOperation.
    
    Avoids the need to explicitly test both commuted variants and doesn't match custom lowering after legalization.
    
    Cleanup for llvm#94504
    RKSimon authored and srcarroll committed Jun 7, 2024
    80694e4
  38. [flang][CodeGen][NFC] Reduce boilerplate for ExternalNameConversion (l…

    …lvm#94474)
    
    Use tablegen to generate the pass constructor.
    
    I removed the duplicated pass option handling. I don't understand why
    the manual instantiation of the pass needs its own duplicate of the pass
    options in the (automatically generated) base class (even with the
    option to ignore the pass options in the base class).
    
    This pass doesn't need changes to support other top level operations.
    tblah authored and srcarroll committed Jun 7, 2024
    40333fc
  39. [PowerPC] Adjust operand order of ADDItoc to be consistent with other…

    … ADDI* nodes (llvm#93642)
    
    Simultaneously, the `ADDItoc` machineinstr is generated in
    `PPCISelDAGToDAG::Select` so the pattern is not used and can be removed.
    Kai Luo authored and srcarroll committed Jun 7, 2024
    c3bc314
  40. [clang][Interp] Member Pointers (llvm#91303)

    This adds a `MemberPointer` class along with a `PT_MemberPtr` primitive
    type.
    
    A `MemberPointer` has a `Pointer` Base as well as a `Decl*` (could be
    `ValueDecl*`?) decl it points to.
    For the actual logic, this mainly changes the way we handle `PtrMemOp`s
    in `VisitBinaryOperator`.
    tbaederr authored and srcarroll committed Jun 7, 2024
    75a1c58
  41. [gn build] Port a86c1e7

    llvmgnsyncbot authored and srcarroll committed Jun 7, 2024
    453205a
  42. [AMDGPU] Move INIT_EXEC lowering from SILowerControlFlow to SIWholeQu…

    …adMode (llvm#94452)
    
    NFCI; this just preserves SI_INIT_EXEC and SI_INIT_EXEC_FROM_INPUT
    instructions a little longer so that we can reliably identify them in
    SIWholeQuadMode.
    jayfoad authored and srcarroll committed Jun 7, 2024
    1cb3b5c
  43. [AMDGPU] Implement variadic functions by IR lowering (llvm#93362)

    This is a mostly-target-independent variadic function optimisation and
    lowering pass. It is only enabled for AMDGPU in this initial commit.
    
    The purpose is to make C style variadic functions a zero cost
    abstraction. They are lowered to equivalent IR which is then amenable to
    other optimisations. This is inherently slightly target specific but
    much less so than one might expect - the C varargs interface heavily
    constrains the ABI design divergence.
    
    The pass is primarily tested from webassembly. This is because wasm has
    a straightforward variadic lowering strategy which coincides exactly
    with what this pass transforms code into and a struct passing convention
    with few cases to check. Adding further target conventions is
    straightforward and elided from this patch primarily to simplify the
    review. Implemented in other branches are Linux X86, AMD64, AArch64 and
    NVPTX.
    
    Testing for targets that have existing lowering for va_arg from clang is
    most efficiently done by checking that clang | opt completely elides the
    variadic syntax from test cases. The lowering produces a struct for each
    call site which can be inspected to check the various alignment and
    indirections are correct.
    
    AMDGPU presently has no variadic support other than some ad hoc printf
    handling. Combined with the pass being inactive on all other targets,
    landing this represents a strict increase in capability with zero risk.
    Testing and refining will continue post commit.
    
    In addition to the compiler tests included here, a self contained x64
    clang/musl toolchain was constructed using the "lowering" instead of the
    systemv ABI and used to build various C programs like lua and libxml2.
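    
    The idea of replacing a variadic tail with a caller-built structure can
    be sketched by hand in C++. This is only an illustration of the concept
    described above, not the IR the pass emits; `sumLowered`, `callLowered`,
    and the frame layout are assumptions.
    
    ```cpp
    #include <cstdarg>
    #include <cstring>

    // Original C-style variadic function.
    inline int sumVariadic(int Count, ...) {
      va_list Args;
      va_start(Args, Count);
      int Total = 0;
      for (int I = 0; I < Count; ++I)
        Total += va_arg(Args, int);
      va_end(Args);
      return Total;
    }

    // Hand-written "lowered" form: the variadic tail is replaced by a
    // pointer to a buffer that the caller fills in.
    inline int sumLowered(int Count, const unsigned char *Frame) {
      int Total = 0;
      for (int I = 0; I < Count; ++I) {
        int Value;
        std::memcpy(&Value, Frame + I * sizeof(int), sizeof(int));
        Total += Value;
      }
      return Total;
    }

    // A call site builds its own frame, mirroring the per-call-site struct
    // mentioned in the commit message.
    inline int callLowered() {
      struct { int A0, A1, A2; } Frame = {1, 2, 3};
      unsigned char Bytes[sizeof(Frame)];
      std::memcpy(Bytes, &Frame, sizeof(Frame));
      return sumLowered(3, Bytes);
    }
    ```
    
    Once the arguments live in an ordinary struct, the callee contains no
    variadic syntax and is open to the usual optimisations.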
    JonChesterfield authored and srcarroll committed Jun 7, 2024
    73af086
  44. Revert "[Analyzer][CFG] Correctly handle rebuilt default arg and defa…

    …ult init expression (llvm#91879)" (llvm#94597)
    
    This depends on llvm#92527 which
    needs to be reverted due to
    llvm#92527 (comment).
    
    This reverts commit 905b402.
    
    Co-authored-by: Bogdan Graur <[email protected]>
    2 people authored and srcarroll committed Jun 7, 2024
    85c4dd6
  45. 6ec798c
  46. Revert "[serialization] no transitive decl change (llvm#92083)"

    This reverts commit 97c866f.
    
    This fails on 32bit machines. See
    llvm#92083
    ChuanqiXu9 authored and srcarroll committed Jun 7, 2024
    8d48e5c
  47. Revert "Reapply "[Clang][CWG1815] Support lifetime extension of tempo…

    …rary created by aggregate initialization using a default member initializer" (llvm#92527)" (llvm#94600)
    
    Reverting due to
    llvm#92527 (comment).
    
    This reverts commit f049d72.
    
    Co-authored-by: Bogdan Graur <[email protected]>
    2 people authored and srcarroll committed Jun 7, 2024
    13eb6de
  48. [AArch64][SME] Add calling convention for __arm_get_current_vg (llvm#…

    …93963)
    
    Adds a calling convention for calls to the `__arm_get_current_vg`
    support routine, which preserves X1-X15, X19-X29, SP, Z0-Z31 & P0-P15.
    
    See ARM-software/abi-aa#263
    kmclaughlin-arm authored and srcarroll committed Jun 7, 2024
    8a3e4b1
  49. [gn build] Port 8516f54

    llvmgnsyncbot authored and srcarroll committed Jun 7, 2024
    2640ff6
  50. [GlobalIsel] Combine G_VSCALE (llvm#94096)

    We need them for scalable address calculation and
    legal scalable addressing modes.
    tschuett authored and srcarroll committed Jun 7, 2024
    6299b18
  51. [bazel] Port for 8516f54

    Remove some #includes in ExpandVariadics.cpp as they will cause layering
    violations.
    hokein authored and srcarroll committed Jun 7, 2024
    bfc793a
  52. 1b84df2
  53. [Transforms] Fix -Wunused-variable in ExpandVariadics.cpp (NFC)

    /llvm-project/llvm/lib/Transforms/IPO/ExpandVariadics.cpp:426:14:
    error: unused variable 'OriginalFunctionIsDeclaration' [-Werror,-Wunused-variable]
      const bool OriginalFunctionIsDeclaration = OriginalFunction->isDeclaration();
                 ^
    /llvm-project/llvm/lib/Transforms/IPO/ExpandVariadics.cpp:445:13:
    error: unused variable 'VariadicWrapperDefine' [-Werror,-Wunused-variable]
      Function *VariadicWrapperDefine =
                ^
    2 errors generated.
    DamonFool authored and srcarroll committed Jun 7, 2024
    aa10468
  54. [ARM] Don't block tail-predication from unrelated VPT blocks. (llvm#9…

    …4239)
    
    VPT blocks that do not produce an interesting 'output' (like a stored
    value or reduction result), do not need to be predicated on vctp for the
    whole loop to be tail-predicated. Just producing results for the valid
    tail predication lanes should be enough.
    davemgreen authored and srcarroll committed Jun 7, 2024
    cec4763
  55. [clang-tidy] Remove redundant LINK_LIBS (llvm#94588)

    clangAnalysis is already being pulled in via
    clang_target_link_libraries(). Also listing it in LINK_LIBS means that
    we'll link both against the static libraries and the shared
    libclang-cpp.so library if CLANG_LINK_CLANG_DYLIB is enabled, and waste
    time on unnecessary LTO.
    nikic authored and srcarroll committed Jun 7, 2024
    0b8566b
  56. [libc][math] Temporarily disable nexttowardf16 on aarch64 due to clang-11 bug. (llvm#94569)
    
    The conversion between _Float16 and long double will crash clang-11 on
    aarch64. This is fixed in clang-12: https://godbolt.org/z/8ceT9454c
    lntue authored and srcarroll committed Jun 7, 2024
    7b0f3ad
  57. [DAG] expandABS - add missing FREEZE in abs(x) -> smax(x,sub(0,x)) expansion
    
    Noticed while working on llvm#94601
    RKSimon authored and srcarroll committed Jun 7, 2024
    cdc9c01
  58. [flang][OpenMP] Make object identity more precise (llvm#94495)

    Derived type components may use a given `Symbol` regardless of what
    parent objects they are a part of. Because of that, simply using a
    symbol address is not sufficient to determine object identity.
    
    Make the designator a part of the IdTy. To compare identities, when
    symbols are equal (and non-null), compare the designators.
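
    A minimal sketch of that comparison rule, assuming hypothetical stand-in types (`Symbol`, `ObjectId`, and a string designator are illustrative only and are not flang's actual `IdTy`):

    ```cpp
    #include <cassert>
    #include <string>

    // Hypothetical stand-ins; these are not flang's real data structures.
    struct Symbol {
      std::string name;
    };

    struct ObjectId {
      const Symbol *symbol = nullptr; // shared by every use of a component
      std::string designator;         // e.g. "a%x" versus "b%x"
    };

    // Two identities match only if the symbols match and, when the symbol
    // is non-null, the designators match as well.
    inline bool sameObject(const ObjectId &lhs, const ObjectId &rhs) {
      if (lhs.symbol != rhs.symbol)
        return false;
      if (lhs.symbol == nullptr)
        return true;
      return lhs.designator == rhs.designator;
    }
    ```

    Under this rule `a%x` and `b%x` stay distinct even though both refer to the same component symbol.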
    kparzysz authored and srcarroll committed Jun 7, 2024
    5fd7a9b
  59. [clang][Sema] Add missing scope flags to Scope::dumpImpl (llvm#94529)

    There were a handful of scope flags that were not handled in the dump
    function, which would then lead to an assert.
    kparzysz authored and srcarroll committed Jun 7, 2024
    ffebcd0
  60. [ConstraintElim] Add set of tests where a loop iv is used in exit.

    Test cases inspired by
    llvm#90417.
    fhahn authored and srcarroll committed Jun 7, 2024
    7a4c101
  61. [LoongArch] Allow f16 codegen with expansion to libcalls (llvm#94456)

    The test case is adapted from llvm/test/CodeGen/RISCV/fp16-promote.ll,
    because it covers some more IR patterns that ought to be common.
    
    Fixes llvm#93894
    xen0n authored and srcarroll committed Jun 7, 2024
    7a2a155
  62. [workflows] Add scan-build to ci-ubuntu-22.04 container (llvm#94543)

    This will be used for a new CI job that runs the static analyzer.
    tstellar authored and srcarroll committed Jun 7, 2024
    8aebc05
  63. [X86][AMX] Checking AMXProgModel in X86LowerTileCopy (llvm#94358)

    This fixes compile time regression after llvm#93692.
    phoebewang authored and srcarroll committed Jun 7, 2024
    e60c668
  64. [Libomptarget] Rework device initialization and image registration (llvm#93844)
    
    Summary:
    Currently, we register images into a linear table according to the
    logical OpenMP device identifier. We then initialize all of these images
    as one block. This logic requires that images are compatible with *all*
    devices instead of just the one that it can run on. This prevents us
    from running on systems with heterogeneous devices (e.g. image 1 runs on
    device 0 and image 0 runs on device 1).
    
    This patch reworks the logic by instead making the compatibility check a
    per-device query. We then scan every device to see if it's compatible
    and do it as they come.
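
    The per-device query can be pictured roughly as follows. This is only a sketch under stated assumptions: `Image`, `Device`, `isCompatible`, and `registerImages` are hypothetical names, and matching on an architecture string is a simplification, not the plugin interface the patch actually uses.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <string>
    #include <vector>

    // Hypothetical stand-ins for a device image and a device.
    struct Image {
      std::string arch;
    };

    struct Device {
      std::string arch;
      std::vector<const Image *> registered;

      // Per-device compatibility query (simplified to an arch match).
      bool isCompatible(const Image &image) const { return image.arch == arch; }
    };

    // Scan every device and register each image only where it is compatible.
    // Returns the number of (device, image) registrations performed.
    inline std::size_t registerImages(std::vector<Device> &devices,
                                      const std::vector<Image> &images) {
      std::size_t count = 0;
      for (Device &device : devices) {
        for (const Image &image : images) {
          if (device.isCompatible(image)) {
            device.registered.push_back(&image);
            ++count;
          }
        }
      }
      return count;
    }
    ```

    Because no image has to be valid for all devices, the heterogeneous case from the summary (image 1 on device 0, image 0 on device 1) registers cleanly.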
    jhuber6 authored and srcarroll committed Jun 7, 2024
    c33b869
  65. [X86] Fix pipe resources for HADD/SUB instructions

    IceLakeServer was copying these from SkylakeServer, but integer HADD/SUB can now run on an extra port
    RKSimon authored and srcarroll committed Jun 7, 2024
    ba9c7b4
  66. [X86] Fix pipe resources for FP HADD/SUB instructions

    IceLakeServer/SkylakeServer can only use Port01 for the FADD/FSUB stage
    
    Confirmed with uops.info + Agner
    RKSimon authored and srcarroll committed Jun 7, 2024
    1dce655
  67. [Clang][AMDGPU] Use I to decorate imm argument for `__builtin_amdgcn_global_load_lds` (llvm#94376)

    shiltian authored and srcarroll committed Jun 7, 2024
    093818b
  68. [libc] Enable varargs tests for AMDGPU targets

    Summary:
    This reverts commit 574ab7e.
    jhuber6 authored and srcarroll committed Jun 7, 2024
    4782ad7
  69. [NVPTX] Revamp NVVMIntrRange pass (llvm#94422)

    Revamp the NVVMIntrRange pass making the following updates:
    - Use range attributes over range metadata. This is what instcombine has
    moved to for ranges on intrinsics in llvm#88776 and it seems a bit
    cleaner.
    - Consider the `!"maxntid{x,y,z}"` and `!"reqntid{x,y,z}"` function
    metadata when adding ranges for `tid` sreg intrinsics. This can allow
    for smaller ranges and more optimization.
    - When range attributes are already present, use the intersection of the
    old and new range. This complements the metadata change by allowing
    ranges to be shrunk when an intrinsic is in a function which is inlined
    into a kernel with metadata. While we don't call this more than once
    yet, we should consider adding a second call after inlining, once this
    has had a chance to soak for a while and no issues have arisen.
    
    I've also re-enabled this pass in the TM; it was disabled years ago due
    to "numerical discrepancies" https://reviews.llvm.org/D96166. In our
    testing we haven't seen any issues with adding ranges to intrinsics, and
    I cannot find any further info about what issues were encountered.
    AlexMaclean authored and srcarroll committed Jun 7, 2024
    2be0989
  70.
    8fe8ede
  71. [AArch64] Override isLSRCostLess, take number of instructions into account (llvm#84189)
    
    Adds an AArch64-specific version of isLSRCostLess, changing the relative
    importance of the various terms from the formulae being evaluated.
    
    This has been split out from my vscale-aware LSR work, see the RFC for
    reference:
    https://discourse.llvm.org/t/rfc-vscale-aware-loopstrengthreduce/77131
    huntergr-arm authored and srcarroll committed Jun 7, 2024
    48c9a27
  72.
    b4896c9
  73. [NVPTX] Remove unused private field in NVVMIntrRange.cpp (NFC)

    /llvm-project/llvm/lib/Target/NVPTX/NVVMIntrRange.cpp:33:12:
    error: private field 'SmVersion' is not used [-Werror,-Wunused-private-field]
      unsigned SmVersion;
               ^
    1 error generated.
    DamonFool authored and srcarroll committed Jun 7, 2024
    2450e72
  74. [lldb] Fix ThreadPlanStepOverRange name in log message (llvm#94611)

    Co-authored-by: Marianne Mailhot-Sarrasin <[email protected]>
    2 people authored and srcarroll committed Jun 7, 2024
    52edac7
  75. [OpenMP][NFC] Fix warning for OpenMP standalone build (llvm#93463)

    PR llvm#75125 introduced upward propagation of some OMPT-related CMake
    variables.
    For stand-alone builds this results in a warning that `SCOPE_PARENT` has
    no meaning in a top-level directory.
    jprotze authored and srcarroll committed Jun 7, 2024
    a6444dd
  76. [RISCV] Fix duplicate test cases for G_UNMERGE_VALUES (llvm#94622)

    `unmerge_i64` and `unmerge_i32` were exactly the same test case. This
    PR fixes that, so `unmerge_i32` actually unmerges a 32-bit value into
    two 16-bit values.
    spaits authored and srcarroll committed Jun 7, 2024
    5ddd8e7
  77. [mlir][tensor] Implement constant folder for tensor.pad (llvm#92691)

    Extend the folding ability of the RewriteAsConstant patterns to include
    tensor.pad operations on constants. The new pattern will constant fold
    tensor.pad operations which operate on tensor constants and have
    statically resolvable padding sizes/values.
    
        %init = arith.constant dense<[[6, 7], [8, 9]]> : tensor<2x2xi32>
        %pad_value = arith.constant 0 : i32
    
        %0 = tensor.pad %init low[1, 1] high[1, 1] {
          ^bb0(%arg1: index, %arg2: index):
            tensor.yield %pad_value : i32
        } : tensor<2x2xi32> to tensor<4x4xi32>
    
    becomes
    
        %cst = arith.constant dense<[[0, 0, 0, 0],
                                     [0, 6, 7, 0],
                                     [0, 8, 9, 0],
                                     [0, 0, 0, 0]]> : tensor<4x4xi32>
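
    The arithmetic behind that fold can be sketched outside MLIR. The helper below is purely illustrative (`Matrix` and `padConstant` are hypothetical names, not part of the patch) and handles only the rank-2 case with a single pad value:

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <vector>

    using Matrix = std::vector<std::vector<int>>;

    // Build the padded constant: fill the result with padValue, then copy
    // the source into the window that starts at (lowRows, lowCols).
    inline Matrix padConstant(const Matrix &src, std::size_t lowRows,
                              std::size_t lowCols, std::size_t highRows,
                              std::size_t highCols, int padValue) {
      const std::size_t rows = src.size();
      const std::size_t cols = rows ? src[0].size() : 0;
      Matrix out(lowRows + rows + highRows,
                 std::vector<int>(lowCols + cols + highCols, padValue));
      for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
          out[lowRows + r][lowCols + c] = src[r][c];
      return out;
    }
    ```

    Calling `padConstant({{6, 7}, {8, 9}}, 1, 1, 1, 1, 0)` reproduces the 4x4 constant shown above.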
    
    Co-authored-by: Spenser Bauman <sabauma@fastmail>
    2 people authored and srcarroll committed Jun 7, 2024
    0b27a2e
  78. [BOLT][DWARF][NFC] Refactor GDB Index into a new file (llvm#94405)

    Create a new class and file for functions that update GDB index.
    sayhaan authored and srcarroll committed Jun 7, 2024
    27a3150
  79. [AArch64] Add support for Qualcomm Oryon processor (llvm#91022)

    Oryon is an ARM V8 AArch64 CPU from Qualcomm.
    
    ---------
    
    Co-authored-by: Wei Zhao <[email protected]>
    2 people authored and srcarroll committed Jun 7, 2024
    158494e
  80. [SPIR-V] Add validation to the test case with get_image_array_size/get_image_dim calls (llvm#94467)
    
    This PR is to add validation to the test case with
    get_image_array_size/get_image_dim calls
    (transcoding/check_ro_qualifier.ll). This test case didn't pass
    validation because of invalid emission of OpCompositeExtract instruction
    (Result Type must be the same type as Composite.).
    
    In order to fix the problem this PR improves type inference in general
    and partially addresses issues:
    * llvm#91998
    * llvm#91997
    
    A reproducer from the description of the latter issue is added as a new
    test case as a part of this PR.
    VyacheslavLevytskyy authored and srcarroll committed Jun 7, 2024
    79653ce
  81.
    25f2565
  82. [clang][Interp][NFC] Return a valid SourceInfo for Function PCs

    We already assert that the given PC is in range and that the
    function has a body, so the SrcMap should generally never be empty.
    However, when generating destructors, we create quite a few instructions
    for which we have no source information, which may cause the previous
    assertion to fail. Return the end of the source map in this case.
    tbaederr authored and srcarroll committed Jun 7, 2024
    0e92e4f
  83. DAG: Pass flags to FoldConstantArithmetic (llvm#93663)

    There is simply way too much going on inside getNode. The complicated
    constant folding of vector handling works by looking for build_vector
    operands, and then tries to getNode the scalar element and then checks
    if constants were the result. As a side effect, this produces unused
    scalar operation nodes (previously, without flags). If the vector
    operation were later scalarized, it would find the flagless constant
    folding temporary and lose the flag. I don't think this is a reasonable
    way for constant folding to operate, but for now fix this by ensuring
    flags on the original operation are preserved in the temporary.

    This yields a clear code improvement for AMDGPU when f16 isn't legal.
    The Wasm cases switch from using a libcall to compare and select. We are
    evidently missing the fcmp+select to fminimum/fmaximum handling, but
    this would be further improved when that's handled. AArch64 also avoids
    the libcall, but looks worse and has a different call for some reason.
    arsenm authored and srcarroll committed Jun 7, 2024
    5332122
  84.
    9881528
  85. [MLIR] Fix generic assembly syntax for ArrayAttr containing hex float (llvm#94583)
    
    When a float attribute is printed with Hex, we should not elide the type
    because it is parsed back as i64 otherwise.
    joker-eph authored and srcarroll committed Jun 7, 2024
    4691b20
  86. [mlir] Add pack/unpack transpose foldings for linalg.generic ops, fix bugs (llvm#93055)
    
    This PR adds transpose + pack/unpack folding support for transpose ops
    in the form of `linalg.generic` ops. There were also some bugs with the
    permutation composing in the previous patterns, so this PR fixes these
    bugs and adds tests for them as well.
    Max191 authored and srcarroll committed Jun 7, 2024
    b70d25d
  87. [gn build] Port 2a6efe6

    llvmgnsyncbot authored and srcarroll committed Jun 7, 2024
    a9cf149
  88. [Bazel] Generate LLVM_HAS_XYZ_TARGET macros in llvm config (llvm#94476)

    Otherwise code that depends on those targets being enabled might not get
    compiled correctly even if the targets are explicitly included in the
    configuration (in my case NVVM target for MLIR).
    apaszke authored and srcarroll committed Jun 7, 2024
    659318e
  89. [CodeGen] Use std::bitset for MachineFunctionProperties (llvm#94627)

    The size of the properties is fixed, so no need for a BitVector.
    Assigning small, fixed-size bitsets is faster.
    
    It's a minor performance improvement.
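
    As a rough illustration of the idea (not the actual `MachineFunctionProperties` class; `FunctionProperties`, its enumerators, and `satisfies` are hypothetical here), a fixed-size property set backed by `std::bitset` could look like this:

    ```cpp
    #include <bitset>
    #include <cassert>
    #include <cstddef>

    // Hypothetical property set; the enumerators are illustrative only.
    class FunctionProperties {
    public:
      enum class Property : std::size_t {
        IsSSA,
        NoPHIs,
        TracksLiveness,
        NoVRegs,
        LastProperty = NoVRegs
      };

      bool has(Property p) const { return bits.test(index(p)); }

      FunctionProperties &set(Property p) {
        bits.set(index(p));
        return *this;
      }

      FunctionProperties &reset(Property p) {
        bits.reset(index(p));
        return *this;
      }

      // True if every property set in `required` is also set here.
      bool satisfies(const FunctionProperties &required) const {
        return (required.bits & ~bits).none();
      }

    private:
      static constexpr std::size_t index(Property p) {
        return static_cast<std::size_t>(p);
      }

      // The number of properties is known at compile time, so the storage
      // is a fixed-size bitset rather than a dynamically sized bit vector.
      std::bitset<static_cast<std::size_t>(Property::LastProperty) + 1> bits;
    };
    ```

    Copying or assigning such an object is a plain fixed-size copy with no heap allocation, which is where a speedup of the kind described would come from.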
    aengelke authored and srcarroll committed Jun 7, 2024
    a8ce798
  90. [X86] Skip AMX type lowering when AMX is not used (llvm#92910)

    The pass iterates over the IR multiple times, but most code doesn't use
    AMX. Therefore, do a single iteration in advance to check whether a
    function uses AMX at all, and exit early if it doesn't. This makes the
    function-has-AMX path slightly more expensive, but AMX users probably
    care a lot less about compile time than JIT users (which tend to not use
    AMX).
    
    For us, it reduces the time spent in this pass from 0.62% to 0.12%.
    
    Ideally, we wouldn't even need to iterate over the function to determine
    that it doesn't use AMX.
    aengelke authored and srcarroll committed Jun 7, 2024
    696a2f5
  91. RegisterCoalescer: Remove unnecessary maybe_unused

    2214026 didn't fix an unused variable
    warning correctly.
    arsenm authored and srcarroll committed Jun 7, 2024
    ac3f92a
  92.
    3c1d2a0
  93.
    19dd633
  94. [llvm][ScheduleDAG] Set a fixed size for Sched::Preference (llvm#94523)

    This trims off 8 bytes from llvm::SUnit:
    ```
    --- before	2024-06-05 12:13:00
    +++ after	2024-06-05 12:12:58
    @@ -1,65 +1,65 @@
     *** Dumping AST Record Layout
              0 | class llvm::SUnit
              0 |   SDNode * Node
              8 |   MachineInstr * Instr
             16 |   SUnit * OrigNode
             24 |   const MCSchedClassDesc * SchedClass
             32 |   class llvm::SmallVector<class llvm::SDep, 4> Preds
             32 |     class llvm::SmallVectorImpl<class llvm::SDep> (base)
             32 |       class llvm::SmallVectorTemplateBase<class llvm::SDep> (base)
             32 |         class llvm::SmallVectorTemplateCommon<class llvm::SDep> (base)
             32 |           class llvm::SmallVectorBase<uint32_t> (base)
             32 |             void * BeginX
             40 |             unsigned int Size
             44 |             unsigned int Capacity
             48 |     struct llvm::SmallVectorStorage<class llvm::SDep, 4> (base)
             48 |       char[64] InlineElts
            112 |   class llvm::SmallVector<class llvm::SDep, 4> Succs
            112 |     class llvm::SmallVectorImpl<class llvm::SDep> (base)
            112 |       class llvm::SmallVectorTemplateBase<class llvm::SDep> (base)
            112 |         class llvm::SmallVectorTemplateCommon<class llvm::SDep> (base)
            112 |           class llvm::SmallVectorBase<uint32_t> (base)
            112 |             void * BeginX
            120 |             unsigned int Size
            124 |             unsigned int Capacity
            128 |     struct llvm::SmallVectorStorage<class llvm::SDep, 4> (base)
            128 |       char[64] InlineElts
            192 |   unsigned int NodeNum
            196 |   unsigned int NodeQueueId
            200 |   unsigned int NumPreds
            204 |   unsigned int NumSuccs
            208 |   unsigned int NumPredsLeft
            212 |   unsigned int NumSuccsLeft
            216 |   unsigned int WeakPredsLeft
            220 |   unsigned int WeakSuccsLeft
            224 |   unsigned short NumRegDefsLeft
            226 |   unsigned short Latency
        228:0-0 |   _Bool isVRegCycle
        228:1-1 |   _Bool isCall
        228:2-2 |   _Bool isCallOp
        228:3-3 |   _Bool isTwoAddress
        228:4-4 |   _Bool isCommutable
        228:5-5 |   _Bool hasPhysRegUses
        228:6-6 |   _Bool hasPhysRegDefs
        228:7-7 |   _Bool hasPhysRegClobbers
        229:0-0 |   _Bool isPending
        229:1-1 |   _Bool isAvailable
        229:2-2 |   _Bool isScheduled
        229:3-3 |   _Bool isScheduleHigh
        229:4-4 |   _Bool isScheduleLow
        229:5-5 |   _Bool isCloned
        229:6-6 |   _Bool isUnbuffered
        229:7-7 |   _Bool hasReservedResource
    -       232 |   Sched::Preference SchedulingPref
    -   236:0-0 |   _Bool isDepthCurrent
    -   236:1-1 |   _Bool isHeightCurrent
    -       240 |   unsigned int Depth
    -       244 |   unsigned int Height
    -       248 |   unsigned int TopReadyCycle
    -       252 |   unsigned int BotReadyCycle
    -       256 |   const TargetRegisterClass * CopyDstRC
    -       264 |   const TargetRegisterClass * CopySrcRC
    -           | [sizeof=272, dsize=272, align=8,
    -           |  nvsize=272, nvalign=8]
    +       230 |   Sched::Preference SchedulingPref
    +   231:0-0 |   _Bool isDepthCurrent
    +   231:1-1 |   _Bool isHeightCurrent
    +       232 |   unsigned int Depth
    +       236 |   unsigned int Height
    +       240 |   unsigned int TopReadyCycle
    +       244 |   unsigned int BotReadyCycle
    +       248 |   const TargetRegisterClass * CopyDstRC
    +       256 |   const TargetRegisterClass * CopySrcRC
    +           | [sizeof=264, dsize=264, align=8,
    +           |  nvsize=264, nvalign=8]
    ```
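
    The effect in the layout dump can be reproduced in miniature. The sketch below uses hypothetical types (`WidePref`, `NarrowPref`, `WideUnit`, `NarrowUnit`), not `llvm::SUnit` itself, and the exact struct sizes are ABI-dependent, so only the relative ordering is relied upon:

    ```cpp
    #include <cassert>
    #include <cstdint>

    // A scoped enum without an explicit underlying type uses int.
    enum class WidePref { None, Source, RegPressure };

    // Fixing the underlying type to one byte lets the field pack next to
    // its neighbouring small members.
    enum class NarrowPref : std::uint8_t { None, Source, RegPressure };

    struct WideUnit {
      unsigned short latency;
      bool isScheduled : 1;
      WidePref pref;
      bool isDepthCurrent : 1;
      unsigned depth;
    };

    struct NarrowUnit {
      unsigned short latency;
      bool isScheduled : 1;
      NarrowPref pref;
      bool isDepthCurrent : 1;
      unsigned depth;
    };

    static_assert(sizeof(NarrowPref) == 1, "one-byte underlying type");
    ```

    On common ABIs `sizeof(NarrowUnit)` comes out smaller than `sizeof(WideUnit)`, for the same reason the dump above drops from 272 to 264 bytes.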
    jroelofs authored and srcarroll committed Jun 7, 2024
    7650125
  95. [NFC][libc++][test][AIX] fix SIMD test XFAIL for clang before 19 (llvm#94509)
    
    058e445 added an XFAIL for this test on AIX because of a backend
    limitation. That backend limitation has been resolved by 0295c2a and
    will be available for clang 19, so we should update the test to limit
    the XFAIL to clang versions before that.
    daltenty authored and srcarroll committed Jun 7, 2024
    b50875b
  96. [clang] Fix handling of adding a file with the same name as an existing dir to VFS (llvm#94461)
    
    When trying to add a file to clang's VFS via `addFile` and a directory
    of the same name already exists, we run into an [out-of-bounds
    access](https://github.com/llvm/llvm-project/blob/145815c180fc82c5a55bf568d01d98d250490a55/llvm/lib/Support/Path.cpp#L244).
    
    The problem is that the file name is [recognised as existing path](
    https://github.com/llvm/llvm-project/blob/145815c180fc82c5a55bf568d01d98d250490a55/llvm/lib/Support/VirtualFileSystem.cpp#L896)
    and thus continues to process the next part of the path which doesn't
    exist.
    
    This patch adds a check for whether we have reached the last part of
    the filename and returns false in that case. Thus we refuse to add a
    file if a directory of the same name already exists.
    
    This is in sync with [this
    check](https://github.com/llvm/llvm-project/blob/145815c180fc82c5a55bf568d01d98d250490a55/llvm/lib/Support/VirtualFileSystem.cpp#L903)
    that rejects adding a path if a file of the same name already exists.
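
    The shape of that check can be shown with a toy in-memory tree. This is a sketch only: `ToyFileSystem` and `Node` are hypothetical, file contents are omitted, and unlike a real VFS this toy also rejects re-adding an identical file.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <map>
    #include <memory>
    #include <sstream>
    #include <string>
    #include <vector>

    struct Node {
      bool isDir = false;
      std::map<std::string, std::unique_ptr<Node>> children;
    };

    class ToyFileSystem {
    public:
      ToyFileSystem() { root.isDir = true; }

      // Returns false if the path collides with an existing entry.
      bool addFile(const std::string &path) {
        const std::vector<std::string> parts = split(path);
        Node *dir = &root;
        for (std::size_t i = 0; i < parts.size(); ++i) {
          const bool last = (i + 1 == parts.size());
          auto it = dir->children.find(parts[i]);
          if (it == dir->children.end()) {
            // Missing component: create the file or an intermediate dir.
            auto node = std::make_unique<Node>();
            node->isDir = !last;
            Node *created = node.get();
            dir->children.emplace(parts[i], std::move(node));
            if (last)
              return true;
            dir = created;
            continue;
          }
          Node *existing = it->second.get();
          if (!existing->isDir)
            return false; // a file of that name is in the way
          if (last)
            return false; // a directory of the same name already exists
          dir = existing;
        }
        return false; // empty path
      }

    private:
      static std::vector<std::string> split(const std::string &path) {
        std::vector<std::string> parts;
        std::istringstream in(path);
        std::string part;
        while (std::getline(in, part, '/'))
          if (!part.empty())
            parts.push_back(part);
        return parts;
      }

      Node root;
    };
    ```

    The `last` test is the part that mirrors the patch: when the final path component resolves to an existing directory, the function stops instead of walking past the end of the path.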
    jensmassberg authored and srcarroll committed Jun 7, 2024
    a05f49a
  97. [clang][Interp] Always decay root array pointers to the first element

    This is similar to what the current interpreter does.
    tbaederr authored and srcarroll committed Jun 7, 2024
    99fcd1b
  98. [AIX] use LIBPATH on AIX instead of LD_LIBRARY_PATH (llvm#94602)

    On AIX, the system loader ignores LD_LIBRARY_PATH when LIBPATH is also set.
    
    See below example on AIX:
    ```
    $ldd a.out
    a.out needs:
    	 /usr/lib/libc.a(shr.o)
    Cannot find libtest.a
    	 /unix
    	 /usr/lib/libcrypt.a(shr.o)
    
    $./a.out
    Could not load program ./a.out:
    	Dependent module libtest.a could not be loaded.
    Could not load module libtest.a.
    System error: No such file or directory
    
    $export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/tmp
    $./a.out ; echo $?
    10
    
    $export LIBPATH=./
    $./a.out ; echo $?  >>>>>> Now LD_LIBRARY_PATH is not used by system loader
    Could not load program ./a.out:
    	Dependent module libtest.a could not be loaded.
    Could not load module libtest.a.
    System error: No such file or directory
    ```
    
    This breaks many AIX LIT cases on our downstream buildbots, which set
    LIBPATH.
    
    ---------
    
    Co-authored-by: Anh Tuyen Tran <[email protected]>
    Co-authored-by: David Tenty <[email protected]>
    3 people authored and srcarroll committed Jun 7, 2024
    8af229c
  99. [AMDGPU] Update removeFnAttrFromReachable to accept array of Fn Attrs. (llvm#94188)
    
    This PR updates removeFnAttrFromReachable in AMDGPUMemoryUtils to accept
    an array of function attributes as an argument.
    This helps remove multiple attributes in one CallGraph walk.
    skc7 authored and srcarroll committed Jun 7, 2024
    b874ae3
  100.
    0031c27
  101.
    275a662
  102. Revert "[lldb][DebugNames] Only skip processing of DW_AT_declarations for class/union types"
    
    and two follow-up commits. The reason is the crash we've discovered when
    processing -gsimple-template-names binaries. I'm committing a minimal
    reproducer as a separate patch.
    
    This reverts the following commits:
    - 51dd4ea (llvm#92328)
    - 3d9d485 (llvm#93839)
    - afe6ab7 (llvm#94400)
    labath authored and srcarroll committed Jun 7, 2024
    879a468
  103.
    3a2fbf4
  104.
    00a43ed
  105.
    815412f
  106. DAG: Improve fminimum/fmaximum vector expansion logic (llvm#93579)

    First, expandFMINIMUM_FMAXIMUM should be a never-fail API. The client
    wanted it expanded, and it can always be expanded. This logic was tied
    up with what the VectorLegalizer wanted.
        
    Prefer using the min/max opcodes, and unrolling if we don't have a
    vselect. This seems to produce better code in all the changed tests.
    arsenm authored and srcarroll committed Jun 7, 2024
    13f5b56
  107. [clang-tidy] Fix crash in readability-container-size-empty (llvm#94527)

    Fixed a crash caused by a call to getCookedLiteral on a template
    user-defined literal. The fix is based on the assert in the
    getCookedLiteral method.
    
    Closes llvm#94454
    PiotrZSL authored and srcarroll committed Jun 7, 2024
    ce63390
  108.
    30693e5
  109. [libc] at_quick_exit function implemented (llvm#94317)

    - added at_quick_exit function 
    - used helper file exit_handler which reuses code from atexit
    - atexit now calls helper functions from exit_handler
    - test cases and dependencies are added
    
    ---------
    
    Co-authored-by: Aaryan Shukla <[email protected]>
    2 people authored and srcarroll committed Jun 7, 2024
    8fb6945
  110. [AMDGPU] Fix GFX1152 ELF arch

    shiltian authored and srcarroll committed Jun 7, 2024
    ee48f68
  111.
    c3bfbfa
  112. [gtest] Enable zos for death test support (llvm#94623)

    This patch implements the following change to enable zos for death test
    support. google/googletest#4527
    abhina-sree authored and srcarroll committed Jun 7, 2024
    7842384
  113. [clang][Interp] Diagnose functions without body like undefined ones

    We only get a "reached end of constexpr function" diagnostic
    otherwise.
    tbaederr authored and srcarroll committed Jun 7, 2024
    fd8684c
  114.
    e070bda
  115. [InstCombine] Folding multiuse (icmp eq/ne (or X, Y), Y) for 2 uses of `Y`
    
    The fold will replace 2 uses of `Y`, so we should also do the fold if
    `Y` has 2 uses (not only one use).
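
    The commit message does not spell out the rewritten form. As an assumption for illustration, one algebraically equivalent form of `(X | Y) == Y` is `(X & ~Y) == 0`, and the sketch below checks that equivalence exhaustively for 8-bit values (the function names are hypothetical):

    ```cpp
    #include <cassert>
    #include <cstdint>

    // The pattern being matched: icmp eq (or X, Y), Y.
    inline bool originalForm(std::uint8_t x, std::uint8_t y) {
      return static_cast<std::uint8_t>(x | y) == y;
    }

    // An equivalent form (assumed here): X has no bits outside of Y.
    inline bool equivalentForm(std::uint8_t x, std::uint8_t y) {
      return static_cast<std::uint8_t>(x & ~y) == 0;
    }

    // Compare both forms over every pair of 8-bit inputs.
    inline bool formsAgreeOnAllInputs() {
      for (unsigned x = 0; x < 256; ++x)
        for (unsigned y = 0; y < 256; ++y)
          if (originalForm(static_cast<std::uint8_t>(x),
                           static_cast<std::uint8_t>(y)) !=
              equivalentForm(static_cast<std::uint8_t>(x),
                             static_cast<std::uint8_t>(y)))
            return false;
      return true;
    }
    ```

    Both forms say that every bit set in `X` is also set in `Y`; the `ne` variant is simply the negation.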
    
    Reviewed By: nikic
    
    Differential Revision: https://reviews.llvm.org/D159062
    goldsteinn authored and srcarroll committed Jun 7, 2024
    f655b95
  116. [clang][RISCV] Update vcpop.v C interface to follow the naming convention (llvm#94318)
    
    By convention, we name intrinsics by replacing "." with "_" in the
    instruction name, so `vcpopv_v`, whose corresponding instruction is
    `vcpop.v`, should be named `vcpop_v`.
    4vtomat authored and srcarroll committed Jun 7, 2024
    ae24158
  117. [MemProf] Remove context id set from nodes and recompute on demand (llvm#94415)
    
    The ContextIds set on the ContextNode struct is not technically needed
    as we can compute it from either the callee or caller edge context ids.
    Remove it and add a helper to recompute from the edges on demand. Also
    add helpers to compute the node allocation type and whether the context
    ids are empty from the edges without needing to first compute the node's
    context id set, to minimize the runtime cost increase.
    
    This yielded a 20% reduction in peak memory for a large thin link, for
    about a 2% time increase (which is more than offset by some other recent
    time efficiency improvements).
    teresajohnson authored and srcarroll committed Jun 7, 2024
    5a832da
  118. [flang] Add reduction semantics to fir.do_loop (llvm#93934)

    Derived from llvm#92480. This PR introduces reduction semantics into loops
    for DO CONCURRENT REDUCE. The `fir.do_loop` operation now implicitly has
    the `operandSegmentSizes` attribute and takes variable-length reduction
    operands with their operations given as `fir.reduce_attr`. For the sake
    of compatibility, `fir.do_loop`'s builder has additional arguments at
    the end. The `iter_args` operand should be placed in front of the
    declaration of result types, so the new operand for reduction variables
    (`reduce`) is put in the middle of arguments.
    khaki3 authored and srcarroll committed Jun 7, 2024
    fea5d46
  119. [libc][FixedVector] Add more helper methods (llvm#94278)

    This adds:
    - A ctor accepting a start and end iterator
    - A ctor accepting a count and const T&
    - size()
    - subscript operators
    - begin() and end() iterators
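
To make the list above concrete, here is a minimal sketch of a fixed-capacity vector offering the same kinds of helpers. `FixedVectorSketch` is a hypothetical stand-in, not the libc class: it assumes plain pointers as iterators, a default-constructible `T`, and silent clamping at capacity, none of which is confirmed by the commit message.

```cpp
#include <cstddef>

// Hypothetical stand-in for a fixed-capacity vector; not the actual libc class.
template <typename T, std::size_t Capacity>
class FixedVectorSketch {
  T Storage[Capacity] = {};
  std::size_t Count = 0;

public:
  FixedVectorSketch() = default;

  // Copy elements from [Begin, End); anything beyond Capacity is ignored.
  FixedVectorSketch(const T *Begin, const T *End) {
    for (; Begin != End && Count < Capacity; ++Begin)
      Storage[Count++] = *Begin;
  }

  // Store N copies of Value, clamped to Capacity.
  FixedVectorSketch(std::size_t N, const T &Value) {
    for (; Count < N && Count < Capacity; ++Count)
      Storage[Count] = Value;
  }

  std::size_t size() const { return Count; }

  // Unchecked element access, like a raw array.
  T &operator[](std::size_t Index) { return Storage[Index]; }
  const T &operator[](std::size_t Index) const { return Storage[Index]; }

  // Pointer iterators over the elements currently stored.
  T *begin() { return Storage; }
  T *end() { return Storage + Count; }
  const T *begin() const { return Storage; }
  const T *end() const { return Storage + Count; }
};
```

With `begin()` and `end()` in place, a range-based `for` loop works directly on the container, which is presumably one motivation for adding them.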
    PiJoules authored and srcarroll committed Jun 7, 2024
    3d79450
  120. [clang][NFC] fix name lookup for llvm::json::Value in SymbolGraphSeri…

    …alizer (llvm#94511)
    
    This code uses namespaces `llvm` and `llvm::json`. However, we have both
    `llvm::Value` and `llvm::json::Value`. Whenever any of the headers
    declare or include `llvm::Value`, the lookup becomes ambiguous.
    
    Fixing this by qualifying the `Value` type.
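
The ambiguity can be reproduced without LLVM. In the sketch below, `outer` and `outer::json` are hypothetical stand-ins for `llvm` and `llvm::json`, and `describe` is an illustrative helper; the point is only that two using-directives make an unqualified `Value` ambiguous, while a qualified name always resolves.

```cpp
#include <string>

namespace outer { // hypothetical stand-in for `llvm`
struct Value {
  int Id = 0;
};
namespace json { // hypothetical stand-in for `llvm::json`
struct Value {
  std::string Text;
};
} // namespace json
} // namespace outer

using namespace outer;
using namespace outer::json;

// Value V; // would not compile: reference to 'Value' is ambiguous

// Qualifying the type sidesteps the ambiguity regardless of which headers
// happen to be included.
inline std::string describe(const outer::json::Value &V) {
  return "json:" + V.Text;
}
inline std::string describe(const outer::Value &V) {
  return "ir:" + std::to_string(V.Id);
}
```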
    yuxuanchen1997 authored and srcarroll committed Jun 7, 2024
    993247b
  121. [memprof] Use std::unique_ptr instead of std::optional (llvm#94655)

    Changing the type of Frame::SymbolName from std::optional<std::string>
    to std::unique_ptr<std::string> reduces sizeof(Frame) from 64 to 32.
    
    The smaller type reduces the cycle and instruction counts by 23% and
    4.4%, respectively, with "llvm-profdata show" modified to deserialize
    all MemProfRecords in a MemProf V2 profile.  The peak memory usage is
    cut nearly in half.
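
The size effect can be observed with two stand-in structs. The field list below is an assumption modelled loosely on a profile frame, not the real `Frame` definition, and the exact sizes depend on the standard library and ABI, so the sketch only compares them rather than asserting 64 and 32.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <string>

// Hypothetical frame layouts; the fields are illustrative assumptions.
struct FrameWithOptional {
  std::uint64_t Function = 0;
  std::optional<std::string> SymbolName; // stores the whole string inline
  std::uint32_t LineOffset = 0;
  std::uint32_t Column = 0;
  bool IsInlineFrame = false;
};

struct FrameWithUniquePtr {
  std::uint64_t Function = 0;
  std::unique_ptr<std::string> SymbolName; // pointer-sized; string on the heap
  std::uint32_t LineOffset = 0;
  std::uint32_t Column = 0;
  bool IsInlineFrame = false;
};

constexpr std::size_t OptionalFrameSize = sizeof(FrameWithOptional);
constexpr std::size_t UniquePtrFrameSize = sizeof(FrameWithUniquePtr);
```

The trade-off is an extra heap allocation whenever a symbol name is present, in exchange for a smaller object when it is absent.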
    kazutakahirata authored and srcarroll committed Jun 7, 2024
    19fd58a
  122. 580a488
  123. [ELF] Keep non-alloc orphan sections at the end

    https://reviews.llvm.org/D85867 changed the way we assign file offsets
    (alloc sections first, then non-alloc sections).
    
    It also removed a non-alloc special case from `findOrphanPos`.
    Looking at the memory-nonalloc-no-warn.test change, which would be
    needed by llvm#93761, it makes sense to restore the previous behavior: when
    placing non-alloc orphan sections, keep these sections at the end so
    that the section index order matches the file offset order.
    
    This change is cosmetic. In sections-nonalloc.s, GNU ld places the
    orphan `other3` in the middle and the orphan .symtab/.shstrtab/.strtab
    at the end.
    
    Pull Request: llvm#94519
    MaskRay authored and srcarroll committed Jun 7, 2024
    7e5a6ca
  124. [bazel] Port for 649edb8

    hokein authored and srcarroll committed Jun 7, 2024
    5a31f5e
  125. [lldb/crashlog] Remove aarch64 requirement on crashlog tests (llvm#94553)

    This PR removes the `target-aarch64` requirement on the crashlog tests
    to exercise them on Intel bots, and temporarily makes image loading
    single-threaded while a fix is implemented for a deadlock that occurs
    when loading the images in parallel.
    
    Signed-off-by: Med Ismail Bennani <[email protected]>
    medismailben authored and srcarroll committed Jun 7, 2024
    c7b1d07
  126. [Offload] Fix missing abs function for test

    Summary:
    We don't have the abs function to link against, so just use the builtin.
    jhuber6 authored and srcarroll committed Jun 7, 2024
    215c9f1
  127. [LLVM] Do not require shell for some tests (llvm#94595)

    Remove `REQUIRES: shell` from some tests that seem fine without it.
    Tested on Windows and with LIT_USE_INTERNAL_SHELL=1 on Linux.
    jayfoad authored and srcarroll committed Jun 7, 2024
    91bf0b0
  128. NFC: resolve TODO in LLVM dialect conversions (llvm#91497)

    Relaxes restriction that certain public utility functions only apply
    to the builtin ModuleOp.
    christopherbate authored and srcarroll committed Jun 7, 2024
    dae0098
  129. [lldb] Include memory stats in statistics summary (llvm#94671)

    The summary already includes other size information, e.g. total debug
    info size in bytes. The only other way I can get this information is by
    dumping all statistics which can be quite large. Adding it to the
    summary seems fair.
    bulbazord authored and srcarroll committed Jun 7, 2024
    093ff68
  130. [Offload] Use the kernel argument size directly in AMDGPU offloading (l…

    …lvm#94667)
    
    Summary:
    The old COV3 implementation of HSA used to omit the implicit arguments
    from the kernel argument size. For COV4 and COV5 this is no longer the
    case so we can simply use the size reported from the symbol information.
    
    See
    ROCm/ROCR-Runtime#117 (comment)
    jhuber6 authored and srcarroll committed Jun 7, 2024
    fc0f7bf
  131. [RISCV][InsertVSETVLI] Check for undef register operand directly [nfc]

    getVNInfoFromReg is expected to return a nullptr if-and-only-if the
    operand is undef.  (This was asserted for.)  Reverse the order of the
    checks to simplify an upcoming set of patches.
    preames authored and srcarroll committed Jun 7, 2024
    91b0814
  132. 3df6c97
  133. [ProfileData] Remove swapToHostOrder (llvm#94665)

    This patch removes swapToHostOrder in favor of
    llvm::support::endian::readNext as swapToHostOrder is too thin a
    wrapper around readNext.
    
    Note that there are two variants of readNext:
    
    - readNext<type, endian, align>(ptr)
    - readNext<type, align>(ptr, endian)
    
    swapToHostOrder uses the former, but this patch switches to the latter.
    
    While we are at it, this patch teaches readNext to default to
    unaligned just as I did in:
    
      commit 568368a
      Author: Kazu Hirata <[email protected]>
      Date:   Mon Apr 15 19:05:30 2024 -0700
    kazutakahirata authored and srcarroll committed Jun 7, 2024
    d7fddb4
  134. [clang] Fix flag typo in comment

    Fixed for more accurate searches of the flag `-Wsystem-headers-in-module=`.
    kastiglione authored and srcarroll committed Jun 7, 2024
    898bd85
  135. [memprof] Use std::vector<Frame> instead of llvm::SmallVector<Frame> …

    …(NFC) (llvm#94432)
    
    This patch replaces llvm::SmallVector<Frame> with std::vector<Frame>.
    
    llvm::SmallVector<Frame> sets aside one inline element.  Meanwhile,
    when I sort all call stacks by their lengths, the length at the first
    percentile is already 2.  That is, 99 percent of call stacks do not
    take advantage of the inline element.
    
    Using std::vector<Frame> reduces the cycle and instruction counts by
    11% and 22%, respectively, with "llvm-profdata show" modified to
    deserialize all MemProfRecords.
    kazutakahirata authored and srcarroll committed Jun 7, 2024
    15701c1
  136. 66026f0
  137. ff9f8a7
  138. [libc] fixed target issue with exit_handler (llvm#94678)

    - addressed
    llvm#94317 (comment)
    - added conditional in cmake file for exit_handler object library
    
    Co-authored-by: Aaryan Shukla <[email protected]>
    2 people authored and srcarroll committed Jun 7, 2024
    37a2cc0
  139. [sanitizer] Make CHECKs in bitvector more precise (NFC) (llvm#94630)

    These CHECKs are all checking indices, which must be strictly smaller
    than the size (otherwise they would go out of bounds).
    nikic authored and srcarroll committed Jun 7, 2024
    f99ee69
  140. [sanitizer_common] Change allocator base in test case for compatibility with high-entropy ASLR (llvm#93234)
    
    With high-entropy ASLR (e.g., 32-bits == 16TB), the allocator base of
    0x700000000000 (112TB) may collide with the placement of the libraries
    (e.g., on Linux, the mmap base could be 128TB - 16TB == 112TB). This
    results in a segfault in the test case.
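
The figures in the paragraph above can be checked with a little arithmetic. The sketch assumes 4 KiB pages and a 128 TiB (47-bit) user address space on x86-64 Linux, and reads the message's "TB" as binary terabytes; the constant names are hypothetical.

```cpp
#include <cstdint>

constexpr std::uint64_t KiB = 1024;
constexpr std::uint64_t TiB = KiB * KiB * KiB * KiB;

// Assumption: 4 KiB pages, so 32 bits of mmap randomness span 2^32 pages.
constexpr std::uint64_t PageSize = 4 * KiB;
constexpr std::uint64_t AslrSpan = (std::uint64_t{1} << 32) * PageSize;

// Assumption: 47-bit user address space, i.e. 128 TiB.
constexpr std::uint64_t UserSpaceTop = 128 * TiB;
constexpr std::uint64_t LowestMmapBase = UserSpaceTop - AslrSpan;

// Allocator base quoted in the commit message.
constexpr std::uint64_t OldAllocatorBase = 0x700000000000ULL;

static_assert(AslrSpan == 16 * TiB, "32 bits of 4 KiB pages is 16 TiB");
static_assert(OldAllocatorBase == 112 * TiB, "0x700000000000 is 112 TiB");
static_assert(LowestMmapBase == OldAllocatorBase,
              "the lowest possible mmap base equals the old allocator base");
```

Under these assumptions the lowest possible mmap base lands exactly on the old allocator base, which is consistent with the collision described.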
    
    This patch moves the allocator base below the PIE program segment,
    inspired by fb77ca0. As per that patch:
    1) we are leaving the old behavior for Apple
    2) since ASLR cannot be set above 32-bits for x86-64 Linux, we expect
       this new layout to be durable.
    
    Note that this is only changing a test case, not the behavior of
    sanitizers. Sanitizers have their own settings for initializing the
    allocator base.
    
    Reproducer:
    1. ninja check-sanitizer # Just to build the test binary needed below;
       no need to actually run the tests here
    2. sudo sysctl vm.mmap_rnd_bits=32 # Increase ASLR entropy
    3. for f in `seq 1 10000`; do echo $f; GTEST_FILTER=*SizeClassAllocator64Dense ./projects/compiler-rt/lib/sanitizer_common/tests/Sanitizer-x86_64-Test > /tmp/x; if [ $? -ne 0 ]; then cat /tmp/x; fi; done
    thurstond authored and srcarroll committed Jun 7, 2024
    f09cac8
  141. [compiler-rt] Map internal_sigaction to __sys_sigaction on FreeBSD (l…

    …lvm#84441)
    
    This function is called during very early startup, which can result
    in a crash on FreeBSD. The sigaction() function in libc is indirected
    via a table so that it can be interposed by the threading library
    rather than calling the syscall directly. In the crash I was observing
    this table had not yet been relocated, so we ended up jumping to an
    invalid address. To avoid this problem we can call __sys_sigaction,
    which calls the syscall directly and in FreeBSD 15 is part of libsys
    rather than libc, so does not depend on libc being fully initialized.
    arichardson authored and srcarroll committed Jun 7, 2024
    ec12488
  142. [llvm/IR] Fix module build issue following e57308b (NFC) (llvm#94580)

    This patch fixes a build issue following e57308b when enabling
    module build.
    
    With that change, we failed to build the LLVM_IR module since
    GEPNoWrapFlags wasn't defined prior to using it.
    
    This patch addresses that issue by including the missing header in
    `llvm/IR/IRBuilderFolder.h` which uses the `GEPNoWrapFlags` type.
    This should ensure that we can always build the `LLVM_IR` module.
    
    Signed-off-by: Med Ismail Bennani <[email protected]>
    medismailben authored and srcarroll committed Jun 7, 2024
    9905f64
  143. [memprof] Add CallStackRadixTreeBuilder (llvm#93784)

    Call stacks are a huge portion of the MemProf profile, taking up 70+%
    of the profile file size.
    
    This patch implements a radix tree to compress call stacks, which are
    known to have long common prefixes.  Specifically,
    CallStackRadixTreeBuilder, introduced in this patch, takes call stacks
    in the MemProf profile, sorts them in the dictionary order to maximize
    the common prefix between adjacent call stacks, and then encodes a
    radix tree into a single array that is ready for serialization.
    
    The resulting radix array is essentially a concatenation of call stack
    arrays, each encoded with its length followed by the payload, except
    that these arrays contain "instructions" like "skip 7 elements
    forward" to borrow common prefixes from other call stacks.
    
    This patch does not integrate with the MemProf
    serialization/deserialization infrastructure yet.  Once integrated,
    the radix tree is expected to roughly halve the file size of the
    MemProf profile.
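
The sorting step described above can be illustrated on its own. This is a minimal sketch of the idea, not the `CallStackRadixTreeBuilder` encoding: it sorts call stacks in dictionary order and counts how many leading frames each stack shares with its predecessor. `FrameId`, `CallStack`, and both function names are hypothetical, and which end of a call stack counts as the "prefix" is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using FrameId = std::uint64_t;          // hypothetical frame identifier
using CallStack = std::vector<FrameId>; // frames in prefix-first order

// Number of leading frames that A and B have in common.
std::size_t commonPrefixLength(const CallStack &A, const CallStack &B) {
  const std::size_t Limit = std::min(A.size(), B.size());
  std::size_t I = 0;
  while (I < Limit && A[I] == B[I])
    ++I;
  return I;
}

// Sorts the stacks lexicographically, then sums the prefix each stack shares
// with the one before it. A larger total means more frames that a
// prefix-sharing encoding would not need to store again.
std::size_t sharedFramesAfterSorting(std::vector<CallStack> Stacks) {
  std::sort(Stacks.begin(), Stacks.end());
  std::size_t Shared = 0;
  for (std::size_t I = 1; I < Stacks.size(); ++I)
    Shared += commonPrefixLength(Stacks[I - 1], Stacks[I]);
  return Shared;
}
```

For the four stacks `{1,2,3}`, `{9}`, `{1,2,4}`, `{1,5}`, sorting places the three stacks starting with `1` next to each other, so 2 + 1 + 0 = 3 frames are shared with a predecessor.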
    kazutakahirata authored and srcarroll committed Jun 7, 2024
    a7c205d
  144. 625bd35
  145. 300e13b
  146. [ELF] Improve -r section group tests

    MaskRay authored and srcarroll committed Jun 7, 2024
    31f7172
  147. ceb964d
  148. [dfsan] Add test case for sscanf (llvm#94700)

    This test case shows a limitation of DFSan's sscanf implementation
    (introduced in https://reviews.llvm.org/D153775): it simply ignores
    ordinary characters in the format string, instead of actually comparing
    them against the input. This may change the semantics of instrumented
    programs.
    
    Importantly, this also means that DFSan's release_shadow_space.c test,
    which relies on sscanf to scrape the RSS from /proc/maps output, will
    incorrectly match lines that don't contain RSS information. As a result,
    it adds together numbers from irrelevant output (e.g., base
    addresses), resulting in test flakiness
    (llvm#91287).
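
For reference, this is how standard `sscanf` treats ordinary characters in the format string: they must match the input, and a mismatch stops the scan before any conversion happens. The format `"Rss: %ld kB"` and the helper `parseRssLine` are assumptions chosen for illustration, not taken from the DFSan test.

```cpp
#include <cstdio>

// Returns the RSS value in kB, or -1 if the line does not have the expected
// "Rss: <number> kB" shape.
long parseRssLine(const char *Line) {
  long Kb = 0;
  // The literal "Rss:" must match the input before %ld is attempted, so
  // unrelated lines yield zero conversions instead of a bogus number.
  if (std::sscanf(Line, "Rss: %ld kB", &Kb) != 1)
    return -1;
  return Kb;
}
```

An implementation that skips the literal characters would instead try to read a number from any line, which is the behavior the commit message identifies as the source of the flakiness.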
    thurstond authored and srcarroll committed Jun 7, 2024
    720509b
  149. [ELF] Simplify code. NFC

    Make it easier to add CREL support.
    MaskRay authored and srcarroll committed Jun 7, 2024
    906734b
  150. 9ab25b2
  151. f36f6a5
  152. [InstCombine] Improve coverage of foldSelectValueEquivalence for co…

    …nstants
    
    We don't need the `noundef` check if the new simplification is a
    constant.
    
    This cleans up regressions from folding multiuse:
        `(icmp eq/ne (sub/xor x, y), 0)` -> `(icmp eq/ne x, y)`.
    
    Closes llvm#88298
    goldsteinn authored and srcarroll committed Jun 7, 2024
    3865e74
  153. [RISCV] Unify all the code that adds unaligned-scalar/vector-mem to F…

    …eatures vector. (llvm#94660)
    
    Instead of having multiple places insert into the Features vector
    independently, check all the conditions in one place.
    
    This avoids a subtle ordering requirement that -mstrict-align processing
    had to be done after the others.
    topperc authored and srcarroll committed Jun 7, 2024
    b7c3fdb
  154. f14147d
  155. [clang-format]: Annotate colons found in inline assembly (llvm#92617)

    Short-circuit the parsing of tok::colon to label colons found within
    lines starting with asm as InlineASMColon.
    
    Fixes llvm#92616.
    
    ---------
    
    Co-authored-by: Owen Pan <[email protected]>
    2 people authored and srcarroll committed Jun 7, 2024
    1b8d567
  156. 12b9224
  157. [serialization] no transitive decl change (llvm#92083)

    Follow-up to llvm#86912.
    
    The motivation for the patch series is that, for a module interface unit
    `X`, when the modules `X` depends on change in ways that are not
    relevant to `X`, we hope the BMI of `X` won't change. For this specific
    patch, we hope the BMI of `X` won't change when the changes only concern
    irrelevant declarations. **However**, I found the patch itself is not
    very useful in practice, since adding or removing declarations will
    change the state of identifiers and types in most cases.
    
    That said, for the most simple example,
    
    ```
    // partA.cppm
    export module m:partA;
    
    // partA.v1.cppm
    export module m:partA;
    export void a() {}
    
    // partB.cppm
    export module m:partB;
    export void b() {}
    
    // m.cppm
    export module m;
    export import :partA;
    export import :partB;
    
    // onlyUseB;
    export module onlyUseB;
    import m;
    export inline void onlyUseB() {
        b();
    }
    ```
    
    the BMI of `onlyUseB` will change after we change the implementation of
    `partA.cppm` to `partA.v1.cppm`, since `partA.v1.cppm` introduces new
    identifiers and types (the function prototype).
    
    So in this patch, we have to write the tests as:
    
    ```
    // partA.cppm
    export module m:partA;
    export int getA() { ... }
    export int getA2(int) { ... }
    
    // partA.v1.cppm
    export module m:partA;
    export int getA() { ... }
    export int getA(int) { ... }
    export int getA2(int) { ... }
    
    // partB.cppm
    export module m:partB;
    export void b() {}
    
    // m.cppm
    export module m;
    export import :partA;
    export import :partB;
    
    // onlyUseB;
    export module onlyUseB;
    import m;
    export inline void onlyUseB() {
        b();
    }
    ```
    
    so that the newly introduced declaration `int getA(int)` doesn't introduce
    new identifiers and types, and the BMI of `onlyUseB` can stay
    unchanged.
    
    While this does not look great, the patch should serve as the base for
    the patch that erases the transitive change for identifiers and types,
    since I don't know how we can introduce new types and identifiers without
    introducing new declarations. Given how tightly declarations, types and
    identifiers are related, I think we can only reach the ideal state after
    we finish the series for all three entities.
    
    The design of the patch is similar to llvm#86912: it extends the
    32-bit DeclID to 64 bits, using the higher bits to store the module file
    index and the lower bits to store the Local Decl ID.
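
The high-bits/low-bits scheme can be sketched as follows. The 32/32 split and all names here are assumptions made for illustration; the commit message only says that the higher bits hold the module file index and the lower bits hold the local ID.

```cpp
#include <cstdint>

// Hypothetical split: upper 32 bits = module file index, lower 32 = local ID.
constexpr unsigned LocalBits = 32;

// Packs a module file index and a local declaration ID into one 64-bit ID.
constexpr std::uint64_t makeGlobalDeclID(std::uint32_t ModuleFileIndex,
                                         std::uint32_t LocalDeclID) {
  return (static_cast<std::uint64_t>(ModuleFileIndex) << LocalBits) |
         LocalDeclID;
}

// Recovers the module file index from the upper bits.
constexpr std::uint32_t moduleFileIndexOf(std::uint64_t ID) {
  return static_cast<std::uint32_t>(ID >> LocalBits);
}

// Recovers the local declaration ID from the lower bits.
constexpr std::uint32_t localDeclIDOf(std::uint64_t ID) {
  return static_cast<std::uint32_t>(ID & 0xFFFFFFFFu);
}
```

Since the local ID is independent of the other module files, a change in an unrelated module need not renumber the declarations that refer to this one, which fits the stated motivation.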
    
    A slight difference is that we only use 48 bits to store the new DeclID,
    since we try to use the higher 16 bits to store the module ID in the
    prefix of the Decl class. Previously, we used 32 bits to store the module
    ID and 32 bits to store the DeclID. I didn't want to allocate additional
    space, so I tried to keep the total at the same 64 bits. A potentially
    interesting point here is the relationship between the module ID and the
    module file index. I feel we can get the module file index from the
    module ID, but I didn't prove or implement it, since I want to keep the
    patch itself as small as possible. We can do that in the future if we
    want.
    
    Another change in the patch is the new concept of a Decl Index, which is
    the index into the very big array `DeclsLoaded` in ASTReader. Previously,
    the index of a loaded declaration was simply the Decl ID minus
    PREDEFINED_DECL_NUMs, so in some places the two were used
    interchangeably. This patch tries to separate the two concepts.
    
    As with llvm#86912, the change will increase the on-disk PCM file sizes.
    Since declaration IDs are probably the most numerous IDs in the PCM file,
    this can have the biggest impact on the size. In my experiments, this
    change brings a 6.6% increase in the on-disk PCM size. No compile-time
    performance regression was observed. Given the benefits in the
    motivating example, I think the cost is worthwhile.
    ChuanqiXu9 authored and srcarroll committed Jun 7, 2024
    59bed9c
  158. [llvm][ScheduleDAG] Re-arrange SUnit's members to make it smaller (ll…

    …vm#94547)
    
    before:
    ```
    *** Dumping AST Record Layout
             0 | class llvm::SUnit
             0 |   SDNode * Node
             8 |   MachineInstr * Instr
            16 |   SUnit * OrigNode
            24 |   const MCSchedClassDesc * SchedClass
            32 |   class llvm::SmallVector<class llvm::SDep, 4> Preds
            32 |     class llvm::SmallVectorImpl<class llvm::SDep> (base)
            32 |       class llvm::SmallVectorTemplateBase<class llvm::SDep> (base)
            32 |         class llvm::SmallVectorTemplateCommon<class llvm::SDep> (base)
            32 |           class llvm::SmallVectorBase<uint32_t> (base)
            32 |             void * BeginX
            40 |             unsigned int Size
            44 |             unsigned int Capacity
            48 |     struct llvm::SmallVectorStorage<class llvm::SDep, 4> (base)
            48 |       char[64] InlineElts
           112 |   class llvm::SmallVector<class llvm::SDep, 4> Succs
           112 |     class llvm::SmallVectorImpl<class llvm::SDep> (base)
           112 |       class llvm::SmallVectorTemplateBase<class llvm::SDep> (base)
           112 |         class llvm::SmallVectorTemplateCommon<class llvm::SDep> (base)
           112 |           class llvm::SmallVectorBase<uint32_t> (base)
           112 |             void * BeginX
           120 |             unsigned int Size
           124 |             unsigned int Capacity
           128 |     struct llvm::SmallVectorStorage<class llvm::SDep, 4> (base)
           128 |       char[64] InlineElts
           192 |   unsigned int NodeNum
           196 |   unsigned int NodeQueueId
           200 |   unsigned int NumPreds
           204 |   unsigned int NumSuccs
           208 |   unsigned int NumPredsLeft
           212 |   unsigned int NumSuccsLeft
           216 |   unsigned int WeakPredsLeft
           220 |   unsigned int WeakSuccsLeft
           224 |   unsigned short NumRegDefsLeft
           226 |   unsigned short Latency
       228:0-0 |   _Bool isVRegCycle
       228:1-1 |   _Bool isCall
       228:2-2 |   _Bool isCallOp
       228:3-3 |   _Bool isTwoAddress
       228:4-4 |   _Bool isCommutable
       228:5-5 |   _Bool hasPhysRegUses
       228:6-6 |   _Bool hasPhysRegDefs
       228:7-7 |   _Bool hasPhysRegClobbers
       229:0-0 |   _Bool isPending
       229:1-1 |   _Bool isAvailable
       229:2-2 |   _Bool isScheduled
       229:3-3 |   _Bool isScheduleHigh
       229:4-4 |   _Bool isScheduleLow
       229:5-5 |   _Bool isCloned
       229:6-6 |   _Bool isUnbuffered
       229:7-7 |   _Bool hasReservedResource
           232 |   Sched::Preference SchedulingPref
       236:0-0 |   _Bool isDepthCurrent
       236:1-1 |   _Bool isHeightCurrent
           240 |   unsigned int Depth
           244 |   unsigned int Height
           248 |   unsigned int TopReadyCycle
           252 |   unsigned int BotReadyCycle
           256 |   const TargetRegisterClass * CopyDstRC
           264 |   const TargetRegisterClass * CopySrcRC
               | [sizeof=272, dsize=272, align=8,
               |  nvsize=272, nvalign=8]
    ```
    
    after:
    ```
    *** Dumping AST Record Layout
             0 | class llvm::SUnit
             0 |   union llvm::SUnit::(anonymous at /Users/jonathan_roelofs/llvm-upstream/llvm/include/llvm/CodeGen/ScheduleDAG.h:246:5)
             0 |     SDNode * Node
             0 |     MachineInstr * Instr
             8 |   SUnit * OrigNode
            16 |   const MCSchedClassDesc * SchedClass
            24 |   const TargetRegisterClass * CopyDstRC
            32 |   const TargetRegisterClass * CopySrcRC
            40 |   class llvm::SmallVector<class llvm::SDep, 4> Preds
            40 |     class llvm::SmallVectorImpl<class llvm::SDep> (base)
            40 |       class llvm::SmallVectorTemplateBase<class llvm::SDep> (base)
            40 |         class llvm::SmallVectorTemplateCommon<class llvm::SDep> (base)
            40 |           class llvm::SmallVectorBase<uint32_t> (base)
            40 |             void * BeginX
            48 |             unsigned int Size
            52 |             unsigned int Capacity
            56 |     struct llvm::SmallVectorStorage<class llvm::SDep, 4> (base)
            56 |       char[64] InlineElts
           120 |   class llvm::SmallVector<class llvm::SDep, 4> Succs
           120 |     class llvm::SmallVectorImpl<class llvm::SDep> (base)
           120 |       class llvm::SmallVectorTemplateBase<class llvm::SDep> (base)
           120 |         class llvm::SmallVectorTemplateCommon<class llvm::SDep> (base)
           120 |           class llvm::SmallVectorBase<uint32_t> (base)
           120 |             void * BeginX
           128 |             unsigned int Size
           132 |             unsigned int Capacity
           136 |     struct llvm::SmallVectorStorage<class llvm::SDep, 4> (base)
           136 |       char[64] InlineElts
           200 |   unsigned int NodeNum
           204 |   unsigned int NodeQueueId
           208 |   unsigned int NumPreds
           212 |   unsigned int NumSuccs
           216 |   unsigned int NumPredsLeft
           220 |   unsigned int NumSuccsLeft
           224 |   unsigned int WeakPredsLeft
           228 |   unsigned int WeakSuccsLeft
           232 |   unsigned int TopReadyCycle
           236 |   unsigned int BotReadyCycle
           240 |   unsigned int Depth
           244 |   unsigned int Height
       248:0-0 |   _Bool isVRegCycle
       248:1-1 |   _Bool isCall
       248:2-2 |   _Bool isCallOp
       248:3-3 |   _Bool isTwoAddress
       248:4-4 |   _Bool isCommutable
       248:5-5 |   _Bool hasPhysRegUses
       248:6-6 |   _Bool hasPhysRegDefs
       248:7-7 |   _Bool hasPhysRegClobbers
       249:0-0 |   _Bool isPending
       249:1-1 |   _Bool isAvailable
       249:2-2 |   _Bool isScheduled
       249:3-3 |   _Bool isScheduleHigh
       249:4-4 |   _Bool isScheduleLow
       249:5-5 |   _Bool isCloned
       249:6-6 |   _Bool isUnbuffered
       249:7-7 |   _Bool hasReservedResource
           250 |   unsigned short NumRegDefsLeft
           252 |   unsigned short Latency
       254:0-0 |   _Bool isDepthCurrent
       254:1-1 |   _Bool isHeightCurrent
       254:2-2 |   _Bool isNode
       254:3-3 |   _Bool isInst
       254:4-7 |   Sched::Preference SchedulingPref
               | [sizeof=256, dsize=255, align=8,
               |  nvsize=255, nvalign=8]
    ```
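
The general effect behind the two layouts above, where grouping members by alignment removes padding, can be seen in a much smaller example. The structs below are hypothetical and unrelated to `SUnit`'s real fields; the actual patch additionally uses a union and bit-fields, which this sketch does not attempt to model.

```cpp
#include <cstdint>

// Small members interleaved with larger ones: each bool is followed by
// padding so the next, more strictly aligned member can start on its boundary.
struct Scattered {
  bool FlagA;
  std::uint64_t Payload;
  bool FlagB;
  std::uint32_t Counter;
  bool FlagC;
};

// Same members ordered from the strictest alignment to the weakest, so the
// three bools share what would otherwise be padding.
struct Grouped {
  std::uint64_t Payload;
  std::uint32_t Counter;
  bool FlagA;
  bool FlagB;
  bool FlagC;
};

static_assert(sizeof(Grouped) <= sizeof(Scattered),
              "grouping members by alignment should never make this larger");
```

On a typical 64-bit ABI the first struct is likely to be twice the size of the second, though the exact numbers are platform-dependent.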
    jroelofs authored and srcarroll committed Jun 7, 2024
    c5b5dcd
  159. Revert "[serialization] no transitive decl change (llvm#92083)"

    This reverts commit 5c10487.
    
    The ArmV7 bot complains that the change breaks the alignment.
    ChuanqiXu9 authored and srcarroll committed Jun 7, 2024
    b0281f1
  160. [AMDGPU] Auto-generating lit test patterns (NFC) (llvm#93837)

    Test CodeGen/AMDGPU/build_vector.ll has its lit patterns partially
    hand-written and the rest auto-generated. That is awkward to maintain
    when future patches require changes, so this auto-generates the entire
    pattern and moves the R600 test into build_vector-r600.ll.
    cdevadas authored and srcarroll committed Jun 7, 2024
    09ce478
  161. [AMDGPU] Auto-generated some lit test patterns (NFC). (llvm#94310)

    Also, converted the R600 RUN lines from some tests into standalone tests.
    cdevadas authored and srcarroll committed Jun 7, 2024
    634fbfb
  162. [NewPM][CodeGen] Port regallocfast to new pass manager (llvm#94426)

    This pull request ports `regallocfast` to the new pass manager. It exposes
    the parameter `filter` to handle different register classes for AMDGPU.
    IIUC AMDGPU needs to allocate different register classes separately, so it
    needs to implement its own `--<reg-class>-regalloc`. Now users can use e.g.
    `-passes=regallocfast<filter=sgpr>` to allocate a specific register class.
    The command line option `--regalloc-npm` is still a work in progress; the
    plan is to reuse the syntax of passes, e.g. use
    `--regalloc-npm=regallocfast<filter=sgpr>,greedy<filter=vgpr>` to
    replace `--sgpr-regalloc` and `--vgpr-regalloc`.
    paperchalice authored and srcarroll committed Jun 7, 2024
    eb3090e
  163.
    638074f
  164. [test] Don't generate regalloc-amdgpu.s in llvm#94426 (llvm#94722)

    The test generates an empty `regalloc-amdgpu.s` file in the test
    directory, which causes an unresolved test.
    paperchalice authored and srcarroll committed Jun 7, 2024
    9928aa4
  165. [clang-tidy] refactor misc-header-include-cycle (llvm#94697)

    1. merge valid check
    2. use range-based loop
    HerrCai0907 authored and srcarroll committed Jun 7, 2024
    95f34a7
  166.
    e085ae5
  167. Fix spurious non-strict availability warning (llvm#94377)

    The availability attributes are stored on the function declarations. The
    code was looking for them in the function template declarations. This
    resulted in spuriously diagnosing (non-strict) availability issues in
    contexts that are not available.
    
    Co-authored-by: Gabor Horvath <[email protected]>
    2 people authored and srcarroll committed Jun 7, 2024
    a65c853
  168. [mlir][tensor] Fix FoldTensorCastProducerOp for multiple result operations (llvm#93374)
    
    For patterns where there are multiple results apart from dpsInits, this
    fails.
    E.g.:
    ```
    %13:2 = iree_codegen.ukernel.generic "iree_uk_unpack"
    ins(%extracted_slice : tensor<?x1x16x16xf32>) outs(%11 :
    tensor<?x?xf32>) ... -> tensor<?x?xf32>, i32
    ``` 
    The above op has results apart from dpsInit and hence fails. The PR
    assumes that the result has dpsInits followed by nonDpsInits.
    pashu123 authored and srcarroll committed Jun 7, 2024
    5694e29
  169.
    e8ac511
  170. [clang][Interp] Improve APValue machinery

    Handle lvalues pointing to declarations, unions and member pointers.
    tbaederr authored and srcarroll committed Jun 7, 2024
    da152a0
  171.
    47e0905
  172. [lldb] Split ValueObject::CreateChildAtIndex into two functions (llvm#94455)
    
    The function is doing two fairly different things, depending on how
    it is called. While this allows for some code reuse, it also makes it
    hard to override it correctly. Possibly for this reason
    ValueObjectSynthetic overrides GetChildAtIndex instead, which forces it
    to reimplement some of its functionality, most notably caching of
    generated children.
    
    Splitting this up makes it easier to move the caching to a common place
    (and hopefully makes the code easier to follow in general).
    labath authored and srcarroll committed Jun 7, 2024
    e47cb50
  173.
    908d925
  174. [memprof] Use std::move in ContextEdge::ContextEdge (NFC) (llvm#94687)

    Since the constructor of ContextEdge takes ContextIds by value, we
    should move it to the corresponding member variable as suggested by
    clang-tidy's performance-unnecessary-value-param.
    
    While we are at it, this patch updates a couple of callers.  To avoid
    the ambiguity in the evaluation order among the constructor arguments,
    I'm calling computeAllocType before calling the constructor.
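
    A minimal sketch of the pattern described above, not the actual memprof
    code: `Edge`, `computeType` and `makeEdge` are hypothetical stand-ins for
    `ContextEdge`, `computeAllocType` and a caller. It shows a by-value
    parameter being moved into its member, and the argument being read before
    the call that moves from it.

    ```cpp
    #include <cassert>
    #include <set>
    #include <utility>

    // Hypothetical stand-in for ContextEdge; field names are illustrative.
    struct Edge {
      unsigned char AllocTypes;
      std::set<unsigned> ContextIds;

      // ContextIds is taken by value, so move it into the member instead of
      // copying it a second time.
      Edge(unsigned char AllocTypes, std::set<unsigned> ContextIds)
          : AllocTypes(AllocTypes), ContextIds(std::move(ContextIds)) {}
    };

    // Hypothetical helper that only reads the id set.
    unsigned char computeType(const std::set<unsigned> &Ids) {
      return Ids.empty() ? 0 : 1;
    }

    Edge makeEdge(std::set<unsigned> Ids) {
      // Read Ids first: the order in which constructor arguments are
      // evaluated is unspecified, so Edge(computeType(Ids), std::move(Ids))
      // could observe a moved-from set.
      unsigned char Type = computeType(Ids);
      return Edge(Type, std::move(Ids));
    }
    ```

    Computing the type into a local first keeps the result independent of the
    compiler's argument evaluation order.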
    kazutakahirata authored and srcarroll committed Jun 7, 2024
    ed0d45e
  175. [ORC] Switch ExecutionSession::ErrorReporter to use unique_function.

    This allows the ReportError functor to hold move-only types.
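
    To illustrate why a move-only function wrapper matters, here is a
    self-contained sketch. `UniqueReporter` is a hypothetical, minimal wrapper
    written for this example (it is not `llvm::unique_function`), and
    `std::string` stands in for the error type. A lambda that owns a
    `std::unique_ptr` cannot be stored in `std::function`, which requires
    copyable callables, but it can be stored in a wrapper that only moves.

    ```cpp
    #include <cassert>
    #include <memory>
    #include <string>
    #include <utility>

    // Hypothetical, minimal move-only callable wrapper for illustration only.
    class UniqueReporter {
      struct Base {
        virtual ~Base() = default;
        virtual void call(const std::string &Msg) = 0;
      };
      template <typename F> struct Impl final : Base {
        F Fn;
        explicit Impl(F Fn) : Fn(std::move(Fn)) {}
        void call(const std::string &Msg) override { Fn(Msg); }
      };
      std::unique_ptr<Base> Ptr;

    public:
      // The callable is moved into heap storage and never copied.
      template <typename F>
      UniqueReporter(F Fn) : Ptr(std::make_unique<Impl<F>>(std::move(Fn))) {}
      void operator()(const std::string &Msg) { Ptr->call(Msg); }
    };

    // Builds a reporter whose lambda owns a std::unique_ptr (a move-only
    // capture) and returns what it wrote for one message.
    std::string reportOnce(const std::string &Msg) {
      std::string Log;
      auto Prefix = std::make_unique<std::string>("error: ");
      UniqueReporter Report(
          [P = std::move(Prefix), &Log](const std::string &M) { Log = *P + M; });
      Report(Msg);
      return Log;
    }
    ```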
    lhames authored and srcarroll committed Jun 7, 2024
    4d849a4
  176.
    137038f
  177. [SCEV] Use insert_or_assign() (NFC)
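
    For readers unfamiliar with `insert_or_assign()`, a small sketch using
    `std::map` (C++17) as a stand-in; the SCEV code itself uses LLVM's own
    containers, and `cacheValue` is a hypothetical helper. The call replaces
    a separate lookup-then-insert-or-overwrite sequence.

    ```cpp
    #include <cassert>
    #include <map>
    #include <string>

    // Stores Value under Key, overwriting any existing entry.
    // Returns true when a new entry was created, false when one was replaced.
    bool cacheValue(std::map<std::string, int> &Cache, const std::string &Key,
                    int Value) {
      auto [It, Inserted] = Cache.insert_or_assign(Key, Value);
      (void)It; // iterator to the entry, unused in this sketch
      return Inserted;
    }
    ```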

    nikic authored and srcarroll committed Jun 7, 2024
    b0d738c
  178. [LoongArch] Add a pass to rewrite rd to r0 for non-computational instrs whose return values are unused (llvm#94590)
    
    This patch adds a peephole pass `LoongArchDeadRegisterDefinitions`. It
    rewrites `rd` to `r0` when `rd` is marked as dead. It may improve the
    register allocation and reduce pipeline hazards on CPUs without register
    renaming and OOO.
    heiher authored and srcarroll committed Jun 7, 2024
    9516710
  179. [clang][Interp][NFC] Add GetPtrFieldPop opcode

    And change the previous GetPtrField to only peek() the base pointer.
    We can get rid of a whole bunch of DupPtr ops this way.
    tbaederr authored and srcarroll committed Jun 7, 2024
    6fc4e97
  180. [analyzer][NFC] Factor out NoOwnershipChangeVisitor (llvm#94357)

    In preparation for adding essentially the same visitor to StreamChecker,
    this patch factors this visitor out to a common header.
    
    I'll be the first to admit that the interface of these classes is not
    terrific, but it is rather tightly held back by its main technical debt,
    which is NoStoreFuncVisitor, the main descendant of
    NoStateChangeVisitor.
    
    Change-Id: I99d73ccd93a18dd145bbbc83afadbb432dd42b90
    Szelethus authored and srcarroll committed Jun 7, 2024
    7375a39
  181.
    ea0fcca
  182. [docs] Fix benchmarking tips (llvm#94724)

    This PR fixes an incorrect line for setting scaling_governor in the
    benchmarking tips.
    maekawatoshiki authored and srcarroll committed Jun 7, 2024
    5a0978c
  183.
    711196a
  184. [clang][Interp] Remove StorageKind limitation in Pointer assign operators

    It's not strictly needed and did cause some test failures.
    tbaederr authored and srcarroll committed Jun 7, 2024
    b1fafc4
  185.
    c0635ee
  186. [MLIR] Translate DIStringType. (llvm#94480)

    This PR handles translation of DIStringType. The changes are mostly
    mechanical, translating DIStringType to/from DIStringTypeAttr. The
    'stringLength' field is a 'DIVariable' in DIStringType. As there was no
    `DIVariableAttr` previously, it has been added to ease the translation.
    
    ---------
    
    Co-authored-by: Tobias Gysi <[email protected]>
    2 people authored and srcarroll committed Jun 7, 2024
    e336acf
  187.
    08bf183
  188. [flang][Transforms][NFC] Remove boilerplate from vscale range pass (llvm#94598)
    
    Use tablegen to generate the pass constructor.
    
    This pass is supposed to add function attributes so it does not need to
    operate on other top level operations.
    tblah authored and srcarroll committed Jun 7, 2024
    7db1232
  189.
    7ce3900
  190.
    8d54dc1
  191. [ARM] Add NEON support for ISD::ABDS/ABDU nodes. (llvm#94504)

    As noted on llvm#94466, NEON has ABDS/ABDU instructions but only handles them via intrinsics, plus some VABDL custom patterns.
    
    This patch flags basic ABDS/ABDU for NEON types as legal and updates all tablegen patterns to use abds/abdu instead.
    
    Fixes llvm#94466
    RKSimon authored and srcarroll committed Jun 7, 2024
    538584d
  192.
    a74cf9d
  193. [DebugInfo] Add DW_OP_LLVM_extract_bits (llvm#93990)

    This operation extracts a number of bits at a given offset and sign or
    zero extends them, which is done by emitting it as a left shift followed
    by a right shift.
    
    This is being added for use in clang for C++ structured bindings of
    bitfields that have offset or size that aren't a byte multiple. A new
    operation is being added, instead of shifts being used directly, as it
    makes correctly handling it in optimisations (which will be done in a
    later patch) much easier.
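
    The shift pair described above can be sketched in plain C++. This is an
    illustration of the arithmetic only, not the DWARF emission code: the
    function names are hypothetical, the value is assumed to be 64 bits wide,
    and bit 0 is assumed to be the least significant bit.

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Extracts Size bits starting at bit Offset and zero-extends them.
    // Assumes Size >= 1 and Offset + Size <= 64.
    uint64_t extractBitsZExt(uint64_t Value, unsigned Offset, unsigned Size) {
      uint64_t Shifted = Value << (64 - Offset - Size); // drop bits above the field
      return Shifted >> (64 - Size);                    // logical shift fills with zeros
    }

    // Same extraction, but sign-extends from the top bit of the field.
    int64_t extractBitsSExt(uint64_t Value, unsigned Offset, unsigned Size) {
      int64_t Shifted = static_cast<int64_t>(Value << (64 - Offset - Size));
      // Arithmetic right shift replicates the sign bit (guaranteed since C++20,
      // and the behaviour of mainstream compilers before that).
      return Shifted >> (64 - Size);
    }
    ```

    For example, with `Value = 0xB4` (binary 1011 0100), `Offset = 2` and
    `Size = 4`, the field is binary 1101, which is 13 when zero-extended and
    -3 when sign-extended.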
    john-brawn-arm authored and srcarroll committed Jun 7, 2024
    e173fa7
  194. Add checks before hoisting out in loop pipelining (llvm#90872)

    Currently, during a loop pipelining transformation, operations may be
    hoisted out without any checks on the loop bounds, which leads to
    incorrect transformations and unexpected behaviour. The following
    [issue](llvm#90870) describes the
    problem more extensively, including an example.
    The proposed fix adds checks on the loop bounds before hoisting and
    applies the maximum hoisting.
    fotiskoun authored and srcarroll committed Jun 7, 2024
    b37c6bd
  195.
    85cbf2f
  196. [clang][Interp] Fix refers_to_enclosing_variable_or_capture DREs

    They do not count into lambda captures, so visit them lazily.
    tbaederr authored and srcarroll committed Jun 7, 2024
    f0fde2b
  197. [SimplifyCFG] Remove bogus UTC line from test (NFC)

    The check lines in this test were clearly not generated by UTC.
    nikic authored and srcarroll committed Jun 7, 2024
    66dad78
  198. [SimplifyCFG] Regenerate switch to lookup tests (NFC)

    Regenerate these with --check-globals. The manual global CHECKS
    get dropped during regeneration otherwise.
    
    Annoyingly UTC insists on putting the globals directly before the
    first function, so the first comment is a bit out of place now.
    nikic authored and srcarroll committed Jun 7, 2024
    405d7d5
  199. [mlir][vector] Add n-d deinterleave lowering (llvm#94237)

    This patch implements the lowering of vector
    deinterleave for n-dimensional vectors. The process
    unrolls the n-d vector into a series
    of one-dimensional vectors. The deinterleave
    operation is then applied to each of these vectors.
    
    From:
    ```
    %0, %1 = vector.deinterleave %arg0 : vector<2x8xi32> -> vector<2x4xi32>
    ```
    
    To:
    ```
    %cst = arith.constant dense<0> : vector<2x4xi32>
    %0 = vector.extract %arg0[0] : vector<8xi32> from vector<2x8xi32>
    %res1, %res2 = vector.deinterleave %0 : vector<8xi32> -> vector<4xi32>
    %1 = vector.insert %res1, %cst [0] : vector<4xi32> into vector<2x4xi32>
    %2 = vector.insert %res2, %cst [0] : vector<4xi32> into vector<2x4xi32>
    %3 = vector.extract %arg0[1] : vector<8xi32> from vector<2x8xi32>
    %res1_0, %res2_1 = vector.deinterleave %3 : vector<8xi32> -> vector<4xi32>
    %4 = vector.insert %res1_0, %1 [1] : vector<4xi32> into vector<2x4xi32>
    %5 = vector.insert %res2_1, %2 [1] : vector<4xi32> into vector<2x4xi32>
    ...etc.
    ```
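
    The unrolling idea can also be sketched outside MLIR. The following C++
    is an illustration only, with hypothetical helper names and
    `std::vector` standing in for vector types: each row of a 2-D input is
    deinterleaved as a 1-D vector (even-indexed elements to the first
    result, odd-indexed to the second), and the per-row results are
    collected back into two 2-D results.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <utility>
    #include <vector>

    using Row = std::vector<int>;

    // 1-D deinterleave: even-indexed elements go to the first result,
    // odd-indexed elements to the second.
    std::pair<Row, Row> deinterleave1D(const Row &In) {
      Row Even, Odd;
      for (std::size_t I = 0; I < In.size(); ++I)
        (I % 2 == 0 ? Even : Odd).push_back(In[I]);
      return {Even, Odd};
    }

    // 2-D case: unroll over the leading dimension and deinterleave each row.
    std::pair<std::vector<Row>, std::vector<Row>>
    deinterleave2D(const std::vector<Row> &In) {
      std::vector<Row> Res1, Res2;
      for (const Row &R : In) {
        std::pair<Row, Row> Parts = deinterleave1D(R);
        Res1.push_back(std::move(Parts.first));
        Res2.push_back(std::move(Parts.second));
      }
      return {Res1, Res2};
    }
    ```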
    mub-at-arm authored and srcarroll committed Jun 7, 2024
    afcd18f
  200. [ARM] r11 is reserved when using -mframe-chain=aapcs (llvm#86951)

    When using the -mframe-chain=aapcs or -mframe-chain=aapcs-leaf options,
    we cannot use r11 as an allocatable register, even if
    -fomit-frame-pointer is also used. This is so that r11 will always point
    to a valid frame record, even if we don't create one in every function.
    ostannard authored and srcarroll committed Jun 7, 2024
    3f99c0d
  201. [DAG] Always allow folding XOR patterns to ABS pre-legalization (llvm#94601)
    
    Removes residual ARM handling for vXi64 ABS nodes to prevent infinite loops.
    RKSimon authored and srcarroll committed Jun 7, 2024
    671bcef
  202. fix(mlir/**.py): fix comparison to None (llvm#94019)

    from PEP8
    (https://peps.python.org/pep-0008/#programming-recommendations):
    
    > Comparisons to singletons like None should always be done with is or
    is not, never the equality operators.
    
    Co-authored-by: Eisuke Kawashima <[email protected]>
    2 people authored and srcarroll committed Jun 7, 2024
    9ecb812
  203. [ARM] Add support for Cortex-R52+ (llvm#94633)

    Cortex-R52+ is an Armv8-R AArch32 CPU.
    
    Technical Reference Manual for Cortex-R52+:
       https://developer.arm.com/documentation/102199/latest/
    jthackray authored and srcarroll committed Jun 7, 2024
    0601711
  204.
    2fd4477
  205. [clang][test] Skip interpreter value test on Arm 32 bit

    llvm#89811 caused this test to fail,
    somehow.
    
    I think it may not be at fault, but actually be exposing some
    existing undefined behaviour, see
    llvm#94741.
    
    Skipping this for now to get the bots green again.
    DavidSpickett authored and srcarroll committed Jun 7, 2024
    14cd171
  206. [gn build] Port e622996

    llvmgnsyncbot authored and srcarroll committed Jun 7, 2024
    126837c
  207.
    7f3b593
  208. [clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (llvm#89796)

    This change seeks to add support for vendor flavoured SPIRV - more
    specifically, AMDGCN flavoured SPIRV. The aim is to generate SPIRV that
    carries some extra bits of information that are only usable by AMDGCN
    targets, forfeiting absolute genericity to obtain greater expressiveness
    for target features:
    
    - AMDGCN inline ASM is allowed/supported, under the assumption that the
    [SPV_INTEL_inline_assembly](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_inline_assembly.asciidoc)
    extension is enabled/used
    - AMDGCN target specific builtins are allowed/supported, under the
    assumption that e.g. the `--spirv-allow-unknown-intrinsics` option is
    enabled when using the downstream translator
    - the featureset matches the union of AMDGCN targets' features
    - the datalayout string is overspecified to affix both the program
    address space and the alloca address space, the latter under the
    assumption that the
    [SPV_INTEL_function_pointers](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc)
    extension is enabled/used, in which case the extant SPIRV datalayout
    string would lead to function pointers pointing to the private
    address space, which would be wrong.
    
    Existing AMDGCN tests are extended to cover this new target. It is
    currently dormant / will require some additional changes, but I thought
    I'd rather put it up for review to get feedback as early as possible. I
    will note that an alternative option is to place this under AMDGPU, but
    that seems slightly less natural, since this is still SPIRV, albeit
    relaxed in terms of preconditions & constrained in terms of
    postconditions, and only guaranteed to be usable on AMDGCN targets (it
    is still possible to obtain pristine portable SPIRV through usage of the
    flavoured target, though).
    AlexVlx authored and srcarroll committed Jun 7, 2024
    4e4eb43
  209. [BOLT][NFC] Infallible fns return void (llvm#92018)

    Both `reverseBranchCondition` and `replaceBranchTarget` return a success boolean. But all but one of the callers ignore the return value, and the exception emits a fatal error on failure.
    
    Thus, just return nothing.
    urnathan authored and srcarroll committed Jun 7, 2024
    9c42b20
  210. [CodeGen][SDAG] Remove CombinedNodes SmallPtrSet (llvm#94609)

    This "small" set grows quite large and it's more performant to store
    whether a node has been combined before in the node itself.
    
    As this information is only relevant for nodes that are currently not in
    the worklist, add a second state to the CombinerWorklistIndex (-2) to
    indicate that a node is currently not in a worklist, but was combined
    before.
    
    This brings a substantial performance improvement.
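
    A rough sketch of the idea, under stated assumptions: `Node` and
    `Worklist` below are hypothetical simplifications rather than the
    SelectionDAG classes, and the meaning given to -1 (not in the worklist,
    never combined) is an assumption; only the -2 state comes from the
    description above.

    ```cpp
    #include <cassert>
    #include <vector>

    // Hypothetical node: the state lives in the node itself instead of in a
    // separate pointer set.
    struct Node {
      // >= 0 : position in the worklist
      //   -1 : not in the worklist, never combined (assumed meaning)
      //   -2 : not in the worklist, but combined before
      int CombinerWorklistIndex = -1;
    };

    struct Worklist {
      std::vector<Node *> Items;

      void add(Node &N) {
        if (N.CombinerWorklistIndex >= 0)
          return; // already queued
        N.CombinerWorklistIndex = static_cast<int>(Items.size());
        Items.push_back(&N);
      }

      // Pops the most recently added node and records that it was combined.
      Node *popAndMarkCombined() {
        if (Items.empty())
          return nullptr;
        Node *N = Items.back();
        Items.pop_back();
        N->CombinerWorklistIndex = -2;
        return N;
      }
    };

    // Only meaningful while the node is not in the worklist.
    bool wasCombined(const Node &N) { return N.CombinerWorklistIndex == -2; }
    ```

    Checking an integer field on the node avoids a hash-set lookup for
    every query, which is the kind of saving the commit message refers to.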
    aengelke authored and srcarroll committed Jun 7, 2024
    35fbc3f
  211. [clang][Interp] Check ConstantExpr results for initialization

    They need to be fully initialized, similar to global variables.
    tbaederr authored and srcarroll committed Jun 7, 2024
    6db6f7e
  212.
    c1a3bf7
  213. [clang][Interp] Limit lambda capture lazy visiting to actual captures

    Check this by looking at the VarDecl.
    tbaederr authored and srcarroll committed Jun 7, 2024
    a647101
  214. [serialization] no transitive decl change (llvm#92083)

    Follow-up to llvm#86912

    The motivation of the patch series is that, for a module interface unit
    `X`, when the modules that `X` depends on change in ways that are not
    relevant to `X`, we hope the BMI of `X` won't change. For this specific
    patch, we hope that irrelevant declaration changes won't change the BMI
    of `X`. **However**, I found the patch
    itself is not very useful in practice, since adding or removing
    declarations will change the state of identifiers and types in most
    cases.
    
    That said, for the most simple example,
    
    ```
    // partA.cppm
    export module m:partA;
    
    // partA.v1.cppm
    export module m:partA;
    export void a() {}
    
    // partB.cppm
    export module m:partB;
    export void b() {}
    
    // m.cppm
    export module m;
    export import :partA;
    export import :partB;
    
    // onlyUseB;
    export module onlyUseB;
    import m;
    export inline void onlyUseB() {
        b();
    }
    ```
    
    the BMI of `onlyUseB` will change after we change the implementation of
    `partA.cppm` to `partA.v1.cppm`, since `partA.v1.cppm` introduces new
    identifiers and types (the function prototype).
    
    So in this patch, we have to write the tests as:
    
    ```
    // partA.cppm
    export module m:partA;
    export int getA() { ... }
    export int getA2(int) { ... }
    
    // partA.v1.cppm
    export module m:partA;
    export int getA() { ... }
    export int getA(int) { ... }
    export int getA2(int) { ... }
    
    // partB.cppm
    export module m:partB;
    export void b() {}
    
    // m.cppm
    export module m;
    export import :partA;
    export import :partB;
    
    // onlyUseB;
    export module onlyUseB;
    import m;
    export inline void onlyUseB() {
        b();
    }
    ```
    
    so that the newly introduced declaration `int getA(int)` doesn't introduce
    new identifiers or types, and the BMI of `onlyUseB` can stay
    unchanged.
    
    While this does not look great, the patch should be the base of the patch
    to erase the transitive change for identifiers and types, since I don't
    know how we can introduce new types and identifiers without introducing
    new declarations. Given how tightly declarations, types and identifiers
    are related, I think we can only reach the ideal
    state after we have done the series for all three entities.
    
    The design of the patch is similar to
    llvm#86912, which extends the
    32-bit DeclID to 64 bits and uses the higher bits to store the module file
    index and the lower bits to store the Local Decl ID.
    
    A slight difference is that we only use 48 bits to store the new DeclID,
    since we try to use the higher 16 bits to store the module ID in the
    prefix of the Decl class. Previously, we used 32 bits to store the module ID
    and 32 bits to store the DeclID. I didn't want to allocate additional
    space, so I tried to keep the total at 64 bits. A
    potentially interesting point here is the relationship between the
    module ID and the module file index. I feel we can get the module file
    index from the module ID, but I have neither proved nor implemented it,
    since I want to keep the patch itself as small as possible. We can do it in
    the future if we want.
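
    A sketch of the kind of bit packing described above. The exact split
    inside the 48-bit ID is not given here, so the 16-bit module file index
    and 32-bit local ID below are assumptions, and all names are
    hypothetical rather than clang's.

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Assumed layout of the 48-bit ID (not taken from the patch):
    //   bits 32..47 : module file index
    //   bits  0..31 : local decl ID
    // Bits 48..63 stay zero, leaving room for the 16-bit module ID that is
    // stored alongside in the same 64-bit word.
    constexpr unsigned LocalBits = 32;
    constexpr unsigned IndexBits = 16;
    constexpr uint64_t LocalMask = (uint64_t{1} << LocalBits) - 1;
    constexpr uint64_t IndexMask = (uint64_t{1} << IndexBits) - 1;

    constexpr uint64_t makeDeclID(uint64_t ModuleFileIndex, uint64_t LocalDeclID) {
      return ((ModuleFileIndex & IndexMask) << LocalBits) |
             (LocalDeclID & LocalMask);
    }

    constexpr uint64_t moduleFileIndex(uint64_t ID) {
      return (ID >> LocalBits) & IndexMask;
    }

    constexpr uint64_t localDeclID(uint64_t ID) { return ID & LocalMask; }
    ```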
    
    Another change in the patch is the new concept of a Decl Index, which is
    the index into the very big array `DeclsLoaded` in ASTReader. Previously,
    the index of a loaded declaration was simply the Decl ID minus
    PREDEFINED_DECL_NUMs, so in some places the two were used
    interchangeably. This patch separates the two concepts.
    
    As llvm#86912 did, the change will
    increase the on-disk PCM file sizes. Since declaration IDs are probably the
    most numerous IDs in the PCM file, this can have the biggest impact on the size.
    In my experiments, this change brings a 6.6% increase in the on-disk
    PCM size. No compile-time performance regression was observed. Given the
    benefits in the motivating example, I think the cost is worthwhile.
    ChuanqiXu9 authored and srcarroll committed Jun 7, 2024
    a9b37d7
  215. [AMDGPU] Fix interaction between WQM and llvm.amdgcn.init.exec (llvm#93680)
    
    Whole quad mode requires inserting a copy of the initial EXEC mask. In a
    function that also uses llvm.amdgcn.init.exec, insert the COPY after
    initializing EXEC.
    jayfoad authored and srcarroll committed Jun 7, 2024
    a0dcaf2
  216. [Frontend][OpenMP] Sort all the things in OMP.td, NFC (llvm#94653)

    The file OMP.td is becoming tedious to update by hand due to the
    seemingly random ordering of various items in it. This patch brings
    order to it by sorting most of the contents.
    
    The clause definitions are sorted alphabetically with respect to the
    spelling of the clause.[1]
    
    The directive definitions are split into two groups, leaf directives and
    compound directives.[2] Within each, definitions are sorted
    alphabetically with respect to the spelling, with the exception that
    "end xyz" directives are placed immediately following the definition of
    "xyz".[3]
    
    Within each directive definition, the lists of clauses are also sorted
    alphabetically.
    
    [1] All spellings are made of lowercase letters, _, or space. Ordering
    that includes non-letters follows the order assumed by the `sort`
    utility.
    [2] Compound directives refer to the constituent leaf directives, hence
    the leaf definitions must come first.
    [3] Some of the "end xyz" directives have properties derived from the
    corresponding "xyz" directive. This exception guarantees that "xyz"
    precedes the "end xyz".
    kparzysz authored and srcarroll committed Jun 7, 2024
    01be0a3
  217. [flang][OpenMP] Lower target .. private(..) to omp.private ops (llvm#94195)
    
    Extends delayed privatization support to `target .. private(..)`. With
    this PR, `private` is supported for `target` **only** in delayed
    privatization mode.
    ergawy authored and srcarroll committed Jun 7, 2024
    26ba412
  218. [libc] Correctly pass the C++ standard to NVPTX internal builds

    Summary:
    The NVPTX build wasn't getting the `C++20` standard necessary for a few
    files.
    jhuber6 authored and srcarroll committed Jun 7, 2024
    4407e67
  219. [mlir][linalg] Support lowering unpack with outer_dims_perm (llvm#94477)

    This commit adds support for lowering `tensor.unpack` with a
    non-identity `outer_dims_perm`. This was previously left as a
    not-yet-implemented case.
    ryan-holt-1 authored and srcarroll committed Jun 7, 2024
    df12b11
  220. [mlir] Add reshape propagation patterns for tensor.pad (llvm#94489)

    This PR adds fusion by collapsing and fusion by expansion patterns for
    `tensor.pad` ops in ElementwiseOpFusion. Pad ops can be expanded or
    collapsed as long as none of the padded dimensions will be expanded or
    collapsed.
    Max191 authored and srcarroll committed Jun 7, 2024
    f0cdc72
  221. [mlir] Fix bugs in expand_shape patterns after semantics changes (llvm#94631)
    
    After the `output_shape` field was added to `expand_shape` ops,
    dynamically sized expand shapes are now possible, but this was not
    accounted for in the folder. This PR tightens the constraints of the
    folder to fix this.
    Max191 authored and srcarroll committed Jun 7, 2024
    e4f8c4e
  222. [ARM] Clean up neon_vabd.ll, vaba.ll and vabd.ll tests a bit. NFC

    Change the target triple to remove some unnecessary instructions.
    davemgreen authored and srcarroll committed Jun 7, 2024
    b9d3565
  223. [arm64] Add tan intrinsic lowering (llvm#94545)

    This change is an implementation of the
    llvm#87367 investigation on
    supporting IEEE math operations as intrinsics,
    which was discussed in this RFC:
    https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
    
    This PR is just for Tan.
    
    Now that the x86 tan backend has landed
    (llvm#90503), we can add other
    backends since the shared pieces are in tree now.
    
    Changes:
    - `llvm/include/llvm/Analysis/VecFuncs.def` - vectorization of tan for
    arm64 backends.
    - `llvm/lib/Target/AArch64/AArch64FastISel.cpp` - Add tan to the libcall
    table
    - `llvm/lib/Target/AArch64/AArch64ISelLowering.cpp` - Add tan expansion
    for f128, f16, and vector/NEON operations
    - `llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp` - Define
    `G_FTAN` as a legal arm64 instruction
    
    resolves llvm#94755
    farzonl authored and srcarroll committed Jun 7, 2024
    678428a
  224.
    50bec57
  225. [Clang] Add timeout for GPU detection utilities (llvm#94751)

    Summary:
    The utilities `nvptx-arch` and `amdgpu-arch` are used to support
    `--offload-arch=native` among other utilities in clang. However, these
    rely on the GPU drivers to query the features. In certain cases these
    drivers can become locked up, which will lead to indefinite hangs on any
    compiler jobs running in the meantime.
    
    This patch adds a ten second timeout period for these utilities before
    it kills the job and errors out.
    jhuber6 authored and srcarroll committed Jun 7, 2024
    86dd2c9
  226. 1b239ca
  227. [MachineOutliner] Sort by Benefit to Cost Ratio (llvm#90264)

    This PR depends on llvm#90260
    
    We changed the order in which functions are outlined in Machine
    Outliner.
    
    The formula for priority is found via a black-box Bayesian optimization
    toolbox. Using this formula for sorting consistently reduces the
    uncompressed size of large real-world mobile apps. We also ran a few
    benchmarks using LLVM test suites, and showed that sorting by priority
    consistently reduces the text segment size.
    
    |run (CTMark/)   |baseline (1)|priority (2)|diff (1 -> 2)|
    |----------------|------------|------------|-------------|
    |lencod          |349624      |349264      |-0.1030%     |
    |SPASS           |219672      |219480      |-0.0874%     |
    |kc              |271956      |251200      |-7.6321%     |
    |sqlite3         |223920      |223708      |-0.0947%     |
    |7zip-benchmark  |405364      |402624      |-0.6759%     |
    |bullet          |139820      |139500      |-0.2289%     |
    |consumer-typeset|295684      |290196      |-1.8560%     |
    |pairlocalalign  |72236       |72092       |-0.1993%     |
    |tramp3d-v4      |189572      |189292      |-0.1477%     |
    
    This is part of an enhanced version of machine outliner -- see
    [RFC](https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-1-fulllto-part-2-thinlto-nolto-to-come/78732).
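
To illustrate the idea of ordering outlining candidates by benefit-to-cost ratio, here is a minimal sketch. The `Candidate` struct, its fields, and `sortByRatio` are hypothetical names, not the LLVM types. The priority formula actually used in the patch was found by Bayesian optimization and is not reproduced here; this sketch assumes a plain ratio.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an outlining candidate; not the LLVM type.
struct Candidate {
  int Id;
  uint64_t Benefit; // bytes saved if this candidate is outlined
  uint64_t Cost;    // bytes added by outlining it (assumed nonzero)
};

// Order candidates so the highest Benefit/Cost ratio comes first.
// The comparison A.Benefit/A.Cost > B.Benefit/B.Cost is done by
// cross-multiplication to stay in integer arithmetic.
void sortByRatio(std::vector<Candidate> &Cands) {
  std::stable_sort(Cands.begin(), Cands.end(),
                   [](const Candidate &A, const Candidate &B) {
                     assert(A.Cost != 0 && B.Cost != 0);
                     return A.Benefit * B.Cost > B.Benefit * A.Cost;
                   });
}
```

`std::stable_sort` keeps candidates with equal ratios in their original order, which keeps the result deterministic.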
    xuanzhang816 authored and srcarroll committed Jun 7, 2024
    58c7def
  228. [memprof] Clean up IndexedMemProfReader (NFC) (llvm#94710)

    Parameter "Version" is confusing in deserializeV012 and deserializeV3
    because we also have member variable "Version".  Fortunately,
    parameter "Version" and member variable "Version" always have the same
    value because IndexedMemProfReader::deserialize initializes the member
    variable and passes it to deserializeV012 and deserializeV3.
    
    This patch removes the parameter.
    kazutakahirata authored and srcarroll committed Jun 7, 2024
    dc9c2df
  229. 8d913d5
  230. [memprof] Use CallStackRadixTreeBuilder in the V3 format (llvm#94708)

    This patch integrates CallStackRadixTreeBuilder into the V3 format,
    reducing the profile size to about 27% of the V2 profile size.
    
    - Serialization: writeMemProfCallStackArray just needs to write out
      the radix tree array prepared by CallStackRadixTreeBuilder.
      Mappings from CallStackIds to LinearCallStackIds are moved by new
      function CallStackRadixTreeBuilder::takeCallStackPos.
    
    - Deserialization: Deserializing a call stack is the same as
      deserializing an array encoded in the obvious manner -- the length
      followed by the payload, except that we need to follow a pointer to
      the parent to take advantage of common prefixes once in a while.
      This patch teaches LinearCallStackIdConverter how to handle those
      pointers.
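
The deserialization described above can be sketched roughly as follows. The layout here is an assumption made for illustration, not the exact on-disk format: each call stack is a length followed by frame ids, and a negative entry `-K` stands for "skip forward `K` slots and continue reading there". `decodeCallStack` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode the call stack that starts at index Pos of a flat array.
// Assumed layout (illustrative): Radix[Pos] is the frame count, followed by
// frame ids. A negative entry -K is not a frame; it is a pointer that moves
// the read position forward by K slots so stacks can share a common run.
std::vector<int32_t> decodeCallStack(const std::vector<int32_t> &Radix,
                                     size_t Pos) {
  assert(Pos < Radix.size());
  int32_t NumFrames = Radix[Pos++];
  std::vector<int32_t> Frames;
  for (; NumFrames > 0; --NumFrames) {
    assert(Pos < Radix.size());
    int32_t Elem = Radix[Pos];
    if (Elem < 0) {
      // Follow the pointer, then read the frame stored at the target.
      Pos += static_cast<size_t>(-Elem);
      assert(Pos < Radix.size());
      Elem = Radix[Pos];
    }
    Frames.push_back(Elem);
    ++Pos;
  }
  return Frames;
}
```

With the array `{2, 10, -3, 3, 20, 30, 40}`, the stack at index 0 reads frame 10, follows the pointer to index 5, and reuses frame 30, which is also part of the stack stored at index 3.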
    kazutakahirata authored and srcarroll committed Jun 7, 2024
    637baa5
  231. [mlir][vector] Remove Emulated Sub-directory (llvm#94742)

    The "Emulated" sub-directories under "ArmSVE" and
    "ArmSME" have been removed. Associated tests
    have been moved up a directory and now include
    the "REQUIRES" constraint for the arm-emulator.
    mub-at-arm authored and srcarroll committed Jun 7, 2024
    ad12734
  232. [gn] port 33a6ce1 (check-clang obj2yaml dep)

    nico authored and srcarroll committed Jun 7, 2024
    7f5aeb1
  233. [gn] port cb7690a (ntdll dep)

    nico authored and srcarroll committed Jun 7, 2024
    aae32f6
  234. [KnownBits] Remove hasConflict() assertions (llvm#94568)

    Allow KnownBits to represent "always poison" values via conflict.
    
    close: llvm#94436
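
For readers unfamiliar with the term, a "conflict" is a bit recorded as both known-zero and known-one. The sketch below uses a simplified 8-bit stand-in, `MiniKnownBits`, which is a hypothetical type and not `llvm::KnownBits` itself; it only shows how a conflict arises when contradictory facts are combined.

```cpp
#include <cassert>
#include <cstdint>

// Simplified 8-bit stand-in for a known-bits record (illustrative only).
struct MiniKnownBits {
  uint8_t Zero = 0; // bits known to be 0
  uint8_t One = 0;  // bits known to be 1

  // A bit claimed to be both 0 and 1 is a conflict.
  bool hasConflict() const { return (Zero & One) != 0; }

  // Every bit is known and no bit is contradictory.
  bool isConstant() const {
    return !hasConflict() && static_cast<uint8_t>(Zero | One) == 0xFF;
  }

  uint8_t getConstant() const {
    assert(isConstant() && "not every bit is known");
    return One;
  }

  // Combine facts from two sources about the same value.
  MiniKnownBits unionWith(const MiniKnownBits &RHS) const {
    return {static_cast<uint8_t>(Zero | RHS.Zero),
            static_cast<uint8_t>(One | RHS.One)};
  }
};
```

Combining "bit 0 is zero" with "bit 0 is one" yields a record whose `hasConflict()` is true, which is the state the commit allows to stand for an always-poison value instead of asserting.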
    c8ef authored and srcarroll committed Jun 7, 2024
    5b14f6d
  235. [libc++][test][AIX] Only XFAIL atomic tests for before clang 19 (llvm#94646)
    
    These tests pass on 64-bit. They were fixed by 5fdd094 on 32-bit.
    So XFAIL only for 32-bit before clang 19.
    jakeegan authored and srcarroll committed Jun 7, 2024
    1508a3d
  236. [AArch64] Add patterns for add(uzp1(x,y), uzp2(x, y)) -> addp.

    If we are extracting the even lanes and the odd lanes and adding them, we can
    use an addp instruction.
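
The equivalence behind this pattern can be checked with a scalar model. This is a sketch on four 32-bit lanes; `V4` and the helper functions are illustrative names that model the lane selection of the instructions, not compiler code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4 = std::array<int32_t, 4>;

// Even-indexed lanes of the concatenation X:Y (the lanes uzp1 selects).
V4 uzp1(const V4 &X, const V4 &Y) { return {X[0], X[2], Y[0], Y[2]}; }

// Odd-indexed lanes of the concatenation X:Y (the lanes uzp2 selects).
V4 uzp2(const V4 &X, const V4 &Y) { return {X[1], X[3], Y[1], Y[3]}; }

// Lane-wise add.
V4 add(const V4 &A, const V4 &B) {
  return {A[0] + B[0], A[1] + B[1], A[2] + B[2], A[3] + B[3]};
}

// Pairwise add: sums adjacent lane pairs of X, then of Y.
V4 addp(const V4 &X, const V4 &Y) {
  return {X[0] + X[1], X[2] + X[3], Y[0] + Y[1], Y[2] + Y[3]};
}

// Asserts that add(uzp1(X, Y), uzp2(X, Y)) equals addp(X, Y).
void checkEquivalent(const V4 &X, const V4 &Y) {
  assert(add(uzp1(X, Y), uzp2(X, Y)) == addp(X, Y));
}
```

Each result lane of the add pairs an even lane with the odd lane next to it, which is exactly what a pairwise add computes in one step.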
    davemgreen authored and srcarroll committed Jun 7, 2024
    53615ae
  237. 3a93ccc
  238. [libc++][regex] Correctly adjust match prefix for zero-length matches. (llvm#94550)
    
    For regex patterns that produce zero-length matches, there is one
    (imaginary) match in-between every character in the sequence being
    searched (as well as before the first character and after the last
    character). It's easiest to demonstrate using replacement:
    `std::regex_replace("abc"s, std::regex(""), "!")` should produce `!a!b!c!`, where
    each exclamation mark makes a zero-length match visible.
    
    Currently our implementation doesn't correctly set the prefix of each
    zero-length match, "swallowing" the characters separating the imaginary
    matches -- e.g. when going through zero-length matches within `abc`, the
    corresponding prefixes should be `{'', 'a', 'b', 'c'}`, but before this
    patch they will all be empty (`{'', '', '', ''}`). This happens in the
    implementation of `regex_iterator::operator++`. Note that the Standard
    spells out quite explicitly that the prefix might need to be adjusted
    when dealing with zero-length matches in
    [`re.regiter.incr`](http://eel.is/c++draft/re.regiter.incr):
    > In all cases in which the call to `regex_search` returns `true`,
    `match.prefix().first` shall be equal to the previous value of
    `match[0].second`... It is unspecified how the implementation makes
    these adjustments.
    
    [Reproduction example](https://godbolt.org/z/8ve6G3dav)
    ```cpp
    #include <iostream>
    #include <regex>
    #include <string>
    
    int main() {
      std::string str = "abc";
      std::regex empty_matching_pattern("");
    
      { // The underlying problem is that `regex_iterator::operator++` doesn't update
        // the prefix correctly.
        std::sregex_iterator i(str.begin(), str.end(), empty_matching_pattern), e;
        std::cout << "\"";
        for (; i != e; ++i) {
          const std::ssub_match& prefix = i->prefix();
          std::cout << prefix.str();
        }
        std::cout << "\"\n";
        // Before the patch: ""
        // After the patch: "abc"
      }
    
      { // `regex_replace` makes the problem very visible.
        std::string replaced = std::regex_replace(str, empty_matching_pattern, "!");
        std::cout << "\"" << replaced << "\"\n";
        // Before the patch: "!!!!"
        // After the patch: "!a!b!c!"
      }
    }
    ```
    
    Fixes llvm#64451
    
    rdar://119912002
    var-const authored and srcarroll committed Jun 7, 2024
    aaa160e
  239. Reapply PR/87550 (llvm#94625)

    Re-apply llvm#87550 with fixes.
    
    Details:
    Some tests in fuchsia failed because of the newly added assertion.
    This was because `GetExceptionBreakpoint()` could be called before
    `g_dap.debugger` was initialized.
    
    The fix here is to just lazily populate the list in
    `GetExceptionBreakpoint()` rather than assuming it's already been initialized.
    (There is some nuisance here because we can't simply just populate it in
    DAP::DAP(), which is a global ctor and is called before
    `SBDebugger::Initialize()` is called. )
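
The lazy-population approach can be sketched as below. `Session`, `g_debugger_initialized`, and the filter strings are hypothetical stand-ins chosen for illustration; they are not the lldb-dap types or values.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical flag standing in for "the debugger has been initialized".
bool g_debugger_initialized = false;

struct Session {
  // Left empty at construction: a global constructor may run before the
  // debugger is initialized, so the list cannot be filled in there.
  std::optional<std::vector<std::string>> exception_breakpoints;

  // Fill the list on first use only.
  void populateExceptionBreakpoints() {
    if (exception_breakpoints)
      return;
    assert(g_debugger_initialized && "debugger must be ready before first use");
    exception_breakpoints = std::vector<std::string>{"cpp_throw", "cpp_catch"};
  }

  // Returns a pointer to the matching entry, or nullptr if there is none.
  const std::string *getExceptionBreakpoint(const std::string &filter) {
    populateExceptionBreakpoints();
    for (const auto &bp : *exception_breakpoints)
      if (bp == filter)
        return &bp;
    return nullptr;
  }
};
```

Constructing a `Session` touches nothing that depends on the debugger; the assertion only fires if a lookup happens before initialization.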
    oontvoo authored and srcarroll committed Jun 7, 2024
    8bb019d
  240. [libc++] Undeprecate shared_ptr atomic access APIs (llvm#92920)

    This patch reverts 9b832b7 (llvm#87111):
    - [libc++] Deprecated `shared_ptr` Atomic Access APIs as per P0718R2
    - [libc++] Implemented P2869R3: Remove Deprecated `shared_ptr` Atomic Access APIs from C++26
    
    As explained in [1], the suggested replacement in P2869R3 is `__cpp_lib_atomic_shared_ptr`,
    which libc++ does not yet implement. Let's not deprecate the old way of doing things before
    the new way of doing things exists.
    
    [1]: llvm#87111 (comment)
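
For context, the "old way" refers to the free functions in `<memory>` that take a pointer to a `shared_ptr`. A minimal sketch of their use follows; `g_config`, `readConfig`, and `publishConfig` are illustrative names. It is expected to build in language modes before C++26, though some standard libraries may emit deprecation warnings in C++20 and later.

```cpp
#include <cassert>
#include <memory>

// Shared slot that several threads may read and replace.
std::shared_ptr<int> g_config = std::make_shared<int>(1);

// Take an atomic snapshot of the current pointer and read through it.
int readConfig() {
  std::shared_ptr<int> snapshot = std::atomic_load(&g_config);
  assert(snapshot && "slot is never expected to be null in this sketch");
  return *snapshot;
}

// Atomically replace the shared pointer with a new value.
void publishConfig(int value) {
  std::atomic_store(&g_config, std::make_shared<int>(value));
}
```

A reader holding `snapshot` keeps the old object alive even if another thread publishes a replacement in the meantime.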
    nico authored and srcarroll committed Jun 7, 2024
    7e2707f
  241. [Reassociate] shifttest.ll - generate test checks to replace custom grep expression
    
    (and remove an unused argument)
    RKSimon authored and srcarroll committed Jun 7, 2024
    fd08cef
  242. [flang][runtime] add SHAPE runtime interface (llvm#94702)

    Add SHAPE runtime API (will be used for assumed-rank, lowering is
    generating other cases inline).
    
    I tried to make it in a way where there is no dynamic allocation in the
    runtime/deallocation expected to be inserted by inline code for arrays
    that we know are small (lowering will just always stack allocate a rank
    15 array to avoid dynamic stack allocation or heap allocation).
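
The allocation-free design can be sketched as follows: the caller provides a fixed buffer sized for the maximum rank and the routine only fills it in. `MiniDescriptor` and `shapeInto` are hypothetical names for illustration, not the flang runtime API.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kMaxRank = 15; // maximum array rank, per the rank-15 buffer above

// Hypothetical minimal array descriptor; not the flang runtime's class.
struct MiniDescriptor {
  int rank;
  int64_t extents[kMaxRank];
};

// Write the extents into caller-provided storage and return how many were
// written. Nothing is allocated or freed here.
int shapeInto(const MiniDescriptor &desc, int64_t (&result)[kMaxRank]) {
  assert(desc.rank >= 0 && desc.rank <= kMaxRank);
  for (int i = 0; i < desc.rank; ++i)
    result[i] = desc.extents[i];
  return desc.rank;
}
```

Because the buffer always has room for rank 15, the caller can place it on the stack without knowing the actual rank ahead of time.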
    jeanPerier authored and srcarroll committed Jun 7, 2024
    3aa3f3a
  243. 8a0529a
  244. [OpenMP] Fix passing target id features to AMDGPU offloading (llvm#94765)
    
    Summary:
    AMDGPU supports a `target-id` feature which is used to qualify targets
    with different incompatible features. These act as both rules and target
    features. Currently, we pass `-target-cpu` twice when offloading to
    OpenMP, and do not pass the target-id features at all. The effect was
    that passing something like `--offload-arch=gfx90a:xnack+` would show up
    as `-target-cpu=gfx90a:xnack+ -target-cpu=gfx90a`, passing the processor
    twice and ignoring xnack completely. This patch fixes that to pass it
    once and then separate it like how HIP does.
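
Separating a target id into a processor and its features can be sketched like this. `ParsedTargetId` and `parseTargetId` are hypothetical names, and the `+name` / `-name` output form is an assumption about how the features are then spelled; this is not the clang driver code.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

struct ParsedTargetId {
  std::string cpu;                   // e.g. "gfx90a"
  std::vector<std::string> features; // e.g. "+xnack", "-sramecc"
};

// Split an id such as "gfx90a:xnack+:sramecc-" at ':' into the processor
// name and its features, moving each trailing sign to the front.
ParsedTargetId parseTargetId(const std::string &id) {
  ParsedTargetId out;
  std::istringstream in(id);
  std::string part;
  bool first = true;
  while (std::getline(in, part, ':')) {
    if (first) {
      out.cpu = part;
      first = false;
      continue;
    }
    // Each feature is expected to end in '+' or '-'.
    assert(part.size() >= 2 && (part.back() == '+' || part.back() == '-'));
    out.features.push_back(part.back() + part.substr(0, part.size() - 1));
  }
  return out;
}
```

For `gfx90a:xnack+` this yields the processor `gfx90a` once and the single feature `+xnack`, rather than the processor twice.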
    jhuber6 authored and srcarroll committed Jun 7, 2024
    e854b11
  245. Fixed grammatical error in "enum specifier" error msg llvm#94443 (llvm#94592)
    
    As discussed in llvm#94443, this PR changes the wording to be more correct.
    kper authored and srcarroll committed Jun 7, 2024
    624a743
  246. f7d4ecb
  247. Check if LLD is built when checking if lto_supported (llvm#92752)

    Otherwise, older copies of LLD may not understand the latest bitcode
    versions (for example, if we increase
    `ModuleSummaryIndex::BitCodeSummaryVersion`)
    
    Related to
    llvm#90692 (comment)
    jvoung authored and srcarroll committed Jun 7, 2024
    d931adf
  248. [mlir][vector][NFC] Make function name more meaningful in lit tests. (llvm#94538)
    
    It also moves the test near other similar test cases.
    hanhanW authored and srcarroll committed Jun 7, 2024
    ae23164
  249. update

    srcarroll committed Jun 7, 2024
    b928554
  250. df0747c