Implement grouped conv interface #80870
Commits on Jun 7, 2024
Commit abd0bf1
This reverts commit 2ec122d.
Commit bf572a5
Commit 28e57cd
[RuntimeDyld][ELF] Fix unwanted sign extension. (llvm#94482)
Casting the result of `Section.getAddressWithOffset()` goes wrong if we are on a 32-bit platform whose addresses are regarded as signed; in that case, just doing
```
(uint64_t)Section.getAddressWithOffset(...)
```
or
```
reinterpret_cast<uint64_t>(Section.getAddressWithOffset(...))
```
will result in sign-extension.

We use these expressions when constructing branch stubs, which is before we know the final load address, so we can just switch to the `Section.getLoadAddressWithOffset(...)` method instead.

Doing that is also more consistent, since when calculating relative offsets for relocations, we use the load address anyway, so the code currently only works because `Section.Address` is equal to `Section.LoadAddress` at this point.

Fixes llvm#94478.
Commit ea63530
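The sign-extension hazard described in the RuntimeDyld commit can be reproduced without a 32-bit target. The following is a minimal sketch, assuming the address is modelled as a plain 32-bit integer rather than a real pointer; `widen_as_signed` and `widen_as_unsigned` are hypothetical helpers for illustration, not RuntimeDyld APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 32-bit address that the platform treats as
// signed when it is widened to 64 bits.
inline std::uint64_t widen_as_signed(std::uint32_t addr) {
  // Going through int32_t first makes the 64-bit conversion sign-extend.
  return static_cast<std::uint64_t>(static_cast<std::int32_t>(addr));
}

// Widening the unsigned value directly zero-extends instead.
inline std::uint64_t widen_as_unsigned(std::uint32_t addr) {
  return static_cast<std::uint64_t>(addr);
}
```

Only addresses with the top bit set differ between the two helpers, which is consistent with the bug showing up only on some platforms.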
[LoongArch] Add a hook to sign extend i32 ConstantInt operands of phis on LA64 (llvm#93813)
Materializing constants on LoongArch is simpler if the constant is sign extended from i32. By default i32 constant operands of phis are zero extended. This patch adds a hook to allow LoongArch to override this for i32. We have an existing isSExtCheaperThanZExt, but it operates on EVT which we don't have at these places in the code.
Commit 9fc29d3
Commit 5ddc841
[libc][math][c23] Implement fmaxf16 and fminf16 function (llvm#94131)
Implements fmaxf16 and fminf16, which are two missing functions listed here: llvm#93566
Commit 95e1431
[lldb] Fix inconsistencies in DWARFExpression errors (llvm#94554)
This patch makes all errors start with a lowercase letter and removes trailing periods and newlines. This fixes inconsistencies between error messages and facilitates concatenating them.
Commit 9e25be5
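To see why the DWARFExpression error convention helps with concatenation, here is a small sketch. `normalize_error` and `join_errors` are hypothetical helpers and the message strings are made up for illustration; none of this is LLDB code.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: lowercase the first letter and strip trailing
// periods, newlines, and spaces.
inline std::string normalize_error(std::string msg) {
  while (!msg.empty() &&
         (msg.back() == '.' || msg.back() == '\n' || msg.back() == ' '))
    msg.pop_back();
  if (!msg.empty())
    msg[0] = static_cast<char>(std::tolower(static_cast<unsigned char>(msg[0])));
  return msg;
}

// Chains an outer and an inner message into a single line.
inline std::string join_errors(const std::string &outer,
                               const std::string &inner) {
  return normalize_error(outer) + ": " + normalize_error(inner);
}
```

With every message already in this shape, chained errors read as one sentence instead of a mix of capitalized fragments and stray line breaks.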
[libc][math][c23] Add {nextafter,nexttoward,nextup,nextdown}f16 C23 math functions (llvm#94535)
llvm#93566
Commit f2165ae
Commit b22873d
[lldb/crashlog] Always load Application Specific Backtrace Thread images (llvm#94259)
This patch changes the crashlog image loading default behaviour to not only load images from the crashed thread but also for the application specific backtrace thread. This patch also moves the Application Specific Backtrace / Last Exception Backtrace tag from the thread queue field to the thread name.
rdar://128276576
Signed-off-by: Med Ismail Bennani <[email protected]>
Commit 6a4e4f6
[WebAssembly] Set IS_64 flag correctly on __indirect_function_table in object files (llvm#94487)
Follow up to llvm#92042
Commit 725b792
[serialization] no transitive decl change (llvm#92083)
Follow-up to llvm#86912.

The motivation of the patch series is that, for a module interface unit `X`, when the modules `X` depends on change in ways that are not relevant to `X`, the BMI of `X` should not change. For this specific patch, the goal is that changes to irrelevant declarations leave the BMI of `X` unchanged.

**However**, I found the patch itself is not very useful in practice, since adding or removing declarations changes the state of identifiers and types in most cases. That is, for the simplest example,

```
// partA.cppm
export module m:partA;

// partA.v1.cppm
export module m:partA;
export void a() {}

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
    b();
}
```

the BMI of `onlyUseB` will change after we change the implementation of `partA.cppm` to `partA.v1.cppm`, since `partA.v1.cppm` introduces new identifiers and types (the function prototype).

So in this patch, we have to write the tests as:

```
// partA.cppm
export module m:partA;
export int getA() { ... }
export int getA2(int) { ... }

// partA.v1.cppm
export module m:partA;
export int getA() { ... }
export int getA(int) { ... }
export int getA2(int) { ... }

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
    b();
}
```

so that the newly introduced declaration `int getA(int)` doesn't introduce new identifiers or types, and the BMI of `onlyUseB` can stay unchanged.

While this does not look great, the patch should be the base of the patch that erases the transitive change for identifiers and types, since I don't know how we can introduce new types and identifiers without introducing new declarations. Given how tight the relationship between declarations, types and identifiers is, I think we can only reach the ideal state after we finish the series for all three entities.

The design of the patch is similar to llvm#86912, which extends the 32-bit DeclID to 64 bits and uses the higher bits to store the module file index and the lower bits to store the local Decl ID. A slight difference is that we only use 48 bits to store the new DeclID, since we try to use the higher 16 bits to store the module ID in the prefix of the Decl class. Previously, we used 32 bits to store the module ID and 32 bits to store the DeclID. I don't want to allocate additional space, so I tried to keep the combined storage at 64 bits.

A potentially interesting point here is the relationship between the module ID and the module file index. I feel we can get the module file index from the module ID, but I didn't prove it or implement it, since I want to keep the patch itself as small as possible. We can do it in the future if we want.

Another change in the patch is the new concept of a Decl Index, which means the index into the very big array `DeclsLoaded` in ASTReader. Previously, the index of a loaded declaration was simply the Decl ID minus PREDEFINED_DECL_NUMs, so the two were used ambiguously in some places. This patch tries to split these two concepts.

As with llvm#86912, the change will increase the on-disk PCM file sizes. As declaration IDs may be the most numerous IDs in the PCM file, this can have the biggest impact on the size. In my experiments, this change brings a 6.6% increase in the on-disk PCM size. No compile-time performance regression was observed. Given the benefits in the motivating example, I think the cost is worthwhile.
Commit 07c0c26
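The ID layout described in the serialization commit (module file index in the higher bits, local Decl ID in the lower bits, 48 bits in total) could be packed roughly as follows. This is only a sketch: the commit message does not give the exact bit split, so the 16/32 division and all names below are assumptions, not Clang's actual definitions.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout (not confirmed by the commit message): within the low
// 48 bits, bits [47:32] hold the module file index and bits [31:0] hold
// the local decl ID. The top 16 bits stay free for the module ID.
constexpr unsigned kLocalBits = 32;
constexpr std::uint64_t kLocalMask = (std::uint64_t{1} << kLocalBits) - 1;
constexpr std::uint64_t kIndexMask = 0xFFFF;

constexpr std::uint64_t make_decl_id(std::uint64_t file_index,
                                     std::uint64_t local_id) {
  return ((file_index & kIndexMask) << kLocalBits) | (local_id & kLocalMask);
}

constexpr std::uint64_t module_file_index(std::uint64_t id) {
  return (id >> kLocalBits) & kIndexMask;
}

constexpr std::uint64_t local_decl_id(std::uint64_t id) {
  return id & kLocalMask;
}
```

Because the local ID no longer depends on how many declarations other module files contribute, adding a declaration to an unrelated module leaves the IDs that `X` records untouched, which is the property the patch is after.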
Revert "[RISCV] Support select/merge like ops for bf16 vectors when have Zvfbfmin" (llvm#94565)
Reverts llvm#91936
Premerge bots are broken.
Commit c719881
[flang] Add GETCWD runtime and lowering intrinsics implementation (llvm#92746)
This patch adds support for the GNU extension intrinsic GETCWD (llvm#84203). Usage information and an example have been added to `flang/docs/Intrinsics.md`. The patch contains both the lowering and the runtime code and works on both Windows and Linux.

| System  | Implementation |
|---------|----------------|
| Windows | _getcwd        |
| Linux   | getcwd         |

Commit 3052bcc
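The platform split in the GETCWD table can be illustrated with a small wrapper. This is a sketch only: `current_directory` is a hypothetical name and the fixed buffer size is an assumption; it is not the flang runtime entry point.

```cpp
#include <cassert>
#include <string>
#ifdef _WIN32
#include <direct.h>  // _getcwd
#else
#include <unistd.h>  // getcwd
#endif

// Hypothetical wrapper: returns the current working directory, or an
// empty string if the underlying call fails (for example, buffer too small).
inline std::string current_directory() {
  char buf[4096];
#ifdef _WIN32
  char *res = _getcwd(buf, static_cast<int>(sizeof buf));
#else
  char *res = getcwd(buf, sizeof buf);
#endif
  return res ? std::string(res) : std::string();
}
```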
[clang] Implement a __is_bitwise_cloneable builtin type trait. (llvm#86512)
This patch implements a `__is_bitwise_cloneable` builtin in clang. The builtin is used as a guard to check that a type can be safely bitwise copied by memcpy. It's functionally similar to `__is_trivially_copyable`, but covers a wider range of types (e.g. classes with virtual functions). The compiler guarantees that after the copy, the destination object has the same object representation as the source object. It is up to the user to guarantee that program semantic constraints are satisfied.
Context: https://discourse.llvm.org/t/extension-for-creating-objects-via-memcpy
Commit 85fd90b
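A guard built on `__is_bitwise_cloneable` might look like the sketch below. `can_memcpy` and `clone_bytes` are hypothetical names; on compilers that lack the builtin, the sketch falls back to `std::is_trivially_copyable`, which is stricter, so the result for some types depends on the compiler used.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

#if __has_builtin(__is_bitwise_cloneable)
template <class T> constexpr bool can_memcpy = __is_bitwise_cloneable(T);
#else
// Fallback (assumption): use the stricter standard trait.
template <class T>
constexpr bool can_memcpy = std::is_trivially_copyable<T>::value;
#endif

// Copies the object representation of src into dst when the guard allows
// it; returns false and leaves dst untouched otherwise.
template <class T> bool clone_bytes(T &dst, const T &src) {
  if (!can_memcpy<T>)
    return false;
  std::memcpy(static_cast<void *>(&dst), static_cast<const void *>(&src),
              sizeof(T));
  return true;
}
```

As the commit message notes, the trait only covers the object representation; whether the copied object is semantically valid remains the caller's responsibility.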
[LoongArch] Adjust LA64 data layout by using n32:64 in layout string (llvm#93814)
Although i32 type is illegal in the backend, LA64 has pretty good support for i32 types by using W instructions. By adding n32 to the DataLayout string, middle end optimizations will consider i32 to be a native type. One known effect of this is enabling LoopStrengthReduce on loops with i32 induction variables. This can be beneficial because C/C++ code often has loops with i32 induction variables due to the use of `int` or `unsigned int`. If this patch exposes performance issues, those are better addressed by tuning LSR or other passes.
Commit 16c3e1a
[MLIR][LLVM] Improve module translation comment (NFC) (llvm#94577)
This commit enhances the docstring of `translateModuleToLLVMIR` as a followup to llvm#94445
Commit 8ffa33f
Commit b6c4da3
Commit ab331bb
[InstCombine] Only require not-undef in select equiv fold
As the comment already indicates, only replacement with undef is problematic, as it introduces an additional use of undef. Use the correct ValueTracking helper.
Commit d836ae8
[ValueTracking] Make undef element check more precise
If we're only checking for undef, then also only look for undef elements in the vector (rather than undef and poison).
Commit 942e935
[LoopUnroll] Consider convergence control tokens when unrolling (llvm#91715)
- There is no restriction on a loop with controlled convergent operations when the relevant tokens are defined and used within the loop.
- When a token defined outside a loop is used inside (also called a loop convergence heart), unrolling is allowed only in the absence of remainder or runtime checks.
- When a token defined inside a loop is used outside, such a loop is said to be "extended". This loop can only be unrolled by also duplicating the extended part lying outside the loop. Such unrolling is disabled for now.
- Clean up loop hearts: When unrolling a loop with a heart, duplicating the heart will introduce multiple static uses of a convergence control token in a cycle that does not contain its definition. This violates the static rules for tokens, and needs to be cleaned up into a single occurrence of the intrinsic.
- Spell out the initializer for UnrollLoopOptions to improve readability.

Original implementation [D85605] by Nicolai Haehnle <[email protected]>.
Commit fa7e78c
[SDPatternMatch] Do not use std::forward and rvalue references (NFC) (llvm#93806)
The m_ZExtOrSelf() family of matchers currently incorrectly calls std::forward twice on the same value. However, just removing those causes other complications, because then template arguments get incorrectly inferred to const references instead of the underlying value types. Things become a mess. Instead, just completely remove the use of std::forward and rvalue references from SDPatternMatch. I don't think they really provide value in this context, especially as they're not used consistently in the first place.
Commit 11675cb
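The double-`std::forward` problem mentioned in the SDPatternMatch commit is easy to reproduce in isolation. The sketch below uses `std::unique_ptr` because its moved-from state is guaranteed to be null; `consume` and `forward_twice` are hypothetical names and are unrelated to the actual matcher code.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Takes ownership of its argument and reports whether it was non-null.
inline bool consume(std::unique_ptr<int> p) { return p != nullptr; }

// Buggy pattern: the same argument is forwarded twice. For an rvalue
// argument the first call moves from it, so the second call receives an
// empty pointer.
template <class T> std::pair<bool, bool> forward_twice(T &&value) {
  bool first = consume(std::forward<T>(value));
  bool second = consume(std::forward<T>(value));  // already moved-from
  return {first, second};
}
```

Calling `forward_twice(std::make_unique<int>(1))` yields a pair whose first element is true and whose second is false, which shows the second consumer silently losing the value.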
[InstCombine] Add transforms `(icmp spred (and X, Y), X)` if `X` or `Y` are known signed/unsigned
Several transforms:
1) If known `Y < 0`:
   - slt -> ult: https://alive2.llvm.org/ce/z/9zt2iK
   - sle -> ule: https://alive2.llvm.org/ce/z/SPoPNF
   - sgt -> ugt: https://alive2.llvm.org/ce/z/IGNxAk
   - sge -> uge: https://alive2.llvm.org/ce/z/joqTvR
2) If known `Y >= 0`:
   - `(X & PosY) s> X --> X s< 0` - https://alive2.llvm.org/ce/z/7e-5BQ
   - `(X & PosY) s> X --> X s< 0` - https://alive2.llvm.org/ce/z/jvT4Gb
3) If known `X < 0`:
   - `(NegX & Y) s> NegX --> Y s>= 0` - https://alive2.llvm.org/ce/z/ApkaEh
   - `(NegX & Y) s<= NegX --> Y s< 0` - https://alive2.llvm.org/ce/z/oRnfHp

Closes llvm#94417
Commit ae8398c
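Some of the InstCombine folds listed above can be sanity-checked by brute force over 8-bit values, in the spirit of the linked Alive2 proofs. The sketch below checks one fold from each group; `folds_hold_for_i8` is a hypothetical helper, and an 8-bit exhaustive check is only an illustration, not a proof for all bit widths.

```cpp
#include <cassert>
#include <cstdint>

// Exhaustively checks three of the listed folds over all 8-bit X and Y.
inline bool folds_hold_for_i8() {
  for (int xi = -128; xi <= 127; ++xi) {
    for (int yi = -128; yi <= 127; ++yi) {
      std::int8_t x = static_cast<std::int8_t>(xi);
      std::int8_t y = static_cast<std::int8_t>(yi);
      std::int8_t a = static_cast<std::int8_t>(x & y);
      std::uint8_t ux = static_cast<std::uint8_t>(x);
      std::uint8_t ua = static_cast<std::uint8_t>(a);
      // 1) Y < 0: the signed predicate equals the unsigned one (slt -> ult).
      if (y < 0 && ((a < x) != (ua < ux)))
        return false;
      // 2) Y >= 0: (X & Y) s> X  <=>  X s< 0.
      if (y >= 0 && ((a > x) != (x < 0)))
        return false;
      // 3) X < 0: (X & Y) s> X  <=>  Y s>= 0.
      if (x < 0 && ((a > x) != (y >= 0)))
        return false;
    }
  }
  return true;
}
```

The intuition is that `X & Y` can only clear bits of `X`: when `Y` is negative the sign bit survives, so signed and unsigned orderings agree, and when `Y` is non-negative the result is always non-negative.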
[ARM] vabd.ll - regenerate test checks
Cleanup for llvm#94504
Commit 2b0061c
[ARM] vaba.ll - regenerate test checks
Cleanup for llvm#94504
Commit 43a52d5
Commit 39027b5
[DebugInfo][SelectionDAG] Fix position of salvaged 'dangling' DBG_VALUEs (llvm#94458)
`SelectionDAGBuilder::handleDebugValue` has a parameter `Order` which represents the insert-at position for the new DBG_VALUE. Prior to this patch `SelectionDAGBuilder::SDNodeOrder` is used instead of the `Order` parameter.

The only code-paths where `Order != SDNodeOrder` are the two calls to `handleDebugValue` from `salvageUnresolvedDbgValue`. `salvageUnresolvedDbgValue` is called from `resolveOrClearDbgInfo` and `dropDanglingDebugInfo`. The former is called after SelectionDAG completes one block.

Some dbg.values can't be lowered to DBG_VALUEs right away. These get recorded as 'dangling' - their order-number is saved - and get salvaged later through `dropDanglingDebugInfo`, or if we've still got dangling debug info once the whole block has been emitted, through `resolveOrClearDbgInfo`. Their saved order-number is passed to `handleDebugValue`.

Prior to this patch, DBG_VALUEs inserted using these functions are inserted at the "current" `SDNodeOrder` rather than the intended position that is passed to the function. Fix and add test.
Commit 5ed1246
Commit 94cc5f4
[llvm-reduce] Remove DIGlobalVariableExpressions from DICompileUnit's globals (llvm#94497)
The 'metadata' delta pass will remove !dbg attachments from globals (which are DIGlobalVariableExpression nodes). The DIGlobalVariableExpressions don't get eliminated from the IR however if they are still referenced by the globals field in DICompileUnit. Teach the 'di-metadata' pass to try removing global variable operands from metadata tuples as well as DINodes.
Commit 15b6e55
[flang][OpenMP] Fix privatization when critical is present (llvm#94441)
When a critical construct is present inside another construct where privatizations may occur, such as a parallel construct, some privatizations are skipped if the corresponding symbols are defined inside the critical section only (see the example below).

This happens because, while critical constructs have a "body", they don't have a separate scope (which makes sense, since no privatizations can occur in them). Because of this, in semantics phase, it's not possible to insert a new host association symbol, but instead the symbol from the enclosing context is used directly. This makes symbol collection in DataSharingProcessor consider the new symbol to be defined by the critical construct, instead of by the enclosing one, which causes the privatization to be skipped.

Example:
```
!$omp parallel default(firstprivate)
  !$omp critical
    i = 200
  !$omp end critical
!$omp end parallel
```

This patch fixes this by identifying constructs where privatizations may not happen and skipping them during the collection of nested symbols. Currently, this seems to happen only with critical constructs, but others can be easily added to the skip list, if needed.

Fixes llvm#75767
Commit 0f88be8
Commit 16f7316
[PowerPC] Add test to show alignment of toc-data symbol is changed. NFC.
After O3 opt pipeline, the alignment of toc-data symbol is changed which is unexpected.
Commit 376f0d5
[lldb] Disable TestPtyServer API test when remote testing (llvm#94587)
The local PTY is not available for the remotely executed lldb-server to pass the test. Also, in general, we cannot execute the local lldb-server instance because it could be compiled for the different system/cpu target.
Commit 65a7389
[ARM] Add neon_vabd.ll based off aarch64 tests
Test coverage for llvm#94504
Commit 5791862
[DAG] visitSUB - update the ABS matching code to use SDPatternMatch and hasOperation.
Avoids the need to explicitly test both commuted variants and doesn't match custom lowering after legalization. Cleanup for llvm#94504
Commit 80694e4
[flang][CodeGen][NFC] Reduce boilerplate for ExternalNameConversion (llvm#94474)
Use tablegen to generate the pass constructor. I removed the duplicated pass option handling. I don't understand why the manual instantiation of the pass needs its own duplicate of the pass options in the (automatically generated) base class (even with the option to ignore the pass options in the base class). This pass doesn't need changes to support other top level operations.
Commit 40333fc
[PowerPC] Adjust operand order of ADDItoc to be consistent with other ADDI* nodes (llvm#93642)
Simultaneously, the `ADDItoc` machineinstr is generated in `PPCISelDAGToDAG::Select` so the pattern is not used and can be removed.
Commit c3bc314
[clang][Interp] Member Pointers (llvm#91303)
This adds a `MemberPointer` class along with a `PT_MemberPtr` primitive type. A `MemberPointer` has a `Pointer` Base as well as a `Decl*` (could be `ValueDecl*`?) decl it points to. For the actual logic, this mainly changes the way we handle `PtrMemOp`s in `VisitBinaryOperator`.
Commit 75a1c58
Commit 453205a
[AMDGPU] Move INIT_EXEC lowering from SILowerControlFlow to SIWholeQuadMode (llvm#94452)
NFCI; this just preserves SI_INIT_EXEC and SI_INIT_EXEC_FROM_INPUT instructions a little longer so that we can reliably identify them in SIWholeQuadMode.
Commit 1cb3b5c
[AMDGPU] Implement variadic functions by IR lowering (llvm#93362)
This is a mostly-target-independent variadic function optimisation and lowering pass. It is only enabled for AMDGPU in this initial commit. The purpose is to make C style variadic functions a zero cost abstraction. They are lowered to equivalent IR which is then amenable to other optimisations. This is inherently slightly target specific but much less so than one might expect - the C varargs interface heavily constrains the ABI design divergence. The pass is primarily tested from webassembly. This is because wasm has a straightforward variadic lowering strategy which coincides exactly with what this pass transforms code into and a struct passing convention with few cases to check. Adding further targets conventions is straightforward and elided from this patch primarily to simplify the review. Implemented in other branches are Linux X86, AMD64, AArch64 and NVPTX. Testing for targets that have existing lowering for va_arg from clang is most efficiently done by checking that clang | opt completely elides the variadic syntax from test cases. The lowering produces a struct for each call site which can be inspected to check the various alignment and indirections are correct. AMDGPU presently has no variadic support other than some ad hoc printf handling. Combined with the pass being inactive on all other targets landing this represents strict increase in capability with zero risk. Testing and refining will continue post commit. In addition to the compiler tests included here, a self contained x64 clang/musl toolchain was constructed using the "lowering" instead of the systemv ABI and used to build various C programs like lua and libxml2.
Commit 73af086
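The idea behind the variadic lowering, a struct built at each call site and passed by pointer, can be mimicked by hand. The sketch below is only an analogy under simplifying assumptions (all arguments are `int`, no alignment padding); `sum_variadic`, `sum_lowered`, and `call_site_example` are hypothetical names and do not reflect the IR the pass actually emits.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstring>

// Original C-style variadic function.
inline int sum_variadic(int count, ...) {
  va_list ap;
  va_start(ap, count);
  int total = 0;
  for (int i = 0; i < count; ++i)
    total += va_arg(ap, int);
  va_end(ap);
  return total;
}

// Hand-written analogue of the lowered form: the variadic tail becomes a
// pointer to a caller-built buffer, and va_arg becomes load-and-advance.
inline int sum_lowered(int count, const char *args) {
  int total = 0;
  for (int i = 0; i < count; ++i) {
    int v;
    std::memcpy(&v, args, sizeof v);
    args += sizeof v;
    total += v;
  }
  return total;
}

// What a call such as sum(3, 1, 2, 3) might turn into: one struct per
// call site holding the variadic arguments.
inline int call_site_example() {
  struct Frame { int a, b, c; } frame = {1, 2, 3};
  return sum_lowered(3, reinterpret_cast<const char *>(&frame));
}
```

Once the call is in this form there is nothing variadic left, so ordinary optimisations such as inlining and load forwarding can apply, which is the "zero cost abstraction" goal the commit describes.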
Revert "[Analyzer][CFG] Correctly handle rebuilt default arg and default init expression (llvm#91879)" (llvm#94597)
This depends on llvm#92527 which needs to be reverted due to llvm#92527 (comment). This reverts commit 905b402.
Co-authored-by: Bogdan Graur <[email protected]>
Commit 85c4dd6
Commit 6ec798c
Revert "[serialization] no transitive decl change (llvm#92083)"
This reverts commit 97c866f. This fails on 32bit machines. See llvm#92083
Commit 8d48e5c
Revert "Reapply "[Clang][CWG1815] Support lifetime extension of temporary created by aggregate initialization using a default member initializer" (llvm#92527)" (llvm#94600)
Reverting due to llvm#92527 (comment). This reverts commit f049d72.
Co-authored-by: Bogdan Graur <[email protected]>
Commit 13eb6de
[AArch64][SME] Add calling convention for __arm_get_current_vg (llvm#93963)
Adds a calling convention for calls to the `__arm_get_current_vg` support routine, which preserves X1-X15, X19-X29, SP, Z0-Z31 & P0-P15. See ARM-software/abi-aa#263
Commit 8a3e4b1
Commit 2640ff6
[GlobalIsel] Combine G_VSCALE (llvm#94096)
We need them for scalable address calculation and legal scalable addressing modes.
Commit 6299b18
Remove some #includes in ExpandVariadics.cpp as it will cause layering violations.
Commit bfc793a
Commit 1b84df2
[Transforms] Fix -Wunused-variable in ExpandVariadics.cpp (NFC)
/llvm-project/llvm/lib/Transforms/IPO/ExpandVariadics.cpp:426:14: error: unused variable 'OriginalFunctionIsDeclaration' [-Werror,-Wunused-variable]
  const bool OriginalFunctionIsDeclaration = OriginalFunction->isDeclaration();
             ^
/llvm-project/llvm/lib/Transforms/IPO/ExpandVariadics.cpp:445:13: error: unused variable 'VariadicWrapperDefine' [-Werror,-Wunused-variable]
  Function *VariadicWrapperDefine =
            ^
2 errors generated.
Commit aa10468
[ARM] Don't block tail-predication from unrelated VPT blocks. (llvm#94239)
VPT blocks that do not produce an interesting 'output' (like a stored value or reduction result), do not need to be predicated on vctp for the whole loop to be tail-predicated. Just producing results for the valid tail predication lanes should be enough.
Commit cec4763
[clang-tidy] Remove redundant LINK_LIBS (llvm#94588)
clangAnalysis is already being pulled in via clang_target_link_libraries(). Also listing it in LINK_LIBS means that we'll link both against the static libraries and the shared libclang-cpp.so library if CLANG_LINK_CLANG_DYLIB is enabled, and waste time on unnecessary LTO.
Commit 0b8566b
[libc][math] Temporarily disable nexttowardf16 on aarch64 due to clang-11 bug. (llvm#94569)
The conversion between _Float16 and long double will crash clang-11 on aarch64. This is fixed in clang-12: https://godbolt.org/z/8ceT9454c
Commit 7b0f3ad
[DAG] expandABS - add missing FREEZE in abs(x) -> smax(x,sub(0,x)) expansion
Noticed while working on llvm#94601
Commit cdc9c01
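The `abs(x) -> smax(x, sub(0, x))` identity from the expandABS commit can be written out in scalar code. This is a sketch with a hypothetical helper name; it models only the arithmetic, since C++ has no counterpart to FREEZE. The expansion uses `x` twice, so the freeze is presumably there to make both uses observe the same value when `x` may be undef.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// abs(x) as smax(x, 0 - x). The subtraction is done in unsigned
// arithmetic so that it wraps like the DAG `sub` node instead of being
// undefined behaviour in C++.
inline std::int32_t abs_via_smax(std::int32_t x) {
  std::int32_t neg =
      static_cast<std::int32_t>(0u - static_cast<std::uint32_t>(x));
  return std::max(x, neg);
}
```

For the minimum 32-bit value the negation wraps back to itself, so the helper returns that minimum value unchanged, matching wrapping `abs` semantics.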
[flang][OpenMP] Make object identity more precise (llvm#94495)
Derived type components may use a given `Symbol` regardless of what parent objects they are a part of. Because of that, simply using a symbol address is not sufficient to determine object identity. Make the designator a part of the IdTy. To compare identities, when symbols are equal (and non-null), compare the designators.
Commit 5fd7a9b
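The identity rule from the flang OpenMP commit (compare symbols first, then designators) can be sketched as follows. `Symbol`, `ObjectId`, and `same_object` are hypothetical stand-ins, with the designator reduced to a string; treating two null-symbol identities as equal is an assumption, since the commit message only covers the non-null case.

```cpp
#include <cassert>
#include <string>

struct Symbol {};  // hypothetical stand-in for a semantics symbol

// Identity = symbol address plus the designator that names the object.
struct ObjectId {
  const Symbol *symbol;
  std::string designator;
};

inline bool same_object(const ObjectId &a, const ObjectId &b) {
  if (a.symbol != b.symbol)
    return false;
  if (a.symbol == nullptr)
    return true;  // assumption: two null identities compare equal
  return a.designator == b.designator;
}
```

With this rule, `a%x` and `b%x` are distinct objects even though the component `x` resolves to the same symbol in both.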
[clang][Sema] Add missing scope flags to Scope::dumpImpl (llvm#94529)
There were a handful of scope flags that were not handled in the dump function, which would then lead to an assert.
Commit ffebcd0
[ConstraintElim] Add set of tests where a loop iv is used in exit.
Test cases inspired by llvm#90417.
Commit 7a4c101
[LoongArch] Allow f16 codegen with expansion to libcalls (llvm#94456)
The test case is adapted from llvm/test/CodeGen/RISCV/fp16-promote.ll, because it covers some more IR patterns that ought to be common. Fixes llvm#93894
Commit 7a2a155
[workflows] Add scan-build to ci-ubuntu-22.04 container (llvm#94543)
This will be used for a new CI job that runs the static analyzer.
Commit 8aebc05
[X86][AMX] Checking AMXProgModel in X86LowerTileCopy (llvm#94358)
This fixes compile time regression after llvm#93692.
Commit e60c668
[Libomptarget] Rework device initialization and image registration (l…
…lvm#93844) Summary: Currently, we register images into a linear table according to the logical OpenMP device identifier. We then initialize all of these images as one block. This logic requires that images are compatible with *all* devices instead of just the one that it can run on. This prevents us from running on systems with heterogeneous devices (i.e. image 1 runs on device 0 image 0 runs on device 1). This patch reworks the logic by instead making the compatibility check a per-device query. We then scan every device to see if it's compatible and do it as they come.
Commit c33b869
[X86] Fix pipe resources for HADD/SUB instructions
IceLakeServer was copying these from SkylakeServer, but integer HADD/SUB can now run on an extra port
Commit ba9c7b4
[X86] Fix pipe resources for FP HADD/SUB instructions
IceLakeServer/SkylakeServer can only use Port01 for the FADD/FSUB stage.
Confirmed with uops.info + Agner.
Commit 1dce655
[Clang][AMDGPU] Use `I` to decorate imm argument for `__builtin_amdgcn_global_load_lds` (llvm#94376)
Commit 093818b
[libc] Enable varargs tests for AMDGPU targets
Summary: This reverts commit 574ab7e.
Commit 4782ad7
[NVPTX] Revamp NVVMIntrRange pass (llvm#94422)
Revamp the NVVMIntrRange pass making the following updates:
- Use range attributes over range metadata. This is what instcombine has moved to for ranges on intrinsics in llvm#88776 and it seems a bit cleaner.
- Consider the `!"maxntid{x,y,z}"` and `!"reqntid{x,y,z}"` function metadata when adding ranges for `tid` sreg intrinsics. This can allow for smaller ranges and more optimization.
- When range attributes are already present, use the intersection of the old and new range. This complements the metadata change by allowing ranges to be shrunk when an intrinsic is in a function which is inlined into a kernel with metadata. While we don't call this more than once yet, we should consider adding a second call after inlining, once this has had a chance to soak for a while and no issues have arisen.
I've also re-enabled this pass in the TM, it was disabled years ago due to "numerical discrepancies" https://reviews.llvm.org/D96166. In our testing we haven't seen any issues with adding ranges to intrinsics, and I cannot find any further info about what issues were encountered.
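The "intersection of the old and new range" step can be pictured with a minimal sketch. This is not the pass's code: `Range`, `tidRange`, and `intersect` are hypothetical names, and the sketch assumes simple non-wrapping half-open intervals rather than LLVM's own range type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical half-open interval [lo, hi) of values an intrinsic may return.
struct Range {
  uint32_t lo;
  uint32_t hi;
};

// A thread-id read is always in [0, limit), where `limit` stands in for a
// per-dimension bound such as one taken from maxntid/reqntid-style metadata.
inline Range tidRange(uint32_t limit) { return Range{0, limit}; }

// Intersect an existing range with a newly derived one.
// Returns std::nullopt when the two ranges do not overlap.
inline std::optional<Range> intersect(Range a, Range b) {
  Range r{std::max(a.lo, b.lo), std::min(a.hi, b.hi)};
  if (r.lo >= r.hi)
    return std::nullopt;
  return r;
}
```

For example, intersecting `tidRange(1024)` with `tidRange(128)` yields `[0, 128)`, which is the kind of shrinking the message describes for an intrinsic inlined into a kernel that carries tighter metadata.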
Commit 2be0989
Commit 8fe8ede
[AArch64] Override isLSRCostLess, take number of instructions into account (llvm#84189)
Adds an AArch64-specific version of isLSRCostLess, changing the relative importance of the various terms from the formulae being evaluated. This has been split out from my vscale-aware LSR work, see the RFC for reference: https://discourse.llvm.org/t/rfc-vscale-aware-loopstrengthreduce/77131
Commit 48c9a27
Commit b4896c9
[NVPTX] Remove unused private field in NVVMIntrRange.cpp (NFC)
/llvm-project/llvm/lib/Target/NVPTX/NVVMIntrRange.cpp:33:12: error: private field 'SmVersion' is not used [-Werror,-Wunused-private-field]
  unsigned SmVersion;
           ^
1 error generated.
Commit 2450e72
[lldb] Fix ThreadPlanStepOverRange name in log message (llvm#94611)
Co-authored-by: Marianne Mailhot-Sarrasin
Commit 52edac7
[OpenMP][NFC] Fix warning for OpenMP standalone build (llvm#93463)
PR llvm#75125 introduced upward propagation of some OMPT-related CMake variables. For stand-alone builds this results in a warning that `SCOPE_PARENT` has no meaning in a top-level directory.
Commit a6444dd
[RISCV] Fix duplicate test cases for G_UNMERGE_VALUES (llvm#94622)
`unmerge_i64` and `unmerge_i32` were exactly the same test case. This PR fixes that, so that `unmerge_i32` actually unmerges a 32-bit value into two 16-bit values.
Commit 5ddd8e7
[mlir][tensor] Implement constant folder for tensor.pad (llvm#92691)
Extend the folding ability of the RewriteAsConstant patterns to include tensor.pad operations on constants. The new pattern will constant fold tensor.pad operations which operate on tensor constants and have statically resolvable padding sizes/values.

    %init = arith.constant dense<[[6, 7], [8, 9]]> : tensor<2x2xi32>
    %pad_value = arith.constant 0 : i32
    %0 = tensor.pad %init low[1, 1] high[1, 1] {
      ^bb0(%arg1: index, %arg2: index):
        tensor.yield %pad_value : i32
    } : tensor<2x2xi32> to tensor<4x4xi32>

becomes

    %cst = arith.constant dense<[[0, 0, 0, 0], [0, 6, 7, 0], [0, 8, 9, 0], [0, 0, 0, 0]]> : tensor<4x4xi32>

Co-authored-by: Spenser Bauman <sabauma@fastmail>
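To make the effect of the fold concrete, here is a small standalone sketch of padding a constant 2-D matrix with static low/high amounts and a constant pad value. It only mirrors the arithmetic of the example above; `Matrix` and `padConstant` are hypothetical names and have nothing to do with the MLIR pattern's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Pad `init` with `padValue`: `lowRows`/`lowCols` cells before the data and
// `highRows`/`highCols` cells after it, in each dimension.
inline Matrix padConstant(const Matrix &init, std::size_t lowRows,
                          std::size_t lowCols, std::size_t highRows,
                          std::size_t highCols, int padValue) {
  const std::size_t rows = init.size();
  const std::size_t cols = rows ? init[0].size() : 0;
  // Start from a result filled entirely with the pad value.
  Matrix out(lowRows + rows + highRows,
             std::vector<int>(lowCols + cols + highCols, padValue));
  // Copy the original data into its shifted position.
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t j = 0; j < cols; ++j)
      out[lowRows + i][lowCols + j] = init[i][j];
  return out;
}
```

Calling `padConstant({{6, 7}, {8, 9}}, 1, 1, 1, 1, 0)` produces the same 4x4 contents as the `%cst` constant shown in the commit message.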
Commit 0b27a2e
[BOLT][DWARF][NFC] Refactor GDB Index into a new file (llvm#94405)
Create a new class and file for functions that update GDB index.
Commit 27a3150
[AArch64] Add support for Qualcomm Oryon processor (llvm#91022)
Oryon is an ARM V8 AArch64 CPU from Qualcomm.
---------
Co-authored-by: Wei Zhao
Commit 158494e
[SPIR-V] Add validation to the test case with get_image_array_size/get_image_dim calls (llvm#94467)
This PR adds validation to the test case with get_image_array_size/get_image_dim calls (transcoding/check_ro_qualifier.ll). This test case didn't pass validation because of invalid emission of the OpCompositeExtract instruction (Result Type must be the same type as Composite). To fix the problem, this PR improves type inference in general and partially addresses these issues:
* llvm#91998
* llvm#91997
A reproducer from the description of the latter issue is added as a new test case as part of this PR.
Commit 79653ce
Commit 25f2565
[clang][Interp][NFC] Return a valid SourceInfo for Function PCs
We already assert that the given PC is in range and that the function has a body, so the SrcMap should generally never be empty. However, when generating destructors, we create quite a few instructions for which we have no source information, which may cause the previous assertion to fail. Return the end of the source map in this case.
Commit 0e92e4f
DAG: Pass flags to FoldConstantArithmetic (llvm#93663)
There is simply way too much going on inside getNode. The complicated constant folding of vector handling works by looking for build_vector operands, and then tries to getNode the scalar element and then checks if constants were the result. As a side effect, this produces unused scalar operation nodes (previously, without flags). If the vector operation were later scalarized, it would find the flagless constant folding temporary and lose the flag. I don't think this is a reasonable way for constant folding to operate, but for now fix this by ensuring flags on the original operation are preserved in the temporary. This yields a clear code improvement for AMDGPU when f16 isn't legal. The Wasm cases switch from using a libcall to compare and select. We are evidently missing the fcmp+select to fminimum/fmaximum handling, but this would be further improved when that's handled. AArch64 also avoids the libcall, but looks worse and has a different call for some reason.
Commit 5332122
Commit 9881528
[MLIR] Fix generic assembly syntax for ArrayAttr containing hex float (llvm#94583)
When a float attribute is printed with Hex, we should not elide the type because it is parsed back as i64 otherwise.
Commit 4691b20
[mlir] Add pack/unpack transpose foldings for linalg.generic ops, fix bugs (llvm#93055)
This PR adds transpose + pack/unpack folding support for transpose ops in the form of `linalg.generic` ops. There were also some bugs with the permutation composing in the previous patterns, so this PR fixes these bugs and adds tests for them as well.
Commit b70d25d
Commit a9cf149
[Bazel] Generate LLVM_HAS_XYZ_TARGET macros in llvm config (llvm#94476)
Otherwise code that depends on those targets being enabled might not get compiled correctly even if the targets are explicitly included in the configuration (in my case NVVM target for MLIR).
Commit 659318e
[CodeGen] Use std::bitset for MachineFunctionProperties (llvm#94627)
The size of the properties is fixed, so no need for a BitVector. Assigning small, fixed-size bitsets is faster. It's a minor performance improvement.
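As a rough illustration of why a fixed-size set of flags fits `std::bitset`, here is a minimal sketch. The `Property` enumerators and the `Properties` class are hypothetical stand-ins chosen for the example; they are not the real `MachineFunctionProperties` interface.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical, fixed list of properties; the count is known at compile time.
enum class Property : unsigned {
  IsSSA,
  NoPHIs,
  TracksLiveness,
  NoVRegs,
  LastProperty = NoVRegs
};

class Properties {
  // One bit per property; no heap allocation, trivially cheap to copy/assign.
  std::bitset<static_cast<std::size_t>(Property::LastProperty) + 1> Bits;

public:
  Properties &set(Property P) {
    Bits.set(static_cast<std::size_t>(P));
    return *this;
  }
  Properties &reset(Property P) {
    Bits.reset(static_cast<std::size_t>(P));
    return *this;
  }
  bool has(Property P) const { return Bits.test(static_cast<std::size_t>(P)); }

  // True if every property set in `Required` is also set here.
  bool satisfies(const Properties &Required) const {
    return (Required.Bits & ~Bits).none();
  }
};
```

Because the bit count is a template argument, assignment copies a small fixed-size object, which is the property the commit message relies on for its minor speedup.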
Commit a8ce798
[X86] Skip AMX type lowering when AMX is not used (llvm#92910)
The pass iterates over the IR multiple times, but most code doesn't use AMX. Therefore, do a single iteration in advance to check whether a function uses AMX at all, and exit early if it doesn't. This makes the function-has-AMX path slightly more expensive, but AMX users probably care a lot less about compile time than JIT users (which tend to not use AMX). For us, it reduces the time spent in this pass from 0.62% to 0.12%. Ideally, we wouldn't even need to iterate over the function to determine that it doesn't use AMX.
Commit 696a2f5
RegisterCoalescer: Remove unnecessary maybe_unused
2214026 didn't fix an unused variable warning correctly.
Commit ac3f92a
Commit 3c1d2a0
[libc][math][c23] Add {fmaximum,fminimum}{,_mag,_mag_num,_num} C23 math functions (llvm#94510)
llvm#93566
Commit 19dd633
[llvm][ScheduleDAG] Set a fixed size for Sched::Preference (llvm#94523)
This trims off 8 bytes from llvm::SUnit:
```
--- before	2024-06-05 12:13:00
+++ after	2024-06-05 12:12:58
@@ -1,65 +1,65 @@
 *** Dumping AST Record Layout
          0 | class llvm::SUnit
          0 |   SDNode * Node
          8 |   MachineInstr * Instr
         16 |   SUnit * OrigNode
         24 |   const MCSchedClassDesc * SchedClass
         32 |   class llvm::SmallVector<class llvm::SDep, 4> Preds
         32 |     class llvm::SmallVectorImpl<class llvm::SDep> (base)
         32 |       class llvm::SmallVectorTemplateBase<class llvm::SDep> (base)
         32 |         class llvm::SmallVectorTemplateCommon<class llvm::SDep> (base)
         32 |           class llvm::SmallVectorBase<uint32_t> (base)
         32 |             void * BeginX
         40 |             unsigned int Size
         44 |             unsigned int Capacity
         48 |     struct llvm::SmallVectorStorage<class llvm::SDep, 4> (base)
         48 |       char[64] InlineElts
        112 |   class llvm::SmallVector<class llvm::SDep, 4> Succs
        112 |     class llvm::SmallVectorImpl<class llvm::SDep> (base)
        112 |       class llvm::SmallVectorTemplateBase<class llvm::SDep> (base)
        112 |         class llvm::SmallVectorTemplateCommon<class llvm::SDep> (base)
        112 |           class llvm::SmallVectorBase<uint32_t> (base)
        112 |             void * BeginX
        120 |             unsigned int Size
        124 |             unsigned int Capacity
        128 |     struct llvm::SmallVectorStorage<class llvm::SDep, 4> (base)
        128 |       char[64] InlineElts
        192 |   unsigned int NodeNum
        196 |   unsigned int NodeQueueId
        200 |   unsigned int NumPreds
        204 |   unsigned int NumSuccs
        208 |   unsigned int NumPredsLeft
        212 |   unsigned int NumSuccsLeft
        216 |   unsigned int WeakPredsLeft
        220 |   unsigned int WeakSuccsLeft
        224 |   unsigned short NumRegDefsLeft
        226 |   unsigned short Latency
    228:0-0 |   _Bool isVRegCycle
    228:1-1 |   _Bool isCall
    228:2-2 |   _Bool isCallOp
    228:3-3 |   _Bool isTwoAddress
    228:4-4 |   _Bool isCommutable
    228:5-5 |   _Bool hasPhysRegUses
    228:6-6 |   _Bool hasPhysRegDefs
    228:7-7 |   _Bool hasPhysRegClobbers
    229:0-0 |   _Bool isPending
    229:1-1 |   _Bool isAvailable
    229:2-2 |   _Bool isScheduled
    229:3-3 |   _Bool isScheduleHigh
    229:4-4 |   _Bool isScheduleLow
    229:5-5 |   _Bool isCloned
    229:6-6 |   _Bool isUnbuffered
    229:7-7 |   _Bool hasReservedResource
-       232 |   Sched::Preference SchedulingPref
-   236:0-0 |   _Bool isDepthCurrent
-   236:1-1 |   _Bool isHeightCurrent
-       240 |   unsigned int Depth
-       244 |   unsigned int Height
-       248 |   unsigned int TopReadyCycle
-       252 |   unsigned int BotReadyCycle
-       256 |   const TargetRegisterClass * CopyDstRC
-       264 |   const TargetRegisterClass * CopySrcRC
-           | [sizeof=272, dsize=272, align=8,
-           |  nvsize=272, nvalign=8]
+       230 |   Sched::Preference SchedulingPref
+   231:0-0 |   _Bool isDepthCurrent
+   231:1-1 |   _Bool isHeightCurrent
+       232 |   unsigned int Depth
+       236 |   unsigned int Height
+       240 |   unsigned int TopReadyCycle
+       244 |   unsigned int BotReadyCycle
+       248 |   const TargetRegisterClass * CopyDstRC
+       256 |   const TargetRegisterClass * CopySrcRC
+           | [sizeof=264, dsize=264, align=8,
+           |  nvsize=264, nvalign=8]
```
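The layout change comes from giving the enum a fixed, one-byte underlying type so it can pack next to neighbouring small fields. A minimal sketch of the same effect follows; the enum and struct names are hypothetical and the structs are far smaller than `llvm::SUnit`, so the exact byte counts differ from the dump.

```cpp
#include <cassert>
#include <cstdint>

// Without a fixed underlying type, compilers typically pick an int-sized type.
enum WidePreference { WNone, WSource, WRegPressure };

// With a fixed underlying type the enum occupies exactly one byte.
enum NarrowPreference : uint8_t { NNone, NSource, NRegPressure };

// Hypothetical record: small fields surround the preference field.
struct WideUnit {
  uint16_t Latency;
  uint8_t FlagsA;
  uint8_t FlagsB;
  WidePreference Pref; // usually forces 4-byte alignment and size
  uint8_t DepthFlags;
  unsigned Depth;
};

struct NarrowUnit {
  uint16_t Latency;
  uint8_t FlagsA;
  uint8_t FlagsB;
  NarrowPreference Pref; // packs into the byte right after FlagsB
  uint8_t DepthFlags;
  unsigned Depth;
};
```

On common 64-bit ABIs `NarrowUnit` comes out smaller than `WideUnit` because the padding before and after the enum disappears, though the precise sizes are implementation-defined.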
Commit 7650125
[NFC][libc++][test][AIX] fix SIMD test XFAIL for clang before 19 (llv…
Commit b50875b
[clang] Fix handling of adding a file with the same name as an existing dir to VFS (llvm#94461)
When trying to add a file to clang's VFS via `addFile` and a directory of the same name already exists, we run into an [out-of-bound access](https://github.com/llvm/llvm-project/blob/145815c180fc82c5a55bf568d01d98d250490a55/llvm/lib/Support/Path.cpp#L244). The problem is that the file name is [recognised as an existing path](https://github.com/llvm/llvm-project/blob/145815c180fc82c5a55bf568d01d98d250490a55/llvm/lib/Support/VirtualFileSystem.cpp#L896), so processing continues with the next part of the path, which doesn't exist. This patch adds a check for whether we have reached the last part of the filename and returns false in that case. Thus we refuse to add a file if a directory of the same name already exists. This is in sync with [this check](https://github.com/llvm/llvm-project/blob/145815c180fc82c5a55bf568d01d98d250490a55/llvm/lib/Support/VirtualFileSystem.cpp#L903) that rejects adding a path if a file of the same name already exists.
Commit a05f49a
[clang][Interp] Always decay root array pointers to the first element
This is similar to what the current interpreter does.
Commit 99fcd1b
[AIX] use LIBPATH on AIX instead of LD_LIBRARY_PATH (llvm#94602)
LD_LIBRARY_PATH will become invalid when LIBPATH is also set on AIX. See the example below on AIX:
```
$ldd a.out
a.out needs:
         /usr/lib/libc.a(shr.o)
Cannot find libtest.a
         /unix
         /usr/lib/libcrypt.a(shr.o)
$./a.out
Could not load program ./a.out:
        Dependent module libtest.a could not be loaded.
Could not load module libtest.a.
System error: No such file or directory
$export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/tmp
$./a.out ; echo $?
10
$export LIBPATH=./
$./a.out ; echo $?  >>>>>> Now LD_LIBRARY_PATH is not used by system loader
Could not load program ./a.out:
        Dependent module libtest.a could not be loaded.
Could not load module libtest.a.
System error: No such file or directory
```
This breaks many AIX LIT cases on our downstream buildbots, which set LIBPATH.
---------
Co-authored-by: Anh Tuyen Tran
Co-authored-by: David Tenty
Commit 8af229c
[AMDGPU] Update removeFnAttrFromReachable to accept array of Fn Attrs. (llvm#94188)
This PR updates removeFnAttrFromReachable in AMDGPUMemoryUtils to accept an array of function attributes as an argument. This helps to remove multiple attributes in one CallGraph walk.
Commit b874ae3
Commit 0031c27
Commit 275a662
Revert "[lldb][DebugNames] Only skip processing of DW_AT_declarations for class/union types" and two follow-up commits.
The reason is the crash we've discovered when processing -gsimple-template-names binaries. I'm committing a minimal reproducer as a separate patch.
This reverts the following commits:
- 51dd4ea (llvm#92328)
- 3d9d485 (llvm#93839)
- afe6ab7 (llvm#94400)
Commit 879a468
Commit 3a2fbf4
Commit 00a43ed
Commit 815412f
DAG: Improve fminimum/fmaximum vector expansion logic (llvm#93579)
First, expandFMINIMUM_FMAXIMUM should be a never-fail API. The client wanted it expanded, and it can always be expanded. This logic was tied up with what the VectorLegalizer wanted. Prefer using the min/max opcodes, and unrolling if we don't have a vselect. This seems to produce better code in all the changed tests.
Commit 13f5b56
[clang-tidy] Fix crash in readability-container-size-empty (llvm#94527)
Fixed a crash caused by a call to getCookedLiteral on a template user-defined literal. The fix is based on the assert in the getCookedLiteral method. Closes llvm#94454
Commit ce63390
Commit 30693e5
[libc] at_quick_exit function implemented (llvm#94317)
- added at_quick_exit function
- used helper file exit_handler which reuses code from atexit
- atexit now calls helper functions from exit_handler
- test cases and dependencies are added
---------
Co-authored-by: Aaryan Shukla
Commit 8fb6945
Commit ee48f68
Commit c3bfbfa
[gtest] Enable zos for death test support (llvm#94623)
This patch implements the following change to enable zos for death test support. google/googletest#4527
Commit 7842384
[clang][Interp] Diagnose functions without body like undefined ones
We only get a "reached end of constexpr function" diagnostic otherwise.
Commit fd8684c
Commit e070bda
[InstCombine] Folding multiuse `(icmp eq/ne (or X, Y), Y)` for 2 uses of `Y`
The fold will replace 2 uses of `Y`, so we should also do the fold if `Y` has 2 uses (not only one use).
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D159062
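The commit message does not spell out what the comparison is rewritten to, so the following is only a sketch of a bitwise identity that such a fold can rely on: `(X | Y) == Y` holds exactly when `X` has no bits set outside `Y`. The function names are hypothetical, and the check is done exhaustively on 8-bit values.

```cpp
#include <cassert>
#include <cstdint>

// Original form: (icmp eq (or X, Y), Y).
inline bool orEqualsY(uint8_t X, uint8_t Y) {
  return static_cast<uint8_t>(X | Y) == Y;
}

// Equivalent form (assumed for illustration): X has no bits outside Y.
inline bool noBitsOutsideY(uint8_t X, uint8_t Y) {
  return static_cast<uint8_t>(X & ~Y) == 0;
}

// Compare both forms over every pair of 8-bit inputs.
inline bool formsAgree() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y)
      if (orEqualsY(static_cast<uint8_t>(x), static_cast<uint8_t>(y)) !=
          noBitsOutsideY(static_cast<uint8_t>(x), static_cast<uint8_t>(y)))
        return false;
  return true;
}
```

Note that `Y` appears twice in the original expression, which is why a one-use restriction on `Y` would prevent the pattern from ever matching in its natural form.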
Commit f655b95
[clang][RISCV] Update vcpop.v C interface to follow the naming convention (llvm#94318)
By convention, we name the intrinsics by replacing "." with "_" in the instruction name, so `vcpopv_v`, whose corresponding instruction is `vcpop.v`, should be named `vcpop_v`.
Commit ae24158
[MemProf] Remove context id set from nodes and recompute on demand (llvm#94415)
The ContextIds set on the ContextNode struct is not technically needed as we can compute it from either the callee or caller edge context ids. Remove it and add a helper to recompute from the edges on demand. Also add helpers to compute the node allocation type and whether the context ids are empty from the edges without needing to first compute the node's context id set, to minimize the runtime cost increase. This yielded a 20% reduction in peak memory for a large thin link, for about a 2% time increase (which is more than offset by some other recent time efficiency improvements).
Commit 5a832da
[flang] Add reduction semantics to fir.do_loop (llvm#93934)
Derived from llvm#92480. This PR introduces reduction semantics into loops for DO CONCURRENT REDUCE. The `fir.do_loop` operation now invisibly has the `operandSegmentsizes` attribute and takes variable-length reduction operands with their operations given as `fir.reduce_attr`. For the sake of compatibility, `fir.do_loop`'s builder has additional arguments at the end. The `iter_args` operand should be placed in front of the declaration of result types, so the new operand for reduction variables (`reduce`) is put in the middle of arguments.
Commit fea5d46
[libc][FixedVector] Add more helper methods (llvm#94278)
This adds:
- A ctor accepting a start and end iterator
- A ctor accepting a count and const T&
- size()
- subscript operators
- begin() and end() iterators
Commit 3d79450
[clang][NFC] fix name lookup for llvm::json::Value in SymbolGraphSerializer (llvm#94511)
This code uses namespaces `llvm` and `llvm::json`. However, we have both `llvm::Value` and `llvm::json::Value`. Whenever any of the headers declare or include `llvm::Value`, the lookup becomes ambiguous. Fixing this by qualifying the `Value` type.
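The ambiguity can be reproduced in miniature. In this sketch the namespaces `outer` and `outer::json` are hypothetical stand-ins for `llvm` and `llvm::json`; once both are pulled in with using-directives, an unqualified `Value` names two different types, and qualifying the name resolves it.

```cpp
#include <cassert>
#include <string>

namespace outer {
struct Value {
  int id = 1;
};
namespace json {
struct Value {
  std::string text = "json";
};
} // namespace json
} // namespace outer

using namespace outer;
using namespace outer::json;

// Value v;  // error if uncommented: reference to 'Value' is ambiguous

// Qualifying the type picks the intended declaration unambiguously.
inline std::string makeQualified() {
  outer::json::Value v;
  return v.text;
}
```

The failure only shows up when a header declaring the second `Value` happens to be visible, which is why this kind of breakage can appear after an unrelated include change.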
Commit 993247b
[memprof] Use std::unique_ptr instead of std::optional (llvm#94655)
Changing the type of Frame::SymbolName from std::optional<std::string> to std::unique_ptr<std::string> reduces sizeof(Frame) from 64 to 32. The smaller type reduces the cycle and instruction counts by 23% and 4.4%, respectively, with "llvm-profdata show" modified to deserialize all MemProfRecords in a MemProf V2 profile. The peak memory usage is cut down nearly by half.
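The size difference can be checked with a small sketch. The two structs below are hypothetical approximations of a frame record, not the real `Frame` definition; exact sizes depend on the standard library, so the sketch only compares them.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <string>

// std::optional<std::string> stores the whole string object inline
// plus an "engaged" flag.
struct FrameWithOptional {
  uint64_t Function;
  std::optional<std::string> SymbolName;
  uint32_t LineOffset;
  uint32_t Column;
  bool IsInlineFrame;
};

// std::unique_ptr<std::string> stores a single pointer; the string itself
// lives on the heap and only exists when a name is present.
struct FrameWithUniquePtr {
  uint64_t Function;
  std::unique_ptr<std::string> SymbolName;
  uint32_t LineOffset;
  uint32_t Column;
  bool IsInlineFrame;
};
```

The trade-off is that the `unique_ptr` version needs a heap allocation when a name is present and is move-only unless copy operations are written by hand.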
Commit 19fd58a
Commit 580a488
[ELF] Keep non-alloc orphan sections at the end
https://reviews.llvm.org/D85867 changed the way we assign file offsets (alloc sections first, then non-alloc sections). It also removed a non-alloc special case from `findOrphanPos`. Looking at the memory-nonalloc-no-warn.test change, which would be needed by llvm#93761, it makes sense to restore the previous behavior: when placing non-alloc orphan sections, keep these sections at the end so that the section index order matches the file offset order. This change is cosmetic. In sections-nonalloc.s, GNU ld places the orphan `other3` in the middle and the orphan .symtab/.shstrtab/.strtab at the end. Pull Request: llvm#94519
Commit 7e5a6ca
Commit 5a31f5e
[lldb/crashlog] Remove aarch64 requirement on crashlog tests (llvm#94553)
This PR removes the `target-aarch64` requirement on the crashlog tests to exercise them on Intel bots, and makes image loading single-threaded temporarily while implementing a fix for a deadlock issue when loading the images in parallel.
Signed-off-by: Med Ismail Bennani
Commit c7b1d07
[Offload] Fix missing `abs` function for test
Summary: We don't have the abs function to link against, just use the builtin.
Commit 215c9f1
[LLVM] Do not require shell for some tests (llvm#94595)
Remove `REQUIRES: shell` from some tests that seem fine without it. Tested on Windows and with LIT_USE_INTERNAL_SHELL=1 on Linux.
Commit 91bf0b0
NFC: resolve TODO in LLVM dialect conversions (llvm#91497)
Relaxes restriction that certain public utility functions only apply to the builtin ModuleOp.
Commit dae0098
[lldb] Include memory stats in statistics summary (llvm#94671)
The summary already includes other size information, e.g. total debug info size in bytes. The only other way I can get this information is by dumping all statistics which can be quite large. Adding it to the summary seems fair.
Commit 093ff68
[Offload] Use the kernel argument size directly in AMDGPU offloading (llvm#94667)
Summary: The old COV3 implementation of HSA used to omit the implicit arguments from the kernel argument size. For COV4 and COV5 this is no longer the case so we can simply use the size reported from the symbol information. See ROCm/ROCR-Runtime#117 (comment)
Commit fc0f7bf
[RISCV][InsertVSETVLI] Check for undef register operand directly [nfc]
getVNInfoFromReg is expected to return a nullptr if-and-only-if the operand is undef. (This was asserted for.) Reverse the order of the checks to simplify an upcoming set of patches.
Commit 91b0814
Commit 3df6c97
[ProfileData] Remove swapToHostOrder (llvm#94665)
This patch removes swapToHostOrder in favor of llvm::support::endian::readNext as swapToHostOrder is too thin a wrapper around readNext.
Note that there are two variants of readNext:
- readNext<type, endian, align>(ptr)
- readNext<type, align>(ptr, endian)
swapToHostOrder uses the former, but this patch switches to the latter.
While we are at it, this patch teaches readNext to default to unaligned just as I did in:
commit 568368a
Author: Kazu Hirata
Date: Mon Apr 15 19:05:30 2024 -0700
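For readers unfamiliar with the "read and advance" idiom, here is a standalone sketch of the second variant's shape, where the byte order is a runtime argument. `readNextSketch` and `Endian` are hypothetical names written for this example; this is not LLVM's `readNext` implementation. Reading byte by byte means the pointer never needs to be aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

enum class Endian { Little, Big };

// Decode an unsigned integer of type T from the bytes at Ptr in the given
// byte order, then advance Ptr past the bytes that were consumed.
template <typename T>
T readNextSketch(const unsigned char *&Ptr, Endian E) {
  static_assert(std::is_unsigned<T>::value,
                "sketch handles unsigned integers only");
  T Value = 0;
  for (std::size_t I = 0; I < sizeof(T); ++I) {
    // Little-endian: byte I is the I-th least significant byte.
    // Big-endian: byte I is the I-th most significant byte.
    std::size_t Shift = (E == Endian::Little) ? I : (sizeof(T) - 1 - I);
    Value |= static_cast<T>(static_cast<T>(Ptr[I]) << (8 * Shift));
  }
  Ptr += sizeof(T);
  return Value;
}
```

A caller walks a buffer by repeatedly invoking the helper with the same pointer variable, so no separate offset bookkeeping or byte-swap wrapper is needed.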
Commit d7fddb4
[clang] Fix flag typo in comment
Fixed for more accurate searches of the flag `-Wsystem-headers-in-module=`.
Commit 898bd85
[memprof] Use std::vector<Frame> instead of llvm::SmallVector<Frame> (NFC) (llvm#94432)
This patch replaces llvm::SmallVector<Frame> with std::vector<Frame>. llvm::SmallVector<Frame> sets aside one inline element. Meanwhile, when I sort all call stacks by their lengths, the length at the first percentile is already 2. That is, 99 percent of call stacks do not take advantage of the inline element. Using std::vector<Frame> reduces the cycle and instruction counts by 11% and 22%, respectively, with "llvm-profdata show" modified to deserialize all MemProfRecords.
Commit 15701c1
Commit 66026f0
Copy full SHA for ff9f8a7 - Browse repository at this point
[libc] fixed target issue with exit_handler (llvm#94678)
- addressed llvm#94317 (comment) - added conditional in cmake file for exit_handler object library Co-authored-by: Aaryan Shukla <[email protected]>
Copy full SHA for 37a2cc0 - Browse repository at this point
[sanitizer] Make CHECKs in bitvector more precise (NFC) (llvm#94630)
These CHECKs are all checking indices, which must be strictly smaller than the size (otherwise they would go out of bounds).
Copy full SHA for f99ee69 - Browse repository at this point
[sanitizer_common] Change allocator base in test case for compatibility with high-entropy ASLR (llvm#93234)
With high-entropy ASLR (e.g., 32-bits == 16TB), the allocator base of 0x700000000000 (112TB) may collide with the placement of the libraries (e.g., on Linux, the mmap base could be 128TB - 16TB == 112TB). This results in a segfault in the test case. This patch moves the allocator base below the PIE program segment, inspired by fb77ca0. As per that patch: 1) we are leaving the old behavior for Apple; 2) since ASLR cannot be set above 32-bits for x86-64 Linux, we expect this new layout to be durable. Note that this is only changing a test case, not the behavior of sanitizers. Sanitizers have their own settings for initializing the allocator base. Reproducer:
1. ninja check-sanitizer # Just to build the test binary needed below; no need to actually run the tests here
2. sudo sysctl vm.mmap_rnd_bits=32 # Increase ASLR entropy
3. for f in `seq 1 10000`; do echo $f; GTEST_FILTER=*SizeClassAllocator64Dense ./projects/compiler-rt/lib/sanitizer_common/tests/Sanitizer-x86_64-Test > /tmp/x; if [ $? -ne 0 ]; then cat /tmp/x; fi; done
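The address arithmetic in this message can be checked at compile time. The sketch below assumes that `vm.mmap_rnd_bits` counts page-granular bits of randomization and that pages are 4 KiB, which is typical for x86-64 Linux but is an assumption here, not something the commit states. All constant names are made up for the example.

```cpp
#include <cstdint>

// Assumptions for this sketch: 4 KiB pages and a 47-bit user address space.
constexpr std::uint64_t TiB = 1ULL << 40;
constexpr unsigned PageShift = 12; // log2(4 KiB), assumed
constexpr unsigned RndBits = 32;   // value passed to vm.mmap_rnd_bits

// Size of the range over which the mmap base can be randomized.
constexpr std::uint64_t AslrSpan = 1ULL << (RndBits + PageShift);

// Top of the user address space quoted in the message (128TB).
constexpr std::uint64_t UserTop = 128 * TiB;

// Lowest mmap base that maximum entropy can produce.
constexpr std::uint64_t LowestMmapBase = UserTop - AslrSpan;

// Allocator base the test used before this patch.
constexpr std::uint64_t OldAllocatorBase = 0x700000000000ULL;

static_assert(AslrSpan == 16 * TiB, "32 bits of page-granular entropy spans 16TB");
static_assert(LowestMmapBase == 112 * TiB, "128TB - 16TB == 112TB");
static_assert(LowestMmapBase == OldAllocatorBase, "the two addresses coincide");
```

Under those assumptions the lowest possible mmap base lands exactly on the old allocator base, which matches the collision the message describes.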
Copy full SHA for f09cac8 - Browse repository at this point
[compiler-rt] Map internal_sigaction to __sys_sigaction on FreeBSD (llvm#84441)
This function is called during very early startup, which can result in a crash on FreeBSD. The sigaction() function in libc is indirected via a table so that it can be interposed by the threading library rather than calling the syscall directly. In the crash I was observing, this table had not yet been relocated, so we ended up jumping to an invalid address. To avoid this problem we can call __sys_sigaction, which calls the syscall directly and in FreeBSD 15 is part of libsys rather than libc, so does not depend on libc being fully initialized.
Copy full SHA for ec12488 - Browse repository at this point
[llvm/IR] Fix module build issue following e57308b (NFC) (llvm#94580)
This patch fixes a build issue following e57308b when module builds are enabled. With that change, we failed to build the LLVM_IR module since GEPNoWrapFlags wasn't defined prior to using it. This patch addresses that issue by including the missing header in `llvm/IR/IRBuilderFolder.h`, which uses the `GEPNoWrapFlags` type. This should ensure that we can always build the `LLVM_IR` module. Signed-off-by: Med Ismail Bennani <[email protected]>
Copy full SHA for 9905f64 - Browse repository at this point
[memprof] Add CallStackRadixTreeBuilder (llvm#93784)
Call stacks are a huge portion of the MemProf profile, taking up 70+% of the profile file size. This patch implements a radix tree to compress call stacks, which are known to have long common prefixes. Specifically, CallStackRadixTreeBuilder, introduced in this patch, takes call stacks in the MemProf profile, sorts them in the dictionary order to maximize the common prefix between adjacent call stacks, and then encodes a radix tree into a single array that is ready for serialization. The resulting radix array is essentially a concatenation of call stack arrays, each encoded with its length followed by the payload, except that these arrays contain "instructions" like "skip 7 elements forward" to borrow common prefixes from other call stacks. This patch does not integrate with the MemProf serialization/deserialization infrastructure yet. Once integrated, the radix tree is expected to roughly halve the file size of the MemProf profile.
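The benefit of sorting before encoding can be shown with a small sketch. It covers only the sorting and common-prefix step described above, not the radix array encoding itself. `FrameId`, `CallStack`, and both functions are hypothetical names, and the frame ordering within a call stack is assumed.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using FrameId = std::uint64_t;          // hypothetical frame identifier
using CallStack = std::vector<FrameId>; // frame order is an assumption here

// Number of leading frames that two call stacks have in common.
std::size_t commonPrefixLength(const CallStack &A, const CallStack &B) {
  auto Mismatch = std::mismatch(A.begin(), A.end(), B.begin(), B.end());
  return static_cast<std::size_t>(Mismatch.first - A.begin());
}

// Sort call stacks in dictionary order, then count how many frames each
// stack shares with its predecessor. A prefix-sharing encoding could avoid
// storing those frames again.
std::size_t sharedFramesAfterSorting(std::vector<CallStack> Stacks) {
  std::sort(Stacks.begin(), Stacks.end()); // lexicographic comparison
  std::size_t Shared = 0;
  for (std::size_t I = 1; I < Stacks.size(); ++I)
    Shared += commonPrefixLength(Stacks[I - 1], Stacks[I]);
  return Shared;
}
```

Sorting puts stacks with long common prefixes next to each other, so the count returned here is at least as useful a starting point as any unsorted order for a prefix-sharing encoder.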
Copy full SHA for a7c205d - Browse repository at this point
[clang-tidy] fix crash on self-include cycles in misc-header-include-cycle (llvm#94636)
Fixes: llvm#94634
Copy full SHA for 625bd35 - Browse repository at this point
Copy full SHA for 300e13b - Browse repository at this point
Copy full SHA for 31f7172 - Browse repository at this point
Copy full SHA for ceb964d - Browse repository at this point
[dfsan] Add test case for sscanf (llvm#94700)
This test case shows a limitation of DFSan's sscanf implementation (introduced in https://reviews.llvm.org/D153775): it simply ignores ordinary characters in the format string, instead of actually comparing them against the input. This may change the semantics of instrumented programs. Importantly, this also means that DFSan's release_shadow_space.c test, which relies on sscanf to scrape the RSS from /proc/maps output, will incorrectly match lines that don't contain RSS information. As a result, it adds together numbers from irrelevant output (e.g., base addresses), resulting in test flakiness (llvm#91287).
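For contrast, the standard `sscanf` behavior that the test relies on looks like this. The helper name `parseRssKb` and the `Rss: <n> kB` line format are assumptions made for this sketch; they are not taken from the DFSan test.

```cpp
#include <cstdio>

// Returns true and stores the number when Line starts with "Rss:" followed
// by an integer. Ordinary characters in the format string must match the
// input exactly, so unrelated lines yield zero conversions.
bool parseRssKb(const char *Line, long &Kb) {
  return std::sscanf(Line, "Rss: %ld kB", &Kb) == 1;
}
```

An implementation that skips the literal `Rss:` comparison would accept any line that happens to contain a number in the right place, which is the failure mode the message describes.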
Copy full SHA for 720509b - Browse repository at this point
Make it easier to add CREL support.
Copy full SHA for 906734b - Browse repository at this point
Copy full SHA for 9ab25b2 - Browse repository at this point
Copy full SHA for f36f6a5 - Browse repository at this point
[InstCombine] Improve coverage of `foldSelectValueEquivalence` for constants
We don't need the `noundef` check if the new simplification is a constant. This cleans up regressions from folding multiuse: `(icmp eq/ne (sub/xor x, y), 0)` -> `(icmp eq/ne x, y)`. Closes llvm#88298
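The equivalence behind the `(icmp eq/ne (sub/xor x, y), 0)` -> `(icmp eq/ne x, y)` fold can be checked exhaustively for a small bit width. This is a brute-force sanity check over 8-bit values written for illustration, with a made-up function name; it is not how InstCombine establishes the fold.

```cpp
#include <cstdint>

// Checks, for every pair of 8-bit values, that (x - y) == 0 and
// (x ^ y) == 0 each hold exactly when x == y.
bool foldsHoldForAllI8() {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned Y = 0; Y < 256; ++Y) {
      const std::uint8_t Sub = static_cast<std::uint8_t>(X - Y); // wraps mod 256
      const std::uint8_t Xor = static_cast<std::uint8_t>(X ^ Y);
      if ((Sub == 0) != (X == Y) || (Xor == 0) != (X == Y))
        return false;
    }
  return true;
}
```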
Copy full SHA for 3865e74 - Browse repository at this point
[RISCV] Unify all the code that adds unaligned-scalar/vector-mem to Features vector. (llvm#94660)
Instead of having multiple places insert into the Features vector independently, check all the conditions in one place. This avoids a subtle ordering requirement that -mstrict-align processing had to be done after the others.
Copy full SHA for b7c3fdb - Browse repository at this point
Copy full SHA for f14147d - Browse repository at this point
[clang-format]: Annotate colons found in inline assembly (llvm#92617)
Short-circuit the parsing of tok::colon to label colons found within lines starting with asm as InlineASMColon. Fixes llvm#92616. --------- Co-authored-by: Owen Pan <[email protected]>
Copy full SHA for 1b8d567 - Browse repository at this point
Copy full SHA for 12b9224 - Browse repository at this point
[serialization] no transitive decl change (llvm#92083)
Follow-up to llvm#86912. The motivation of the patch series is that, for a module interface unit `X`, when the modules that `X` depends on change in ways that are not relevant to `X`, we hope the BMI of `X` won't change. For this specific patch, we hope that irrelevant declaration changes leave the BMI of `X` unchanged. **However**, I found the patch itself is not very useful in practice, since adding or removing declarations will change the state of identifiers and types in most cases. That said, for the simplest example,
```
// partA.cppm
export module m:partA;

// partA.v1.cppm
export module m:partA;
export void a() {}

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() { b(); }
```
the BMI of `onlyUseB` will change after we change the implementation of `partA.cppm` to `partA.v1.cppm`, since `partA.v1.cppm` introduces new identifiers and types (the function prototype). So in this patch, we have to write the tests as:
```
// partA.cppm
export module m:partA;
export int getA() { ... }
export int getA2(int) { ... }

// partA.v1.cppm
export module m:partA;
export int getA() { ... }
export int getA(int) { ... }
export int getA2(int) { ... }

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() { b(); }
```
so that the newly introduced declaration `int getA(int)` doesn't introduce new identifiers and types, and the BMI of `onlyUseB` can stay unchanged. While this does not look great, the patch should be the base of the patch that erases the transitive change for identifiers and types, since I don't know how we can introduce new types and identifiers without introducing new declarations.
Given how tightly declarations, types and identifiers are related, I think we can only reach the ideal state after we have made the series for all three entities. The design of the patch is similar to llvm#86912, which extends the 32-bit DeclID to 64 bits and uses the higher bits to store the module file index and the lower bits to store the local Decl ID. A slight difference is that we only use 48 bits to store the new DeclID, since we try to use the higher 16 bits to store the module ID in the prefix of the Decl class. Previously, we used 32 bits to store the module ID and 32 bits to store the DeclID. I didn't want to allocate additional space, so I tried to keep the total at 64 bits. A potentially interesting point here is the relationship between the module ID and the module file index. I feel we can get the module file index from the module ID, but I have neither proved nor implemented it, since I want to keep the patch itself as small as possible. We can do that in the future if we want. Another change in the patch is the new concept of a Decl Index, which means the index into the very big array `DeclsLoaded` in ASTReader. Previously, the index of a loaded declaration was simply the Decl ID minus PREDEFINED_DECL_NUMs, so there were some places where the two were used ambiguously. This patch tries to split these two concepts. As with llvm#86912, the change will increase the on-disk PCM file sizes. As declaration IDs may be the most numerous IDs in the PCM file, this can have the biggest impact on the size. In my experiments, this change brings a 6.6% increase in the on-disk PCM size. No compile-time performance regression was observed. Given the benefits in the motivating example, I think the cost is worthwhile.
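The idea of packing a module file index and a local declaration ID into one 48-bit value can be sketched as follows. The message gives the 48-bit total but not how the bits are divided, so the 16/32 split, the constant names, and the helper functions here are all assumptions made for illustration, not the layout used by the patch.

```cpp
#include <cstdint>

// Assumed split for this sketch only: 16 bits of module file index above
// 32 bits of local ID, for 48 bits in total.
constexpr unsigned LocalIdBits = 32;
constexpr std::uint64_t LocalIdMask = (1ULL << LocalIdBits) - 1;
constexpr std::uint64_t ModuleFileIndexMask = 0xFFFF;

// Pack the two fields into a single integer.
constexpr std::uint64_t makeDeclId(std::uint32_t ModuleFileIndex,
                                   std::uint32_t LocalId) {
  return ((ModuleFileIndex & ModuleFileIndexMask) << LocalIdBits) | LocalId;
}

// Recover the module file index from the higher bits.
constexpr std::uint32_t moduleFileIndexOf(std::uint64_t Id) {
  return static_cast<std::uint32_t>((Id >> LocalIdBits) & ModuleFileIndexMask);
}

// Recover the local ID from the lower bits.
constexpr std::uint32_t localIdOf(std::uint64_t Id) {
  return static_cast<std::uint32_t>(Id & LocalIdMask);
}
```

Because the packed value never uses the top 16 bits of a 64-bit word, those bits remain free for another field, which is the space-saving point the message makes.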
Copy full SHA for 59bed9c - Browse repository at this point
[llvm][ScheduleDAG] Re-arrange SUnit's members to make it smaller (llvm#94547)
before:
```
*** Dumping AST Record Layout
0 | class llvm::SUnit
0 | SDNode * Node
8 | MachineInstr * Instr
16 | SUnit * OrigNode
24 | const MCSchedClassDesc * SchedClass
32 | class llvm::SmallVector<class llvm::SDep, 4> Preds
32 | class llvm::SmallVectorImpl<class llvm::SDep> (base)
32 | class llvm::SmallVectorTemplateBase<class llvm::SDep> (base)
32 | class llvm::SmallVectorTemplateCommon<class llvm::SDep> (base)
32 | class llvm::SmallVectorBase<uint32_t> (base)
32 | void * BeginX
40 | unsigned int Size
44 | unsigned int Capacity
48 | struct llvm::SmallVectorStorage<class llvm::SDep, 4> (base)
48 | char[64] InlineElts
112 | class llvm::SmallVector<class llvm::SDep, 4> Succs
112 | class llvm::SmallVectorImpl<class llvm::SDep> (base)
112 | class llvm::SmallVectorTemplateBase<class llvm::SDep> (base)
112 | class llvm::SmallVectorTemplateCommon<class llvm::SDep> (base)
112 | class llvm::SmallVectorBase<uint32_t> (base)
112 | void * BeginX
120 | unsigned int Size
124 | unsigned int Capacity
128 | struct llvm::SmallVectorStorage<class llvm::SDep, 4> (base)
128 | char[64] InlineElts
192 | unsigned int NodeNum
196 | unsigned int NodeQueueId
200 | unsigned int NumPreds
204 | unsigned int NumSuccs
208 | unsigned int NumPredsLeft
212 | unsigned int NumSuccsLeft
216 | unsigned int WeakPredsLeft
220 | unsigned int WeakSuccsLeft
224 | unsigned short NumRegDefsLeft
226 | unsigned short Latency
228:0-0 | _Bool isVRegCycle
228:1-1 | _Bool isCall
228:2-2 | _Bool isCallOp
228:3-3 | _Bool isTwoAddress
228:4-4 | _Bool isCommutable
228:5-5 | _Bool hasPhysRegUses
228:6-6 | _Bool hasPhysRegDefs
228:7-7 | _Bool hasPhysRegClobbers
229:0-0 | _Bool isPending
229:1-1 | _Bool isAvailable
229:2-2 | _Bool isScheduled
229:3-3 | _Bool isScheduleHigh
229:4-4 | _Bool isScheduleLow
229:5-5 | _Bool isCloned
229:6-6 | _Bool isUnbuffered
229:7-7 | _Bool hasReservedResource
232 | Sched::Preference SchedulingPref
236:0-0 | _Bool isDepthCurrent
236:1-1 | _Bool isHeightCurrent
240 | unsigned int Depth
244 | unsigned int Height
248 | unsigned int TopReadyCycle
252 | unsigned int BotReadyCycle
256 | const TargetRegisterClass * CopyDstRC
264 | const TargetRegisterClass * CopySrcRC
| [sizeof=272, dsize=272, align=8,
| nvsize=272, nvalign=8]
```
after:
```
*** Dumping AST Record Layout
0 | class llvm::SUnit
0 | union llvm::SUnit::(anonymous at /Users/jonathan_roelofs/llvm-upstream/llvm/include/llvm/CodeGen/ScheduleDAG.h:246:5)
0 | SDNode * Node
0 | MachineInstr * Instr
8 | SUnit * OrigNode
16 | const MCSchedClassDesc * SchedClass
24 | const TargetRegisterClass * CopyDstRC
32 | const TargetRegisterClass * CopySrcRC
40 | class llvm::SmallVector<class llvm::SDep, 4> Preds
40 | class llvm::SmallVectorImpl<class llvm::SDep> (base)
40 | class llvm::SmallVectorTemplateBase<class llvm::SDep> (base)
40 | class llvm::SmallVectorTemplateCommon<class llvm::SDep> (base)
40 | class llvm::SmallVectorBase<uint32_t> (base)
40 | void * BeginX
48 | unsigned int Size
52 | unsigned int Capacity
56 | struct llvm::SmallVectorStorage<class llvm::SDep, 4> (base)
56 | char[64] InlineElts
120 | class llvm::SmallVector<class llvm::SDep, 4> Succs
120 | class llvm::SmallVectorImpl<class llvm::SDep> (base)
120 | class llvm::SmallVectorTemplateBase<class llvm::SDep> (base)
120 | class llvm::SmallVectorTemplateCommon<class llvm::SDep> (base)
120 | class llvm::SmallVectorBase<uint32_t> (base)
120 | void * BeginX
128 | unsigned int Size
132 | unsigned int Capacity
136 | struct llvm::SmallVectorStorage<class llvm::SDep, 4> (base)
136 | char[64] InlineElts
200 | unsigned int NodeNum
204 | unsigned int NodeQueueId
208 | unsigned int NumPreds
212 | unsigned int NumSuccs
216 | unsigned int NumPredsLeft
220 | unsigned int NumSuccsLeft
224 | unsigned int WeakPredsLeft
228 | unsigned int WeakSuccsLeft
232 | unsigned int TopReadyCycle
236 | unsigned int BotReadyCycle
240 | unsigned int Depth
244 | unsigned int Height
248:0-0 | _Bool isVRegCycle
248:1-1 | _Bool isCall
248:2-2 | _Bool isCallOp
248:3-3 | _Bool isTwoAddress
248:4-4 | _Bool isCommutable
248:5-5 | _Bool hasPhysRegUses
248:6-6 | _Bool hasPhysRegDefs
248:7-7 | _Bool hasPhysRegClobbers
249:0-0 | _Bool isPending
249:1-1 | _Bool isAvailable
249:2-2 | _Bool isScheduled
249:3-3 | _Bool isScheduleHigh
249:4-4 | _Bool isScheduleLow
249:5-5 | _Bool isCloned
249:6-6 | _Bool isUnbuffered
249:7-7 | _Bool hasReservedResource
250 | unsigned short NumRegDefsLeft
252 | unsigned short Latency
254:0-0 | _Bool isDepthCurrent
254:1-1 | _Bool isHeightCurrent
254:2-2 | _Bool isNode
254:3-3 | _Bool isInst
254:4-7 | Sched::Preference SchedulingPref
| [sizeof=256, dsize=255, align=8,
| nvsize=255, nvalign=8]
```
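The general effect of grouping members by alignment can be reproduced with a toy pair of structs. `Loose` and `Packed` are hypothetical types invented for this sketch and have nothing to do with the real `SUnit` members; exact sizes depend on the target ABI, so the code only asserts that reordering does not make the struct larger.

```cpp
#include <cstdint>

// Members declared in an order that forces padding on typical ABIs: each
// bool is followed by a member with stricter alignment.
struct Loose {
  bool A;
  void *P;
  bool B;
  std::uint32_t N;
  std::uint16_t S;
};

// The same members ordered from strictest to weakest alignment, so the
// small members fill what would otherwise be padding.
struct Packed {
  void *P;
  std::uint32_t N;
  std::uint16_t S;
  bool A;
  bool B;
};

static_assert(sizeof(Packed) <= sizeof(Loose),
              "reordering should not increase the size");
```

Comparing `sizeof(Loose)` with `sizeof(Packed)` on the target of interest shows how much padding the reordering removes there.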
Copy full SHA for c5b5dcd - Browse repository at this point
Revert "[serialization] no transitive decl change (llvm#92083)"
This reverts commit 5c10487. The ArmV7 bot is complaining the change breaks the alignment.
Copy full SHA for b0281f1 - Browse repository at this point
[AMDGPU] Auto-generating lit test patterns (NFC) (llvm#93837)
Test CodeGen/AMDGPU/build_vector.ll has the lit patterns partially hand-written and the rest auto-generated. It doesn't look good when changes are required with future patches. Auto-generating the entire pattern. Moved out the R600 test into build_vector-r600.ll.
Copy full SHA for 09ce478 - Browse repository at this point
[AMDGPU] Auto-generated some lit test patterns (NFC). (llvm#94310)
Also, converted the R600 RUN lines from some tests into standalone tests.
Copy full SHA for 634fbfb - Browse repository at this point
[NewPM][CodeGen] Port `regallocfast` to new pass manager (llvm#94426)
This pull request ports `regallocfast` to the new pass manager. It exposes the parameter `filter` to handle different register classes for AMDGPU. IIUC, AMDGPU needs to allocate different register classes separately, so it needs to implement its own `--<reg-class>-regalloc`. Now users can use e.g. `-passes=regallocfast<filter=sgpr>` to allocate a specific register class. The command line option `--regalloc-npm` is still a work in progress; the plan is to reuse the syntax of passes, e.g. use `--regalloc-npm=regallocfast<filter=sgpr>,greedy<filter=vgpr>` to replace `--sgpr-regalloc` and `--vgpr-regalloc`.
Copy full SHA for eb3090e - Browse repository at this point
Copy full SHA for 638074f - Browse repository at this point
[test] Don't generate `regalloc-amdgpu.s` in llvm#94426 (llvm#94722)
The test will generate an empty `regalloc-amdgpu.s` file in test, which causes an unresolved test.
Copy full SHA for 9928aa4 - Browse repository at this point
[clang-tidy] refactor misc-header-include-cycle (llvm#94697)
1. merge valid check 2. use range-based loop
Copy full SHA for 95f34a7 - Browse repository at this point
Copy full SHA for e085ae5 - Browse repository at this point
Fix spurious non-strict availability warning (llvm#94377)
The availability attributes are stored on the function declarations. The code was looking for them in the function template declarations. This resulted in spuriously diagnosing (non-strict) availability issues in contexts that are not available. Co-authored-by: Gabor Horvath <[email protected]>
Copy full SHA for a65c853 - Browse repository at this point
[mlir][tensor] Fix FoldTensorCastProducerOp for multiple result operations (llvm#93374)
For patterns where there are multiple results apart from dpsInits, this fails. E.g.: ``` %13:2 = iree_codegen.ukernel.generic "iree_uk_unpack" ins(%extracted_slice : tensor<?x1x16x16xf32>) outs(%11 : tensor<?x?xf32>) ... -> tensor<?x?xf32>, i32 ``` The above op has results apart from dpsInit and hence fails. The PR assumes that the result has dpsInits followed by nonDpsInits.
Copy full SHA for 5694e29 - Browse repository at this point
Copy full SHA for e8ac511 - Browse repository at this point
[clang][Interp] Improve APValue machinery
Handle lvalues pointing to declarations, unions and member pointers.
Copy full SHA for da152a0 - Browse repository at this point
Copy full SHA for 47e0905 - Browse repository at this point
[lldb] Split ValueObject::CreateChildAtIndex into two functions (llvm#94455)
The function is doing two fairly different things, depending on how it is called. While this allows for some code reuse, it also makes it hard to override it correctly. Possibly for this reason, ValueObjectSynthetic overrides GetChildAtIndex instead, which forces it to reimplement some of its functionality, most notably caching of generated children. Splitting this up makes it easier to move the caching to a common place (and hopefully makes the code easier to follow in general).
Copy full SHA for e47cb50 - Browse repository at this point
Revert "[X86] Assign AVX10_1 feature priority to align with gcc. (llvm#94557)" (llvm#94730)
This reverts commit d843c02.
Copy full SHA for 908d925 - Browse repository at this point
[memprof] Use std::move in ContextEdge::ContextEdge (NFC) (llvm#94687)
Since the constructor of ContextEdge takes ContextIds by value, we should move it to the corresponding member variable as suggested by clang-tidy's performance-unnecessary-value-param. While we are at it, this patch updates a couple of callers. To avoid the ambiguity in the evaluation order among the constructor arguments, I'm calling computeAllocType before calling the constructor.
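The pattern described here, a by-value constructor parameter moved into its member with the dependent computation done before the constructor call, can be sketched as follows. `Edge`, `computeAllocTypeFor`, and `makeEdge` are simplified hypothetical stand-ins, not the actual `ContextEdge` or `computeAllocType` code.

```cpp
#include <set>
#include <utility>

// Hypothetical simplification of an edge that owns a set of context IDs.
struct Edge {
  unsigned char AllocType;
  std::set<unsigned> ContextIds;

  // Ids is taken by value, so moving it into the member avoids a second copy.
  Edge(unsigned char AllocType, std::set<unsigned> Ids)
      : AllocType(AllocType), ContextIds(std::move(Ids)) {}
};

// Hypothetical stand-in for a function that derives a type from the IDs.
unsigned char computeAllocTypeFor(const std::set<unsigned> &Ids) {
  return Ids.empty() ? 0 : 1;
}

Edge makeEdge(std::set<unsigned> Ids) {
  // Compute the type first. In Edge(computeAllocTypeFor(Ids), std::move(Ids))
  // the order in which the two arguments are evaluated is unspecified, so the
  // set could already have been moved from when it is read.
  const unsigned char Type = computeAllocTypeFor(Ids);
  return Edge(Type, std::move(Ids));
}
```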
Copy full SHA for ed0d45e - Browse repository at this point
[ORC] Switch ExecutionSession::ErrorReporter to use unique_function.
This allows the ReportError functor to hold move-only types.
Copy full SHA for 4d849a4 - Browse repository at this point
[LoongArch] Set isReMaterializable on LU{12,32,52}I.D/ADDI.D and {X}ORI instructions (llvm#94552)
Copy full SHA for 137038f - Browse repository at this point
Copy full SHA for b0d738c - Browse repository at this point
[LoongArch] Add a pass to rewrite rd to r0 for non-computational instrs whose return values are unused (llvm#94590)
This patch adds a peephole pass `LoongArchDeadRegisterDefinitions`. It rewrites `rd` to `r0` when `rd` is marked as dead. It may improve the register allocation and reduce pipeline hazards on CPUs without register renaming and OOO.
Copy full SHA for 9516710 - Browse repository at this point
[clang][Interp][NFC] Add GetPtrFieldPop opcode
And change the previous GetPtrField to only peek() the base pointer. We can get rid of a whole bunch of DupPtr ops this way.
Copy full SHA for 6fc4e97 - Browse repository at this point
[analyzer][NFC] Factor out NoOwnershipChangeVisitor (llvm#94357)
In preparation for adding essentially the same visitor to StreamChecker, this patch factors this visitor out to a common header. I'll be the first to admit that the interface of these classes is not terrific, but it is rather tightly held back by its main technical debt, which is NoStoreFuncVisitor, the main descendant of NoStateChangeVisitor. Change-Id: I99d73ccd93a18dd145bbbc83afadbb432dd42b90
Copy full SHA for 7375a39 - Browse repository at this point
Copy full SHA for ea0fcca - Browse repository at this point
[docs] Fix benchmarking tips (llvm#94724)
This PR fixes an incorrect line for setting scaling_governor in benchmarking tips.
Copy full SHA for 5a0978c - Browse repository at this point
Copy full SHA for 711196a - Browse repository at this point
[clang][Interp] Remove StorageKind limitation in Pointer assign operators
It's not strictly needed and did cause some test failures.
Copy full SHA for b1fafc4 - Browse repository at this point
Copy full SHA for c0635ee - Browse repository at this point
[MLIR] Translate DIStringType. (llvm#94480)
This PR handles translation of DIStringType. Mostly mechanical changes to translate DIStringType to/from DIStringTypeAttr. The 'stringLength' field is 'DIVariable' in DIStringType. As there was no `DIVariableAttr` previously, it has been added to ease the translation. --------- Co-authored-by: Tobias Gysi <[email protected]>
Copy full SHA for e336acf - Browse repository at this point
Copy full SHA for 08bf183 - Browse repository at this point
[flang][Transforms][NFC] Remove boilerplate from vscale range pass (llvm#94598)
Use tablegen to generate the pass constructor. This pass is supposed to add function attributes so it does not need to operate on other top level operations.
Copy full SHA for 7db1232 - Browse repository at this point
Copy full SHA for 7ce3900 - Browse repository at this point
Copy full SHA for 8d54dc1 - Browse repository at this point
[ARM] Add NEON support for ISD::ABDS/ABDU nodes. (llvm#94504)
As noted on llvm#94466, NEON has ABDS/ABDU instructions but only handles them via intrinsics, plus some VABDL custom patterns. This patch flags basic ABDS/ABDU for neon types as legal and updates all tablegen patterns to use abds/abdu instead. Fixes llvm#94466
Copy full SHA for 538584d - Browse repository at this point
Copy full SHA for a74cf9d - Browse repository at this point
[DebugInfo] Add DW_OP_LLVM_extract_bits (llvm#93990)
This operation extracts a number of bits at a given offset and sign or zero extends them, which is done by emitting it as a left shift followed by a right shift. This is being added for use in clang for C++ structured bindings of bitfields that have offset or size that aren't a byte multiple. A new operation is being added, instead of shifts being used directly, as it makes correctly handling it in optimisations (which will be done in a later patch) much easier.
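The shift pair described above can be written out on a 64-bit value. This is a minimal sketch of the left-shift-then-right-shift idea with made-up function names; it is not the DWARF expression or the code that LLVM emits. It assumes `Size >= 1` and `Offset + Size <= 64`.

```cpp
#include <cstdint>

// Zero-extending extract of Size bits starting at bit Offset.
// Assumes 1 <= Size and Offset + Size <= 64.
std::uint64_t extractBitsZExt(std::uint64_t Value, unsigned Offset,
                              unsigned Size) {
  // Left shift moves the field to the top and drops the bits above it.
  const std::uint64_t Shifted = Value << (64 - Offset - Size);
  // Logical right shift brings it down and fills with zeros.
  return Shifted >> (64 - Size);
}

// Sign-extending extract with the same preconditions.
std::int64_t extractBitsSExt(std::uint64_t Value, unsigned Offset,
                             unsigned Size) {
  const std::int64_t Shifted =
      static_cast<std::int64_t>(Value << (64 - Offset - Size));
  // Right shift of a negative signed value is arithmetic on mainstream
  // compilers and guaranteed to be so from C++20 onward.
  return Shifted >> (64 - Size);
}
```

For example, bits 2 through 5 of `0xB4` are `1101`, which reads as 13 when zero-extended and as -3 when sign-extended.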
Copy full SHA for e173fa7 - Browse repository at this point
Add checks before hoisting out in loop pipelining (llvm#90872)
Currently, during a loop pipelining transformation, operations may be hoisted out without any checks on the loop bounds, which leads to incorrect transformations and unexpected behaviour. The following [issue](llvm#90870) describes the problem more extensively, including an example. The proposed fix adds a check on the loop bounds before applying the maximum hoisting.
Copy full SHA for b37c6bd - Browse repository at this point
Copy full SHA for 85cbf2f - Browse repository at this point
[clang][Interp] Fix refers_to_enclosing_variable_or_capture DREs
They do not count into lambda captures, so visit them lazily.
Copy full SHA for f0fde2b - Browse repository at this point
[SimplifyCFG] Remove bogus UTC line from test (NFC)
The check lines in this test were clearly not generated by UTC.
Copy full SHA for 66dad78 - Browse repository at this point
[SimplifyCFG] Regenerate switch to lookup tests (NFC)
Regenerate these with --check-globals. The manual global CHECKS get dropped during regeneration otherwise. Annoyingly UTC insists on putting the globals directly before the first function, so the first comment is a bit out of place now.
Copy full SHA for 405d7d5 - Browse repository at this point
[mlir][vector] Add n-d deinterleave lowering (llvm#94237)
This patch implements the lowering for vector deinterleave for vector of n-dimensions. Process involves unrolling the n-d vector to a series of one-dimensional vectors. The deinterleave operation is then used on these vectors. From: ``` %0, %1 = vector.deinterleave %a : vector<2x8xi8> -> vector<2x4xi8> ``` To: ``` %cst = arith.constant dense<0> : vector<2x4xi32> %0 = vector.extract %arg0[0] : vector<8xi32> from vector<2x8xi32> %res1, %res2 = vector.deinterleave %0 : vector<8xi32> -> vector<4xi32> %1 = vector.insert %res1, %cst [0] : vector<4xi32> into vector<2x4xi32> %2 = vector.insert %res2, %cst [0] : vector<4xi32> into vector<2x4xi32> %3 = vector.extract %arg0[1] : vector<8xi32> from vector<2x8xi32> %res1_0, %res2_1 = vector.deinterleave %3 : vector<8xi32> -> vector<4xi32> %4 = vector.insert %res1_0, %1 [1] : vector<4xi32> into vector<2x4xi32> %5 = vector.insert %res2_1, %2 [1] : vector<4xi32> into vector<2x4xi32> ...etc. ```
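The unrolling strategy can be mirrored on plain containers. The sketch below models a 2-D vector as rows of integers and applies a 1-D deinterleave to each row; `Row`, `Matrix`, and both functions are hypothetical names for this illustration and do not correspond to MLIR APIs. It assumes the first result holds the even-indexed elements and the second the odd-indexed ones.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Row = std::vector<int>;
using Matrix = std::vector<Row>;

// 1-D deinterleave: even-indexed elements go to the first result,
// odd-indexed elements to the second.
std::pair<Row, Row> deinterleave1D(const Row &In) {
  Row Even, Odd;
  for (std::size_t I = 0; I < In.size(); ++I)
    (I % 2 == 0 ? Even : Odd).push_back(In[I]);
  return {Even, Odd};
}

// 2-D deinterleave by unrolling: extract each row, deinterleave it, and
// insert the two halves into the corresponding result rows.
std::pair<Matrix, Matrix> deinterleave2D(const Matrix &In) {
  Matrix Res1, Res2;
  for (const Row &R : In) {
    std::pair<Row, Row> Parts = deinterleave1D(R);
    Res1.push_back(std::move(Parts.first));
    Res2.push_back(std::move(Parts.second));
  }
  return {Res1, Res2};
}
```

Higher ranks would follow the same shape, with one more level of row extraction per extra dimension.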
Copy full SHA for afcd18f - Browse repository at this point
[ARM] r11 is reserved when using -mframe-chain=aapcs (llvm#86951)
When using the -mframe-chain=aapcs or -mframe-chain=aapcs-leaf options, we cannot use r11 as an allocatable register, even if -fomit-frame-pointer is also used. This is so that r11 will always point to a valid frame record, even if we don't create one in every function.
Commit 3f99c0d
[DAG] Always allow folding XOR patterns to ABS pre-legalization (llvm#94601)
Removes residual ARM handling for vXi64 ABS nodes to prevent infinite loops.
Commit 671bcef
fix(mlir/**.py): fix comparison to None (llvm#94019)
From PEP 8 (https://peps.python.org/pep-0008/#programming-recommendations):

> Comparisons to singletons like None should always be done with `is` or `is not`, never the equality operators.

Co-authored-by: Eisuke Kawashima <[email protected]>
Commit 9ecb812
[ARM] Add support for Cortex-R52+ (llvm#94633)
Cortex-R52+ is an Armv8-R AArch32 CPU. Technical Reference Manual for Cortex-R52+: https://developer.arm.com/documentation/102199/latest/
Commit 0601711
Commit 2fd4477
[clang][test] Skip interpreter value test on Arm 32 bit
llvm#89811 caused this test to fail, somehow. I think it may not be at fault, but actually be exposing some existing undefined behaviour, see llvm#94741. Skipping this for now to get the bots green again.
Commit 14cd171
Commit 126837c
Commit 7f3b593
[clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (llvm#89796)
This change seeks to add support for vendor flavoured SPIRV - more specifically, AMDGCN flavoured SPIRV. The aim is to generate SPIRV that carries some extra bits of information that are only usable by AMDGCN targets, forfeiting absolute genericity to obtain greater expressiveness for target features:

- AMDGCN inline ASM is allowed/supported, under the assumption that the [SPV_INTEL_inline_assembly](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_inline_assembly.asciidoc) extension is enabled/used
- AMDGCN target specific builtins are allowed/supported, under the assumption that e.g. the `--spirv-allow-unknown-intrinsics` option is enabled when using the downstream translator
- the featureset matches the union of AMDGCN targets' features
- the datalayout string is overspecified to affix both the program address space and the alloca address space, the latter under the assumption that the [SPV_INTEL_function_pointers](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc) extension is enabled/used, in which case the extant SPIRV datalayout string would lead to pointers to function pointing to the private address space, which would be wrong.

Existing AMDGCN tests are extended to cover this new target. It is currently dormant / will require some additional changes, but I thought I'd rather put it up for review to get feedback as early as possible.

I will note that an alternative option is to place this under AMDGPU, but that seems slightly less natural, since this is still SPIRV, albeit relaxed in terms of preconditions & constrained in terms of postconditions, and only guaranteed to be usable on AMDGCN targets (it is still possible to obtain pristine portable SPIRV through usage of the flavoured target, though).
Commit 4e4eb43
[BOLT][NFC] Infailable fns return void (llvm#92018)
Both `reverseBranchCondition` and `replaceBranchTarget` return a success boolean. But all but one of the callers ignore the return value, and the exception emits a fatal error on failure. Thus, just return nothing.
Commit 9c42b20
[CodeGen][SDAG] Remove CombinedNodes SmallPtrSet (llvm#94609)
This "small" set grows quite large and it's more performant to store whether a node has been combined before in the node itself. As this information is only relevant for nodes that are currently not in the worklist, add a second state to the CombinerWorklistIndex (-2) to indicate that a node is currently not in a worklist, but was combined before. This brings a substantial performance improvement.
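As a rough illustration of keeping the "combined before" flag in the node instead of a side set, here is a minimal sketch. `Node`, `Worklist` and `wasCombined` are hypothetical stand-ins, not the SelectionDAG classes, and the meaning of -1 ("not in the worklist, never combined") is an assumption; only the -2 state is taken from the description above.

```cpp
#include <cassert>
#include <vector>

struct Node {
  // >= 0: position in the worklist.
  // -1:   not in the worklist and never combined (assumed default).
  // -2:   not in the worklist, but combined before.
  int CombinerWorklistIndex;
  Node() : CombinerWorklistIndex(-1) {}
};

class Worklist {
  std::vector<Node*> Items;

public:
  // Queue a node unless it is already queued.
  void add(Node* N) {
    if (N->CombinerWorklistIndex >= 0)
      return;
    N->CombinerWorklistIndex = static_cast<int>(Items.size());
    Items.push_back(N);
  }

  // Pop the most recently added node and record in the node itself that it
  // has been combined. Returns nullptr when the worklist is empty.
  Node* popAndMarkCombined() {
    if (Items.empty())
      return nullptr;
    Node* N = Items.back();
    Items.pop_back();
    N->CombinerWorklistIndex = -2;
    return N;
  }
};

// Only meaningful for nodes that are currently not in the worklist.
bool wasCombined(const Node& N) { return N.CombinerWorklistIndex == -2; }
```

The query is a single integer comparison on the node, so no hashing or set growth is involved.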
Commit 35fbc3f
[clang][Interp] Check ConstantExpr results for initialization
They need to be fully initialized, similar to global variables.
Commit 6db6f7e
Commit c1a3bf7
[clang][Interp] Limit lambda capture lazy visting to actual captures
Check this by looking at the VarDecl.
Commit a647101
[serialization] no transitive decl change (llvm#92083)
Follow-up to llvm#86912.

The motivation of the patch series is that, for a module interface unit `X`, when the modules that `X` depends on change in ways that are not relevant to `X`, we hope the BMI of `X` won't change. For this specific patch, we hope that changes to irrelevant declarations won't change the BMI of `X`.

**However**, I found the patch itself is not very useful in practice, since adding or removing declarations will change the state of identifiers and types in most cases. For the simplest example,

```
// partA.cppm
export module m:partA;

// partA.v1.cppm
export module m:partA;
export void a() {}

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() { b(); }
```

the BMI of `onlyUseB` will change after we change the implementation of `partA.cppm` to `partA.v1.cppm`, since `partA.v1.cppm` introduces new identifiers and types (the function prototype).

So in this patch, we have to write the tests as:

```
// partA.cppm
export module m:partA;
export int getA() { ... }
export int getA2(int) { ... }

// partA.v1.cppm
export module m:partA;
export int getA() { ... }
export int getA(int) { ... }
export int getA2(int) { ... }

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() { b(); }
```

so that the newly introduced declaration `int getA(int)` doesn't introduce new identifiers and types, and the BMI of `onlyUseB` can stay unchanged.

While this does not look great, the patch should be the base of the patch to erase the transitive change for identifiers and types, since I don't know how we can introduce new types and identifiers without introducing new declarations. Given how tightly declarations, types and identifiers are related, I think we can only reach the ideal state after we have made the series for all three entities.

The design of the patch is similar to llvm#86912, which extends the 32-bit DeclID to 64 bits and uses the higher bits to store the module file index and the lower bits to store the local Decl ID. A slight difference is that we only use 48 bits to store the new DeclID, since we try to use the higher 16 bits to store the module ID in the prefix of the Decl class. Previously, we used 32 bits to store the module ID and 32 bits to store the DeclID. I don't want to allocate additional space, so I tried to keep the total at 64 bits.

A potentially interesting point here is the relationship between the module ID and the module file index. I feel we can get the module file index from the module ID, but I didn't prove it or implement it, since I want to keep the patch itself as small as possible. We can do it in the future if we want.

Another change in the patch is the new concept of a Decl Index, which means the index into the very big array `DeclsLoaded` in ASTReader. Previously, the index of a loaded declaration was simply the Decl ID minus PREDEFINED_DECL_NUMs, so there are some places where the two were used ambiguously. This patch tries to split these two concepts.

As with llvm#86912, the change will increase the on-disk PCM file sizes. As declaration IDs may be the most numerous IDs in the PCM file, this can have the biggest impact on the size. In my experiments, this change brings a 6.6% increase in the on-disk PCM size. No compile-time performance regression was observed. Given the benefits in the motivating example, I think the cost is worthwhile.
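The "module file index in the high bits, local Decl ID in the low bits" layout can be sketched as below. This is an illustration only: the message states that the DeclID fits in 48 bits but does not say how those bits are divided, so the 16/32 split, the constant names and the helper names are all assumptions, not Clang's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Assumed split, for illustration only: of the 48 DeclID bits, the low 32
// hold the local Decl ID and the next 16 hold the module file index.
constexpr unsigned kLocalBits = 32;
constexpr unsigned kDeclIDBits = 48;
constexpr std::uint64_t kLocalMask = (std::uint64_t{1} << kLocalBits) - 1;
constexpr std::uint64_t kDeclIDMask = (std::uint64_t{1} << kDeclIDBits) - 1;

// Combine a module file index and a local Decl ID into one 48-bit value.
constexpr std::uint64_t packDeclID(std::uint64_t moduleFileIndex,
                                   std::uint64_t localDeclID) {
  return ((moduleFileIndex << kLocalBits) | (localDeclID & kLocalMask)) &
         kDeclIDMask;
}

// Recover the two components from a packed value.
constexpr std::uint64_t moduleFileIndexOf(std::uint64_t id) {
  return (id & kDeclIDMask) >> kLocalBits;
}

constexpr std::uint64_t localDeclIDOf(std::uint64_t id) {
  return id & kLocalMask;
}
```

Because the packed value never touches the top 16 bits of a 64-bit word, those bits stay free for other data, which is the property the description relies on.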
Commit a9b37d7
[AMDGPU] Fix interaction between WQM and llvm.amdgcn.init.exec (llvm#93680)
Whole quad mode requires inserting a copy of the initial EXEC mask. In a function that also uses llvm.amdgcn.init.exec, insert the COPY after initializing EXEC.
Commit a0dcaf2
[Frontend][OpenMP] Sort all the things in OMP.td, NFC (llvm#94653)
The file OMP.td is becoming tedious to update by hand due to the seemingly random ordering of various items in it. This patch brings order to it by sorting most of the contents.

The clause definitions are sorted alphabetically with respect to the spelling of the clause.[1]

The directive definitions are split into two groups: leaf directives and compound directives.[2] Within each, definitions are sorted alphabetically with respect to the spelling, with the exception that "end xyz" directives are placed immediately following the definition of "xyz".[3] Within each directive definition, the lists of clauses are also sorted alphabetically.

[1] All spellings are made of lowercase letters, _, or space. Ordering that includes non-letters follows the order assumed by the `sort` utility.
[2] Compound directives refer to the constituent leaf directives, hence the leaf definitions must come first.
[3] Some of the "end xyz" directives have properties derived from the corresponding "xyz" directive. This exception guarantees that "xyz" precedes the "end xyz".
Commit 01be0a3
[flang][OpenMP] Lower `target .. private(..)` to `omp.private` ops (llvm#94195)
Extends delayed privatization support to `target .. private(..)`. With this PR, `private` is supported for `target` **only** in delayed privatization mode.
Commit 26ba412
[libc] Correctly pass the C++ standard to NVPTX internal builds
Summary: The NVPTX build wasn't getting the `C++20` standard necessary for a few files.
Commit 4407e67
[mlir][linalg] Support lowering unpack with outer_dims_perm (llvm#94477)
This commit adds support for lowering `tensor.unpack` with a non-identity `outer_dims_perm`. This was previously left as a not-yet-implemented case.
Commit df12b11
[mlir] Add reshape propagation patterns for tensor.pad (llvm#94489)
This PR adds fusion by collapsing and fusion by expansion patterns for `tensor.pad` ops in ElementwiseOpFusion. Pad ops can be expanded or collapsed as long as none of the padded dimensions will be expanded or collapsed.
Commit f0cdc72
[mlir] Fix bugs in expand_shape patterns after semantics changes (llvm#94631)
After the `output_shape` field was added to `expand_shape` ops, dynamically sized expand shapes are now possible, but this was not accounted for in the folder. This PR tightens the constraints of the folder to fix this.
Commit e4f8c4e
[ARM] Clean up neon_vabd.ll, vaba.ll and vabd.ll tests a bit. NFC
Change the target triple to remove some unnecessary instructions.
Commit b9d3565
[arm64] Add tan intrinsic lowering (llvm#94545)
This change implements part of the llvm#87367 investigation into supporting IEEE math operations as intrinsics, which was discussed in this RFC: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

This PR is just for Tan. Now that the x86 tan backend has landed (llvm#90503), we can add other backends since the shared pieces are in tree now.

Changes:
- `llvm/include/llvm/Analysis/VecFuncs.def` - vectorization of tan for arm64 backends.
- `llvm/lib/Target/AArch64/AArch64FastISel.cpp` - Add tan to the libcall table
- `llvm/lib/Target/AArch64/AArch64ISelLowering.cpp` - Add tan expansion for f128, f16, and vector/neon operations
- `llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp` - define `G_FTAN` as a legal arm64 instruction

resolves llvm#94755
Commit 678428a
Commit 50bec57
[Clang] Add timeout for GPU detection utilities (llvm#94751)
Summary: The utilities `nvptx-arch` and `amdgpu-arch` are used to support `--offload-arch=native` among other utilities in clang. However, these rely on the GPU drivers to query the features. In certain cases these drivers can become locked up, which will lead to indefinite hangs on any compiler jobs running in the meantime. This patch adds a ten second timeout period for these utilities, after which it kills the job and errors out.
Commit 86dd2c9
[RISCV] Codegen support for XCVmem extension (llvm#76916)
All post-increment load/store, register-register load/store

spec: https://github.com/openhwgroup/cv32e40p/blob/master/docs/source/instruction_set_extensions.rst

Contributors: @CharKeaney, @jeremybennett, @lewis-revill, @NandniJamnadas, @PaoloS02, @serkm, @simonpcook, @xingmingjie, @realqhc
Commit 1b239ca
[MachineOutliner] Sort by Benefit to Cost Ratio (llvm#90264)
This PR depends on llvm#90260.

We changed the order in which functions are outlined in the Machine Outliner. The formula for priority was found via a black-box Bayesian optimization toolbox. Using this formula for sorting consistently reduces the uncompressed size of large real-world mobile apps. We also ran a few benchmarks using LLVM test suites, and showed that sorting by priority consistently reduces the text segment size.

| run (CTMark/) | baseline (1) | priority (2) | diff (1 -> 2) |
|------------------|--------|--------|----------|
| lencod | 349624 | 349264 | -0.1030% |
| SPASS | 219672 | 219480 | -0.0874% |
| kc | 271956 | 251200 | -7.6321% |
| sqlite3 | 223920 | 223708 | -0.0947% |
| 7zip-benchmark | 405364 | 402624 | -0.6759% |
| bullet | 139820 | 139500 | -0.2289% |
| consumer-typeset | 295684 | 290196 | -1.8560% |
| pairlocalalign | 72236 | 72092 | -0.1993% |
| tramp3d-v4 | 189572 | 189292 | -0.1477% |

This is part of an enhanced version of machine outliner -- see [RFC](https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-1-fulllto-part-2-thinlto-nolto-to-come/78732).
Commit 58c7def
[memprof] Clean up IndexedMemProfReader (NFC) (llvm#94710)
Parameter "Version" is confusing in deserializeV012 and deserializeV3 because we also have member variable "Version". Fortunately, parameter "Version" and member variable "Version" always have the same value because IndexedMemProfReader::deserialize initializes the member variable and passes it to deserializeV012 and deserializeV3. This patch removes the parameter.
Commit dc9c2df
Commit 8d913d5
[memprof] Use CallStackRadixTreeBuilder in the V3 format (llvm#94708)
This patch integrates CallStackRadixTreeBuilder into the V3 format, reducing the profile size to about 27% of the V2 profile size.

- Serialization: writeMemProfCallStackArray just needs to write out the radix tree array prepared by CallStackRadixTreeBuilder. Mappings from CallStackIds to LinearCallStackIds are moved by new function CallStackRadixTreeBuilder::takeCallStackPos.
- Deserialization: Deserializing a call stack is the same as deserializing an array encoded in the obvious manner -- the length followed by the payload, except that we need to follow a pointer to the parent to take advantage of common prefixes once in a while. This patch teaches LinearCallStackIdConverter how to handle those pointers.
Commit 637baa5
[mlir][vector] Remove Emulated Sub-directory (llvm#94742)
The "Emulated" sub-directories under "ArmSVE" and "ArmSME" have been removed. Associated tests have been moved up a directory and now include the "REQUIRES" constraint for the arm-emulator.
Commit ad12734
Commit 7f5aeb1
Commit aae32f6
[KnownBits] Remove `hasConflict()` assertions (llvm#94568)
Allow KnownBits to represent "always poison" values via conflict. close: llvm#94436
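To make "conflict" concrete, here is a small 8-bit model. `KnownBits8` and `combine` are hypothetical names for this sketch; LLVM's `KnownBits` stores arbitrary-width `APInt` masks rather than `uint8_t`. A conflict is a bit that is claimed to be both known-zero and known-one.

```cpp
#include <cassert>
#include <cstdint>

// Simplified 8-bit stand-in for a known-bits record.
struct KnownBits8 {
  std::uint8_t Zero; // bits known to be 0
  std::uint8_t One;  // bits known to be 1

  // A bit cannot really be both 0 and 1, so an overlap means no concrete
  // value satisfies the record.
  bool hasConflict() const { return (Zero & One) != 0; }
};

// Merge facts learned from two sources about the same value. Contradictory
// facts produce a conflicting record instead of tripping an assertion.
KnownBits8 combine(const KnownBits8& a, const KnownBits8& b) {
  return KnownBits8{static_cast<std::uint8_t>(a.Zero | b.Zero),
                    static_cast<std::uint8_t>(a.One | b.One)};
}
```

Under the change described above, such a conflicting record is treated as a representable state ("always poison") rather than as an invariant violation.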
Commit 5b14f6d
[libc++][test][AIX] Only XFAIL atomic tests for before clang 19 (llvm…
Commit 1508a3d
[AArch64] Add patterns for add(uzp1(x,y), uzp2(x, y)) -> addp.
If we are extracting the even lanes and the odd lanes and adding them, we can use an addp instruction.
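The equivalence can be checked with a scalar model of the three operations on four-lane vectors. This is a sketch in plain C++ with hypothetical helper names (`V4`, `concat`, `uzp1`, `uzp2`, `add`, `addp`), not the actual instruction selection pattern.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using V4 = std::array<int, 4>;
using V8 = std::array<int, 8>;

// Concatenate x (low lanes) and y (high lanes).
V8 concat(const V4& x, const V4& y) {
  V8 c{};
  for (std::size_t i = 0; i < 4; ++i) {
    c[i] = x[i];
    c[i + 4] = y[i];
  }
  return c;
}

// uzp1: even lanes of the concatenation.
V4 uzp1(const V4& x, const V4& y) {
  V8 c = concat(x, y);
  V4 r{};
  for (std::size_t i = 0; i < 4; ++i) r[i] = c[2 * i];
  return r;
}

// uzp2: odd lanes of the concatenation.
V4 uzp2(const V4& x, const V4& y) {
  V8 c = concat(x, y);
  V4 r{};
  for (std::size_t i = 0; i < 4; ++i) r[i] = c[2 * i + 1];
  return r;
}

// Lane-wise add.
V4 add(const V4& a, const V4& b) {
  V4 r{};
  for (std::size_t i = 0; i < 4; ++i) r[i] = a[i] + b[i];
  return r;
}

// addp: sum of each adjacent pair of lanes of the concatenation.
V4 addp(const V4& x, const V4& y) {
  V8 c = concat(x, y);
  V4 r{};
  for (std::size_t i = 0; i < 4; ++i) r[i] = c[2 * i] + c[2 * i + 1];
  return r;
}
```

Lane `i` of `add(uzp1(x, y), uzp2(x, y))` is `c[2i] + c[2i+1]`, which is exactly lane `i` of `addp(x, y)`.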
Commit 53615ae
Commit 3a93ccc
[libc++][regex] Correctly adjust match prefix for zero-length matches. (llvm#94550)
For regex patterns that produce zero-length matches, there is one (imaginary) match in-between every character in the sequence being searched (as well as before the first character and after the last character). It's easiest to demonstrate using replacement: `std::regex_replace("abc"s, std::regex(""), "!")` should produce `!a!b!c!`, where each exclamation mark makes a zero-length match visible.

Currently our implementation doesn't correctly set the prefix of each zero-length match, "swallowing" the characters separating the imaginary matches -- e.g. when going through zero-length matches within `abc`, the corresponding prefixes should be `{'', 'a', 'b', 'c'}`, but before this patch they will all be empty (`{'', '', '', ''}`). This happens in the implementation of `regex_iterator::operator++`. Note that the Standard spells out quite explicitly that the prefix might need to be adjusted when dealing with zero-length matches in [`re.regiter.incr`](http://eel.is/c++draft/re.regiter.incr):

> In all cases in which the call to `regex_search` returns `true`, `match.prefix().first` shall be equal to the previous value of `match[0].second`... It is unspecified how the implementation makes these adjustments.

[Reproduction example](https://godbolt.org/z/8ve6G3dav)

```cpp
#include <iostream>
#include <regex>
#include <string>

int main() {
  std::string str = "abc";
  std::regex empty_matching_pattern("");

  { // The underlying problem is that `regex_iterator::operator++` doesn't update
    // the prefix correctly.
    std::sregex_iterator i(str.begin(), str.end(), empty_matching_pattern), e;
    std::cout << "\"";
    for (; i != e; ++i) {
      const std::ssub_match& prefix = i->prefix();
      std::cout << prefix.str();
    }
    std::cout << "\"\n";
    // Before the patch: ""
    // After the patch: "abc"
  }

  { // `regex_replace` makes the problem very visible.
    std::string replaced = std::regex_replace(str, empty_matching_pattern, "!");
    std::cout << "\"" << replaced << "\"\n";
    // Before the patch: "!!!!"
    // After the patch: "!a!b!c!"
  }
}
```

Fixes llvm#64451

rdar://119912002
Commit aaa160e
Re-apply llvm#87550 with fixes.

Details: Some tests in Fuchsia failed because of the newly added assertion. This was because `GetExceptionBreakpoint()` could be called before `g_dap.debugger` was initialized. The fix here is to lazily populate the list in `GetExceptionBreakpoint()` rather than assuming it has already been initialized. (There is some nuance here because we can't simply populate it in `DAP::DAP()`, which is a global ctor and is called before `SBDebugger::Initialize()` is called.)
Commit 8bb019d
[libc++] Undeprecate shared_ptr atomic access APIs (llvm#92920)
This patch reverts 9b832b7 (llvm#87111):
- [libc++] Deprecated `shared_ptr` Atomic Access APIs as per P0718R2
- [libc++] Implemented P2869R3: Remove Deprecated `shared_ptr` Atomic Access APIs from C++26

As explained in [1], the suggested replacement in P2869R3 is `__cpp_lib_atomic_shared_ptr`, which libc++ does not yet implement. Let's not deprecate the old way of doing things before the new way of doing things exists.

[1]: llvm#87111 (comment)
Commit 7e2707f
[Reassociate] shifttest.ll - generate test checks to replace custom grep expression (and remove an unused argument)
Commit fd08cef
[flang][runtime] add SHAPE runtime interface (llvm#94702)
Add SHAPE runtime API (will be used for assumed-rank; lowering generates the other cases inline). I tried to design it so that no dynamic allocation in the runtime, and no deallocation inserted by inline code, is needed for arrays that we know are small (lowering will just always stack allocate a rank 15 array to avoid dynamic stack allocation or heap allocation).
Commit 3aa3f3a
Commit 8a0529a
[OpenMP] Fix passing target id features to AMDGPU offloading (llvm#94765)
Summary: AMDGPU supports a `target-id` feature which is used to qualify targets with different incompatible features. These are both rules and target features. Currently, we pass `-target-cpu` twice when offloading to OpenMP, and do not pass the target-id features at all. The effect was that passing something like `--offload-arch=gfx90a:xnack+` would show up as `-target-cpu=gfx90a:xnack+ -target-cpu=gfx90a`, thus ignoring the xnack feature completely while passing the CPU twice. This patch fixes that to pass it once and then separate the features the way HIP does.
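Separating a target ID into a processor name and feature settings can be sketched as follows. `TargetID` and `splitTargetID` are hypothetical helpers written for this illustration, not the functions clang uses; the sketch assumes each feature token ends in `+` or `-` and silently skips anything else.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct TargetID {
  std::string Processor;             // e.g. "gfx90a"
  std::vector<std::string> Features; // e.g. {"+xnack"}
};

// Split "proc:feat+:feat-" into the processor and sign-prefixed features.
TargetID splitTargetID(const std::string& id) {
  TargetID out;
  std::size_t pos = id.find(':');
  out.Processor = id.substr(0, pos);
  while (pos != std::string::npos) {
    std::size_t next = id.find(':', pos + 1);
    std::string tok =
        id.substr(pos + 1, next == std::string::npos ? std::string::npos
                                                     : next - pos - 1);
    // Move the trailing sign to the front: "xnack+" becomes "+xnack".
    if (tok.size() > 1 && (tok.back() == '+' || tok.back() == '-'))
      out.Features.push_back(tok.back() + tok.substr(0, tok.size() - 1));
    pos = next;
  }
  return out;
}
```

With this split, `gfx90a:xnack+` yields one processor (`gfx90a`) and one feature (`+xnack`), so the CPU is passed once and the feature is no longer dropped.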
Commit e854b11
Fixed grammatical error in "enum specifier" error msg llvm#94443 (llvm#94592)
As discussed in llvm#94443, this PR changes the wording to be more correct.
Commit 624a743
Commit f7d4ecb
Check if LLD is built when checking if lto_supported (llvm#92752)
Otherwise, older copies of LLD may not understand the latest bitcode versions (for example, if we increase `ModuleSummaryIndex::BitCodeSummaryVersion`) Related to llvm#90692 (comment)
Commit d931adf
[mlir][vector][NFC] Make function name more meaningful in lit tests. (llvm#94538)
It also moves the test near other similar test cases.
Commit ae23164
Commit b928554
Commit df0747c