[pull] main from llvm:main #5641
Open: pull wants to merge 1,766 commits into Ericsson:main from llvm:main
When creating a 'yield', we have to make sure that the lexical scope we created gets cleaned up. This isn't really testable until a followup patch, but I got this wrong in quite a few places.
…nd alike as trivial and returns a retained value (#161135) Treat NSStringFromSelector, NSSelectorFromString, NSStringFromClass, NSClassFromString, NSStringFromProtocol, and NSProtocolFromString as trivial, and treat their return values as a safe pointer origin, since the return value of these functions doesn't need to be retained.
#161089) This patch extends stdio redirection support to integrated and external terminals. Currently, these cases are not covered by the standard logic because `attach` is used instead of `launch`. Strictly speaking, the `runInTerminal` request in [VSCode](https://github.com/microsoft/vscode/blob/main/src/vs/workbench/contrib/debug/node/terminals.ts#L188) supports `>` and `<` for redirection, but not `2>`. We could use the `argsCanBeInterpretedByShell` option to get the full power of the shell, but that requires proper escaping of arguments on the lldb-dap side. So I think it is better to have a single option to control stdio redirection that works consistently across all console modes. Also fixed a small typo in a comparison that was leading to an out-of-bounds access, and added `null` as a possible value for the `stdio` array in package.json.
When [1] landed, gdbremote server tests had to be updated to understand the new packet field. [1]: #163249
The BridgeOS SDK is capitalized, but previously failed to parse because we were looking for bridgeOS. This PR updates the enum value and the canonical spelling. rdar://162641896
…g a parameter (#160994) This PR updates the forward declaration checker so that unary operator & and * will be ignored for the purpose of determining if a given function argument is also a function argument of the caller / call-site.
With #161460 merged, we should also enforce it in CI (although nothing changed for current files)
Instead of just deferring to ptrtoint, we should truncate to the index width and then perform the ZextOrTrunc. This is effectively NFC since ptrtoint ends up doing the same thing, but handling it explicitly is cleaner and will make it easier to eventually upstream the changes needed for CHERI support. Reviewed By: nikic, arsenm Pull Request: #139423
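The "truncate to the index width, then ZextOrTrunc" sequence described above can be modelled with plain integer arithmetic. The following is only a sketch under stated assumptions: `truncateToWidth` and `pointerBitsToInt` are hypothetical helper names, pointer values are modelled as 64-bit integers, and none of this is the code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Keep only the low `bits` bits of `value`. Widths of 64 or more keep everything.
inline std::uint64_t truncateToWidth(std::uint64_t value, unsigned bits) {
  return bits >= 64 ? value : value & ((std::uint64_t{1} << bits) - 1);
}

// Hypothetical model of the conversion: first truncate the pointer's bit
// pattern to the index width, then zero-extend or truncate to the result width.
inline std::uint64_t pointerBitsToInt(std::uint64_t pointerBits,
                                      unsigned indexBits,
                                      unsigned resultBits) {
  // Step 1: only the index-width bits carry address information.
  std::uint64_t index = truncateToWidth(pointerBits, indexBits);
  // Step 2: ZextOrTrunc. Zero extension is a no-op in this 64-bit model,
  // so only the truncating case changes the value.
  return truncateToWidth(index, resultBits);
}
```

With a 32-bit index width, the upper bits of a wider pointer representation never reach the result, whatever the result width is.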
This can improve performance on 32-bit baremetal targets.
…3612) This patch enhances the performance of `std::distance` and `std::ranges::distance` for non-random-access segmented iterators, e.g., `std::join_view` iterators. The original implementation operates in linear time, `O(n)`, where `n` is the total number of elements. The optimized version reduces this to approximately `O(n / segment_size)` by leveraging segmented structure, where `segment_size` is the average size of each segment.

The table below summarizes the peak performance improvements observed across different segment sizes, with the total element count `n` ranging up to `1 << 20` (1,048,576 elements), based on benchmark results.

```
----------------------------------------------------------------------------------------
Container/n/segment_size                        std::distance    std::ranges::distance
----------------------------------------------------------------------------------------
join_view(vector<vector<int>>)/1048576/256            401.6x                   422.9x
join_view(deque<deque<int>>)/1048576/256              112.1x                   132.6x
join_view(vector<vector<int>>)/1048576/1024          1669.2x                  1559.1x
join_view(deque<deque<int>>)/1048576/1024             487.7x                   497.4x
```

## Benchmarks

#### Segment size = 1024

```
-----------------------------------------------------------------------------------------
Benchmark                                                   Before       After    Speedup
-----------------------------------------------------------------------------------------
std::distance(join_view(vector<vector<int>>))/50           38.8 ns     1.01 ns      38.4x
std::distance(join_view(vector<vector<int>>))/1024          660 ns     1.02 ns     647.1x
std::distance(join_view(vector<vector<int>>))/4096         2934 ns     1.98 ns    1481.8x
std::distance(join_view(vector<vector<int>>))/8192         5751 ns     3.92 ns    1466.8x
std::distance(join_view(vector<vector<int>>))/16384       11520 ns     7.06 ns    1631.7x
std::distance(join_view(vector<vector<int>>))/65536       46367 ns     32.2 ns    1440.6x
std::distance(join_view(vector<vector<int>>))/262144     182611 ns      114 ns    1601.9x
std::distance(join_view(vector<vector<int>>))/1048576    737785 ns      442 ns    1669.2x
std::distance(join_view(deque<deque<int>>))/50             53.1 ns     6.13 ns       8.7x
std::distance(join_view(deque<deque<int>>))/1024            854 ns     7.53 ns     113.4x
std::distance(join_view(deque<deque<int>>))/4096           3507 ns     14.7 ns     238.6x
std::distance(join_view(deque<deque<int>>))/8192           7114 ns     17.6 ns     404.2x
std::distance(join_view(deque<deque<int>>))/16384         13997 ns     30.7 ns     455.9x
std::distance(join_view(deque<deque<int>>))/65536         55598 ns      114 ns     487.7x
std::distance(join_view(deque<deque<int>>))/262144       214293 ns      480 ns     446.4x
std::distance(join_view(deque<deque<int>>))/1048576      833000 ns     2183 ns     381.6x
rng::distance(join_view(vector<vector<int>>))/50           39.1 ns     1.10 ns      35.5x
rng::distance(join_view(vector<vector<int>>))/1024          689 ns     1.14 ns     604.4x
rng::distance(join_view(vector<vector<int>>))/4096         2753 ns     2.15 ns    1280.5x
rng::distance(join_view(vector<vector<int>>))/8192         5530 ns     4.61 ns    1199.6x
rng::distance(join_view(vector<vector<int>>))/16384       10968 ns     7.97 ns    1376.2x
rng::distance(join_view(vector<vector<int>>))/65536       46009 ns     35.3 ns    1303.4x
rng::distance(join_view(vector<vector<int>>))/262144     190569 ns      124 ns    1536.9x
rng::distance(join_view(vector<vector<int>>))/1048576    746724 ns      479 ns    1559.1x
rng::distance(join_view(deque<deque<int>>))/50             51.6 ns     6.57 ns       7.9x
rng::distance(join_view(deque<deque<int>>))/1024            826 ns     6.50 ns     127.1x
rng::distance(join_view(deque<deque<int>>))/4096           3323 ns     12.5 ns     265.8x
rng::distance(join_view(deque<deque<int>>))/8192           6619 ns     19.1 ns     346.5x
rng::distance(join_view(deque<deque<int>>))/16384         13495 ns     33.2 ns     406.5x
rng::distance(join_view(deque<deque<int>>))/65536         53668 ns      114 ns     470.8x
rng::distance(join_view(deque<deque<int>>))/262144       236277 ns      475 ns     497.4x
rng::distance(join_view(deque<deque<int>>))/1048576      914177 ns     2157 ns     423.8x
-----------------------------------------------------------------------------------------
```

#### Segment size = 256

```
-----------------------------------------------------------------------------------------
Benchmark                                                   Before       After    Speedup
-----------------------------------------------------------------------------------------
std::distance(join_view(vector<vector<int>>))/50           38.1 ns     1.02 ns      37.4x
std::distance(join_view(vector<vector<int>>))/1024          689 ns     2.06 ns     334.5x
std::distance(join_view(vector<vector<int>>))/4096         2815 ns     7.01 ns     401.6x
std::distance(join_view(vector<vector<int>>))/8192         5507 ns     14.3 ns     385.1x
std::distance(join_view(vector<vector<int>>))/16384       11050 ns     33.7 ns     327.9x
std::distance(join_view(vector<vector<int>>))/65536       44197 ns      118 ns     374.6x
std::distance(join_view(vector<vector<int>>))/262144     175793 ns      449 ns     391.5x
std::distance(join_view(vector<vector<int>>))/1048576    703242 ns     2140 ns     328.7x
std::distance(join_view(deque<deque<int>>))/50             50.2 ns     6.12 ns       8.2x
std::distance(join_view(deque<deque<int>>))/1024            835 ns     11.4 ns      73.2x
std::distance(join_view(deque<deque<int>>))/4096           3353 ns     32.9 ns     101.9x
std::distance(join_view(deque<deque<int>>))/8192           6711 ns     64.2 ns     104.5x
std::distance(join_view(deque<deque<int>>))/16384         13231 ns      118 ns     112.1x
std::distance(join_view(deque<deque<int>>))/65536         53523 ns      556 ns      96.3x
std::distance(join_view(deque<deque<int>>))/262144       219101 ns     2166 ns     101.2x
std::distance(join_view(deque<deque<int>>))/1048576      880277 ns    15852 ns      55.5x
rng::distance(join_view(vector<vector<int>>))/50           37.7 ns     1.13 ns      33.4x
rng::distance(join_view(vector<vector<int>>))/1024          697 ns     2.14 ns     325.7x
rng::distance(join_view(vector<vector<int>>))/4096         2804 ns     7.52 ns     373.0x
rng::distance(join_view(vector<vector<int>>))/8192         5749 ns     15.2 ns     378.2x
rng::distance(join_view(vector<vector<int>>))/16384       11742 ns     34.8 ns     337.4x
rng::distance(join_view(vector<vector<int>>))/65536       47274 ns      116 ns     407.7x
rng::distance(join_view(vector<vector<int>>))/262144     187774 ns      444 ns     422.9x
rng::distance(join_view(vector<vector<int>>))/1048576    749724 ns     2109 ns     355.5x
rng::distance(join_view(deque<deque<int>>))/50             53.0 ns     6.09 ns       8.7x
rng::distance(join_view(deque<deque<int>>))/1024            895 ns     11.0 ns      81.4x
rng::distance(join_view(deque<deque<int>>))/4096           3825 ns     30.6 ns     125.0x
rng::distance(join_view(deque<deque<int>>))/8192           7550 ns     60.5 ns     124.8x
rng::distance(join_view(deque<deque<int>>))/16384         14847 ns      112 ns     132.6x
rng::distance(join_view(deque<deque<int>>))/65536         56888 ns      453 ns     125.6x
rng::distance(join_view(deque<deque<int>>))/262144       231395 ns     2034 ns     113.8x
rng::distance(join_view(deque<deque<int>>))/1048576      933093 ns    15012 ns      62.2x
-----------------------------------------------------------------------------------------
```

Addresses a subtask of #102817.

---------

Co-authored-by: Louis Dionne <[email protected]>
Co-authored-by: A. Jiang <[email protected]>
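To illustrate why the segmented approach drops from `O(n)` to roughly `O(n / segment_size)`, here is a minimal sketch that compares element-by-element counting with per-segment counting over a `std::vector<std::vector<int>>`. It is not the libc++ implementation: the function names `countElementwise` and `countBySegment` are hypothetical, and the sketch works on the container directly rather than on `join_view` iterators.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Baseline: advance through every element of every segment, O(n).
inline std::size_t countElementwise(
    const std::vector<std::vector<int>>& segments) {
  std::size_t count = 0;
  for (const auto& segment : segments)
    for (auto it = segment.begin(); it != segment.end(); ++it)
      ++count;
  return count;
}

// Segment-aware: each inner range is random access, so its length is
// available in O(1). Total work is one step per segment.
inline std::size_t countBySegment(
    const std::vector<std::vector<int>>& segments) {
  std::size_t count = 0;
  for (const auto& segment : segments)
    count += segment.size();
  return count;
}
```

Both functions return the same value. The second performs one addition per segment, which is consistent with the larger speedups reported for the larger segment size.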
…2780) This PR adds lowering of xegpu.load_matrix/store_matrix to xevm.blockload/blockstore or llvm.load/store, depending on wi-level attributes. It includes a few components:
1. Adds a wi-level attribute: subgroup_block_io.
2. Expands the load_matrix/store_matrix op definitions to support scalar data (besides vector data).
3. Adds a member function to mem_desc to compute the linearized address for n-d offsets.
4. Adds lowering depending on wi-level attributes:
   a) if the subgroup_block_io attribute is present, lower to xevm.blockload/blockstore;
   b) else lower to llvm.load/store. If the result is a vector, lower to llvm.load/store with a vector operand.
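As a rough illustration of what "linearized address for n-d offsets" means, the sketch below computes a linear element offset as the dot product of offsets and strides. This assumes a plain strided layout. The actual mem_desc member function may account for blocked layouts or other attributes, and `linearizeOffsets` is a hypothetical name, not the API added by the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: linear element offset = sum over i of
// offsets[i] * strides[i], assuming a simple strided layout.
inline std::int64_t linearizeOffsets(const std::vector<std::int64_t>& offsets,
                                     const std::vector<std::int64_t>& strides) {
  assert(offsets.size() == strides.size() && "rank mismatch");
  std::int64_t linear = 0;
  for (std::size_t i = 0; i < offsets.size(); ++i)
    linear += offsets[i] * strides[i];
  return linear;
}
```

For a row-major 2-D buffer with 16 elements per row, the strides are `{16, 1}`, so the offset `{2, 3}` maps to element 35.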
Various DAP tests are specifying their own timeouts, with values ranging from "1" to "20". Most of them seem arbitrary, but some come with a comment. The performance characteristics of running these tests in CI are unpredictable (they generally run much slower than developers expect) and really not something we can make assumptions about. I suspect these timeouts are a contributing factor to the flakiness of the DAP tests. This PR unifies the timeouts around a central value in the DAP server. Fixes #162523
…163466) Hexagon packets can visually straddle labels when data (e.g. jump tables) in the text section does not carry end-of-packet bits. In such cases the next instruction, even at a new symbol, appears to continue the previous packet. This patch resets packet state when encountering a new symbol so that packets at symbol starts are guaranteed to start in their own packet.
…tions (#162120) Op constraints are emitted as static standalone functions and need not be surrounded by the Dialect's C++ namespace. Currently they are, and this change stops emitting a namespace around these static functions.
…ng to DXIL (#161753) Introduces LLVM intrinsic `llvm.dx.resource.getdimensions.x` and its lowering to DXIL op `op.dx.getDimensions`. The intrinsic will be used to implement `GetDimension` for buffers. The lowering is using `undef` value since it is required by the DXIL format which is based on LLVM 3.7. Proposal update: llvm/wg-hlsl#350 Closes #112982
Reviewers: thurstond, fmayer Reviewed By: fmayer Pull Request: #163667
Reviewers: fmayer, thurstond Reviewed By: fmayer Pull Request: #163669
…s. (#163658)
* Add a missing dependency to install struct_itimerval.h
* Add fseeko/ftello declarations to stdio.h
Happened with 'undef,' -- comma not being recognized as a word boundary
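The kind of whole-word check that treats a trailing comma as a boundary can be sketched as follows. The commit message does not show the actual matcher, so this is an assumption-laden illustration: `isWordChar` and `containsWholeWord` are hypothetical names, and a "word character" is assumed to be a letter, digit, or underscore.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Assumption: word characters are ASCII letters, digits, and '_'.
inline bool isWordChar(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// True if `word` occurs in `text` with a non-word character (or the string
// edge) on both sides, so "undef," matches but "undefined" does not.
inline bool containsWholeWord(const std::string& text,
                              const std::string& word) {
  if (word.empty())
    return false;
  for (std::size_t pos = text.find(word); pos != std::string::npos;
       pos = text.find(word, pos + 1)) {
    const std::size_t end = pos + word.size();
    const bool leftOk = pos == 0 || !isWordChar(text[pos - 1]);
    const bool rightOk = end == text.size() || !isWordChar(text[end]);
    if (leftOk && rightOk)
      return true;
  }
  return false;
}
```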
Mutation `changeElementCountTo` now uses `ElementCount`
…ndling (#163567) Leave this to shuffle folding instead.
a1ef81d added overloads for `llvm.matrix.column.major.store` and `llvm.matrix.column.major.load` that allow strides to occupy an arbitrary bitwidth. This change wasn't reflected in the verifier, causing an assertion to trip when given strides overflowing 64-bit. This patch explicitly caps the bitwidth at 64, repairing the crash and avoiding future complexity dealing with strides that overflow 64 bits. PR: #163729
Extra test coverage for #163787.
Added a pattern for lowering `Math::ClampFOp` to `ROCDL::FMED3`. Also added a `chipset` option to the `MathToRocdl` pass to check whether the target architecture supports the required ISA instructions. Solves [#157052](#157052). Reapplies #160100. Un-reverts the merged #163259, and fixes the error. --------- Signed-off-by: Keshav Vinayak Jha <[email protected]>
Seems like the function returns a dict, not a list of tuples like the other. Not sure of the schema or the design for this, so fixing the code without using the test-suite name. But this might have to be addressed once the owner can take a look.
…ucturization (#163942) This fixes a bug where the deserializer would fail with an assert during control flow structurization when a block to be removed still had uses. This indicates that the block that was sunk is still being referenced outside the region because the control flow is not structured. Closes #163099
This PR seeks to improve the clarity of the Shard dialect documentation, in particular the descriptions of the communication operations.
…tInt based operands. (#159358) Simplification of vector.reduce intrinsics is prevented by an early bailout for ConstantInt-based operands. This PR removes the bailout and updates the tests to show matching output when -use-constant-int-for-*-splat is used.
… in OpenMPToLLVMIRTranslation.cpp (NFC)
… outside the block
If the copyable entry has the last instruction, used only outside the block, the insertion point for the vector code should be the last instruction itself, not the following one. This prevents wrong def-use sequences, which might be generated for the buildvector nodes. Fixes #163404
…Q/(V)PMULUDQ instructions (#163958)
HLSL extends C++'s requirement that unary `!` apply to boolean arguments and produce boolean results so that it also applies to vectors. This change implements implicit conversion of non-boolean vector operands to boolean vectors. I've noted this behavior in the issue tracking the writing of the HLSL specification section on unary operators (microsoft/hlsl-specs#686). Fixes #162913 --------- Co-authored-by: Farzon Lotfi <[email protected]> Co-authored-by: Erich Keane <[email protected]>
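The semantics described above (convert each element to a boolean, then negate element-wise) can be modelled in ordinary C++. This is only a sketch of the behavior, not Clang's Sema code or HLSL source: `logicalNot` is a hypothetical name and `std::array` stands in for an HLSL vector type.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Model of unary `!` on a non-boolean vector: each element is first
// converted to bool (nonzero becomes true), then negated, which yields a
// boolean vector of the same length.
template <typename T, std::size_t N>
std::array<bool, N> logicalNot(const std::array<T, N>& v) {
  std::array<bool, N> result{};
  for (std::size_t i = 0; i < N; ++i)
    result[i] = !static_cast<bool>(v[i]);
  return result;
}
```

Under this model, only zero elements map to `true` in the result.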
Summary: The changes in #163011 caused all ELF platforms to default to ELF mangling. We want to auto upgrade this for linking in new programs to old ones.
The following errors are seen in our builds using MSVC 2022:
BitstreamRemarkParser.h(115): error C2990: 'llvm::remarks::BitstreamBlockParserHelper': non-class template has already been declared as a class template
BitstreamRemarkParser.h(66): note: see declaration of 'llvm::remarks::BitstreamBlockParserHelper'
This change fixes the build issue by adding an explicit template argument to the `friend class` statements. This issue is not seen if using `-std:c++20`, but we still support building as C++17.
…63914) This patch replaces LLVM_ATTRIBUTE_UNUSED with [[maybe_unused]], introduced as part of C++17.
…]] (NFC) (#163915) This patch replaces LLVM_ATTRIBUTE_UNUSED with [[maybe_unused]], introduced as part of C++17.
…63916) This patch replaces LLVM_ATTRIBUTE_UNUSED with [[maybe_unused]], introduced as part of C++17.
…3917) This patch replaces LLVM_ATTRIBUTE_UNUSED with [[maybe_unused]], introduced as part of C++17.
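For readers unfamiliar with the attribute, here is a small standalone sketch of the kind of declaration that `[[maybe_unused]]` is meant for: a variable that is only read inside `assert`, and would otherwise trigger an unused-variable warning when assertions are compiled out. The function `clampToByte` is a made-up example, not code touched by these patches.

```cpp
#include <cassert>

// Clamp `value` into the range [0, 255].
inline int clampToByte(int value) {
  // Only read inside assert(), so builds with NDEBUG would see it as unused.
  // The standard C++17 attribute suppresses that warning without a macro.
  [[maybe_unused]] const int original = value;
  if (value < 0)
    value = 0;
  if (value > 255)
    value = 255;
  assert((original < 0 || original > 255 || value == original) &&
         "in-range values must be unchanged");
  return value;
}
```

Because the attribute is part of the language since C++17, it works on any conforming compiler without a project-specific macro.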
…r mul opcode (#163963) These now all use the same asm printout code
…163660) Implements genAllocate, genFree, and genCopy for FIR pointer types (fir.ref, fir.ptr, fir.heap, fir.llvm_ptr) in the OpenACC PointerLikeType interface.
- genAllocate: Uses fir.alloca for stack types, fir.allocmem for heap types. Returns null for dynamic/unknown types (unlimited polymorphic, dynamic arrays, dynamic character lengths, box types).
- genFree: Generates fir.freemem for heap allocations. Returns false if original allocation cannot be found.
- genCopy: Uses fir.load+fir.store for trivial types (scalars), hlfir.assign for non-trivial types (arrays, derived types, characters). Returns false for unsupported dynamic types and box types.

Adds comprehensive MLIR tests covering various FIR types and edge cases.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )