pre-commit: PR165159#3520

Open
zyw-bot wants to merge 3 commits into main from test-run22500182292

zyw-bot (Collaborator) commented Feb 27, 2026

Link: llvm/llvm-project#165159
Requested by: @yxsamliu

@github-actions github-actions bot mentioned this pull request Feb 27, 2026
zyw-bot (Collaborator, Author) commented Feb 27, 2026

Diff mode

runner: ariselab-64c-docker
baseline: llvm/llvm-project@c3b3f41
patch: llvm/llvm-project#165159
sha256: 830c900cbc232ae3eca1e3170ef8c9aa6e86661b1f8973e31288ab07ba30a540
commit: 6087fae

112 files changed, 126321 insertions(+), 127205 deletions(-)

Improvements:
  sroa.NumLoadsPredicated 14459 -> 14489 +0.21%
  sroa.NumStoresPredicated 3634 -> 3640 +0.17%
  instcount.NumExtractElementInst 55343 -> 55388 +0.08%
  sroa.NumLoadsSpeculated 316530 -> 316600 +0.02%
  loop-idiom.NumMemSet 38911 -> 38917 +0.02%
  instcount.NumInsertElementInst 90566 -> 90576 +0.01%
  memory-builtins.ObjectVisitorLoad 23200 -> 23202 +0.01%
  attributor.NumAAs 3940676 -> 3940956 +0.01%
  loop-delete.NumDeleted 112114 -> 112121 +0.01%
  memdep.NumCacheCompleteNonLocalPtr 5302316 -> 5302550 +0.00%
Regressions:
  memcpyopt.NumCpyToSet 11953 -> 11944 -0.08%
  instcombine.NumDeadStore 25573 -> 25564 -0.04%
  correlated-value-propagation.NumNonNull 10849100 -> 10847560 -0.01%
  memdep.NumCacheDirtyNonLocalPtr 23019 -> 23017 -0.01%
  instcount.NumAllocaInst 5811514 -> 5811232 -0.00%
  capture-tracking.NumNotCapturedBefore 19317429 -> 19316773 -0.00%
  instcount.NumCallInst 38921359 -> 38920222 -0.00%
  memcpyopt.NumCallSlot 1011065 -> 1011044 -0.00%
  sroa.NumAllocaPartitionUses 266600647 -> 266595399 -0.00%
  instcount.NumSelectInst 1779718 -> 1779683 -0.00%
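
The percentage column in the lists above appears to be the relative change `(new - old) / old`, rounded to two decimals. A minimal sketch for re-deriving it from a report line follows; the helper names `percent_change` and `check_stat_line` are hypothetical, and the line layout is inferred from this comment rather than from any documented format.

```python
import re

# Assumed layout, inferred from the report: "<name> <old> -> <new> <signed pct>%".
STAT_LINE = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<old>\d+)\s+->\s+(?P<new>\d+)\s+"
    r"(?P<pct>[+-]\d+\.\d+)%\s*$"
)


def percent_change(old: int, new: int) -> float:
    """Relative change from old to new, in percent (old must be non-zero)."""
    return (new - old) / old * 100.0


def check_stat_line(line: str) -> tuple[str, str, bool]:
    """Return (stat name, recomputed percentage, whether it matches the report)."""
    match = STAT_LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognized stat line: {line!r}")
    old, new = int(match["old"]), int(match["new"])
    # "+.2f" keeps the sign, so tiny negative changes print as "-0.00".
    computed = f"{percent_change(old, new):+.2f}"
    return match["name"], computed, computed == match["pct"]


if __name__ == "__main__":
    print(check_stat_line("  sroa.NumLoadsPredicated 14459 -> 14489 +0.21%"))
    print(check_stat_line("  memcpyopt.NumCpyToSet 11953 -> 11944 -0.08%"))
```

Entries shown as `+0.00%` or `-0.00%` are non-zero changes that round to zero at two decimals, so the raw counts are the more informative column for those rows.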

+3 cpython/compile.ll
+3 xgboost/updater_refresh.ll
+1 ffmpeg/avformat.ll
+0 assimp/FBXConverter.ll
+0 box2d/sample_collision.ll
+0 ceres/gradient_problem_solver.ll
+0 ceres/line_search.ll
+0 ffmpeg/ffmpeg_dec.ll
+0 gromacs/colvarparse.ll
+0 opencv/benchmark.ll
+0 opencv/binarizer.ll
+0 opencv/graphsegmentation.ll
+0 opencv/seam_finders.ll
+0 openusd/blendShapeQuery.ll
+0 z3/euf_proof_checker.ll
-1 delta-rs/11f8x98axanecwnw.ll
-2 image-rs/1clnprdgqfw2q9lq.ll
-2 z3/seq_axioms.ll
-3 bullet3/b3DynamicBvhBroadphase.ll
-3 bullet3/btConvexHullComputer.ll
-3 cmake/session.ll
-3 gromacs/lincs.ll
-3 wireshark/sparkline_delegate.ll
-4 hyperscan/rose_build_bytecode.ll
-4 llvm/AArch64O0PreLegalizerCombiner.ll
-4 llvm/AttributorAttributes.ll
-4 llvm/OMPIRBuilder.ll
-4 php/dirstream.ll
-6 duckdb/ub_duckdb_storage_metadata.ll
-7 freetype/ftbase.ll
-8 llvm/AArch64InstructionSelector.ll
-9 open3d/EstimateNormals.ll
-9 opencv/erfilter.ll
-9 opencv/gapi_core_perf_tests.ll
-9 opencv/gnnparsers.ll
-9 velox/GreatestLeast.ll
-12 hermes/Exceptions.ll
-12 openusd/collectionCache.ll
-12 wasmtime-rs/16qf4j2oevjc61uc.ll
-14 llvm/FunctionAttrs.ll
-15 xgboost/updater_approx.ll
-24 velox/ArraySort.ll

@github-actions (Contributor)

Here is a concise summary of the major changes in this LLVM IR diff:

  1. Vectorization of Small Struct Allocations and Loads/Stores:
    Multiple instances replace { float, float } or { i64, i64 } struct allocations with vector types (<2 x float>, <2 x i64>, <4 x i32>, etc.), accompanied by corresponding load/store instructions instead of alloca + memcpy. This reflects improved SROA (Scalar Replacement of Aggregates) and vectorization, especially for 2-field structs representing geometric data (e.g., colors, vectors, points).

  2. Elimination of Temporary Alloca + memcpy Patterns:
    Code patterns using a temporary stack allocation (alloca) followed by memcpy to swap or copy small aggregates (e.g., b2Vec2, FBX::Light::Color, MetadataBlockInfo) are replaced with direct vector loads/stores and phi nodes. This removes unnecessary memory traffic and lifetime management (llvm.lifetime.start/end calls are removed).

  3. Improved Handling of std::function Move/Assignment:
    In several LLVM and C++ standard library modules (e.g., SmallVectorImpl<std::function>), move operations now use vector loads/stores (<2 x i64>) for the std::function’s internal storage instead of byte-wise memcpy. This includes aligning allocas to 16 bytes and updating pointer arithmetic accordingly.

  4. Refinement of Sorting and Swapping Logic:
    Sorting routines (e.g., __introsort_loop, __insertion_sort, __unguarded_linear_insert) across multiple benchmarks (hermes, duckdb, opencv, open3d) eliminate temporary alloca-based swap buffers in favor of direct vector loads/stores into loop-carried values, reducing stack usage and improving locality.

  5. Cleanup of Redundant Struct Definitions and Phi Node Updates:
    Minor but consistent cleanups include removing unused struct type declarations (e.g., PyCompilerFlags in cpython), replacing i64-based loads/stores of struct fields with vector equivalents, and fixing phi node operand order to match updated basic block predecessors—ensuring correctness after control-flow restructuring.

These changes collectively reflect aggressive SROA, better vector type inference, and more precise memory access modeling—leading to reduced stack allocations, eliminated redundant copies, and improved code generation for small aggregate data.

model: qwen-plus-latest
CompletionUsage(completion_tokens=532, prompt_tokens=109022, total_tokens=109554, completion_tokens_details=None, prompt_tokens_details=None)
