
Conversation

@jjsjann123 jjsjann123 (Collaborator) commented Sep 18, 2025

Stacked PRs

Breaking original PR #5170 into three:
#5186 Fix allocation logic: non-divisible split <- this one
#5185 Fix allocation logic: unconnected alloc/logical
#5184 Allow non-device split on allocation domain

This PR

Fix the projection from allocation to logical domain for non-divisible splits.

This PR changes the projection so that the backward projection across a non-divisible split maintains the correct extent, instead of including the padded sizes from the two split IDs.

E.g. with ID i0 (extent 5), if we split by 4, we get two new IDs:

    i0 (extent 5)
   /  \
 i0/4  4

Because i0/4 is a ceilDiv, it ends up with extent 2.
When we project the extents from i0/4 and 4 back, the combined extent is 8 instead of 5.

When we allocate the buffer, we want it to use the allocation sizes (buffer size should be 8). But the tensor we produce should have sizes faithful to its logical domain (tensor size should be 5). Otherwise, later consumers of the tensor would assert on mismatched tensor sizes.

TODO

  • Naoya requested more unit tests in a review comment


github-actions bot commented Sep 18, 2025

Review updated until commit 89b2f1c

Description

  • Fix extent projection for non-divisible splits

  • Ensure logical tensor sizes remain correct

  • Add test validation for logical sizes

  • Fix typo in test case name


Changes walkthrough 📝

Relevant files

Bug fix
csrc/runtime/allocations.cpp: Fix tensor shape and stride computation (+12/-1)

  • Moved prod *= tensor_new_shape[i] after the stride assignment to
    compute the correct cumulative product
  • Added comment explaining ceilDiv padding in split operations
  • Sliced tensor to the input extent when an extent mismatch occurs
    during backward projection
  • Updated shape in as_strided call to use the correct input extent for
    non-divisible splits

Tests
tests/cpp/test_layout_op.cpp: Improve logical size correctness test (+4/-7)

  • Reenabled test asserting output tensor logical size
  • Added equality check between input and output tensors
  • Removed outdated comment about slicing workaround
  • Verified strides reflect padding via outer dimension

Miscellaneous
tests/cpp/test_low_precision_recipe.cpp: Fix test name typo (+1/-1)

  • Fixed typo in test name: 'Ouput' → 'Output'

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
⚡ Recommended focus areas for review

    Possible Issue

    The order of operations in the stride computation loop was changed, which may affect the correctness of the computed strides. The original code updated the product after setting the stride, while the new code updates it before. This could lead to incorrect stride values.

for (int i = static_cast<int>(tensor_new_shape.size()) - 1; i >= 0; --i) {
  tensor_new_strides[i] = prod;
  prod *= tensor_new_shape[i];
}
    Logic Gap

    The slicing logic for non-divisible splits is applied in both the contiguous and non-contiguous cases, but the extent used for slicing in the non-contiguous case comes from the evaluated input extent. It should be verified that this value is correctly propagated and used in all scenarios, especially when the tensor has complex layout transformations.

    int64_t in_extent = ee_.evaluate(in->extent()).as<int64_t>();
    if (areDimsToBeMergedContiguous(tensor_, new_shape)) {
      tensor_ = tensor_.view(new_shape);
      // slice for non-divisible split
      if (in_extent != tensor_.size(left)) {
        tensor_ = tensor_.slice(left, 0, in_extent);
      }
    } else {
      auto [tensor_new_shape, tensor_new_strides] =
          getShapeAndStrideAfterDimMerged(tensor_, new_shape);
      // slice for non-divisible split
      tensor_new_shape[left] = in_extent;

@jjsjann123 jjsjann123 changed the title Fixing non-divisible split handling in allocation logic Fix non-divisible split handling in allocation logic Sep 18, 2025
@jjsjann123 jjsjann123 changed the title Fix non-divisible split handling in allocation logic Fix allocation logic: non-divisible split Sep 18, 2025
@jjsjann123 jjsjann123 force-pushed the jj/allocation_for_layout_op_PR_1 branch from c64d299 to 33d0ce3 Compare September 18, 2025 21:40
@jjsjann123 jjsjann123 force-pushed the jj/allocation_non_divisible_split_PR_2 branch from 8fe76bd to 82fdd9b Compare September 18, 2025 21:40
@jjsjann123 jjsjann123 (Collaborator, Author) commented:

I'm keeping this as a draft for now until I add more tests per @naoyam's request.

@jjsjann123 jjsjann123 force-pushed the jj/allocation_non_divisible_split_PR_2 branch from 82fdd9b to beab9a3 Compare September 19, 2025 22:54
@jjsjann123 jjsjann123 force-pushed the jj/allocation_for_layout_op_PR_1 branch from 33d0ce3 to 17df15a Compare September 19, 2025 22:55
@jjsjann123 jjsjann123 force-pushed the jj/allocation_for_layout_op_PR_1 branch from 17df15a to f9acfc3 Compare September 22, 2025 21:50
@jjsjann123 jjsjann123 force-pushed the jj/allocation_non_divisible_split_PR_2 branch from beab9a3 to 30c1640 Compare September 22, 2025 21:52
1. Refactor buffer allocation to use the allocation domain instead of the logical domain.
2. Fix the projection from allocation to logical on the special path where projection is not possible:
   we now compute the correct extent instead of returning the allocation buffer as-is. This allows the layout op to return a tensor with the correct logical size, while still allocating a buffer large enough to accommodate the padding requirement.
@jjsjann123 jjsjann123 force-pushed the jj/allocation_for_layout_op_PR_1 branch from f9acfc3 to 87afb60 Compare September 23, 2025 17:39
@jjsjann123 jjsjann123 force-pushed the jj/allocation_non_divisible_split_PR_2 branch from 30c1640 to 89b2f1c Compare September 23, 2025 17:40
@jjsjann123 jjsjann123 force-pushed the jj/allocation_for_layout_op_PR_1 branch 2 times, most recently from c64d299 to ea3cd68 Compare October 1, 2025 20:17
Base automatically changed from jj/allocation_for_layout_op_PR_1 to main October 7, 2025 23:41
