Skip to content

Conversation

@zasdfgbnm
Copy link
Collaborator

No description provided.

@github-actions
Copy link

github-actions bot commented Oct 1, 2025

Review updated until commit 918e1b5

Description

  • Simplify MatmulOp output handling by removing redundant striding logic

  • Refactor resetAllocationDomainAndContiguity to use tensor directly

  • Update argument preparation to support expression evaluation outputs

  • Introduce inferAllocationSizesAndStrides without validation for flexibility


Changes walkthrough 📝

Relevant files
Bug fix
nodes.cpp
Simplify MatmulOp output handling                                               

csrc/ir/nodes.cpp

  • Remove manual stride handling in MatmulOp::evaluate
  • Directly return matmul_out without strided copy
  • +1/-11   
    Enhancement
    fusion_cache_utils.cpp
    Refactor allocation domain reset with tensor input             

    csrc/runtime/fusion_cache_utils.cpp

  • Update resetAllocationDomainAndContiguity to take at::Tensor instead
    of sizes/strides
  • Use inferAllocationSizesAndStrides to get sizes and strides
  • Simplify contiguity update logic with direct indexing
  • +15/-75 
    fusion_kernel_runtime.cpp
    Support expression evaluation in kernel preparation           

    csrc/runtime/fusion_kernel_runtime.cpp

  • Add expression evaluation path for ExprEval scheduler
  • Bind inputs and evaluate outputs using ExpressionEvaluator
  • Pass is_expr_eval flag to updateWithSegmentOutputs
  • +25/-7   
    tensor_metadata.cpp
    Split inference and validation of allocation sizes             

    csrc/tensor_metadata.cpp

  • Rename inferAndValidateAllocationSizesAndStrides to
    inferAllocationSizesAndStrides
  • Move validation to separate function call
  • Add new inferAndValidate wrapper that calls infer function
  • +12/-1   
    Documentation
    tensor_metadata.h
    Declare allocation sizes inference functions                         

    csrc/tensor_metadata.h

  • Add new inferAllocationSizesAndStrides function declaration
  • Document purpose of non-validating inference function
  • Keep inferAndValidateAllocationSizesAndStrides for compatibility
  • +9/-0     

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    🧪 No relevant tests
    ⚡ Recommended focus areas for review

    Output Striding

    The PR removes logic that ensures the output tensor has correct strides by constructing a strided tensor when the inferred output is not contiguous. This may lead to incorrect memory layout assumptions in downstream operations.

    return {matmul_out};
    Contiguity Update

    The refactored resetAllocationDomainAndContiguity function no longer handles mismatched allocation domains or updates contiguity in all cases, potentially leading to incorrect contiguity metadata on TensorView.

        const at::Tensor& tensor) {
      const auto [sizes, strides] = inferAllocationSizesAndStrides(tensor, tv, ExpressionEvaluator());
      auto contiguity_without_reduction = computeContiguity(sizes, strides);
      std::vector<std::optional<bool>> contiguity;
      int64_t index = 0;
      for (auto id : tv->getMaybeAllocationDomain()) {
        if (id->isReduction()) {
          contiguity.push_back(std::nullopt);
        } else if (!id->isBroadcast() &&
            !contiguity_without_reduction[index].has_value()) {
          contiguity.push_back(true);
          index++;
        } else {
          contiguity.push_back(contiguity_without_reduction[index++]);
        }
      }
      tv->setContiguity(contiguity);
    }
    ExprEval Output Handling

    When using ExprEval, output sizes and strides are computed via evaluation, but the updated contiguity logic may not be consistently applied due to changes in how resetAllocationDomainAndContiguity is called.

    if (is_expr_eval) {
      // For expr evaluated fusion, the striding rules follow that of ATen.
      ExpressionEvaluator eval_fusion;
      for (auto [i, v] : enumerate(group_to_run->inputs())) {
        auto tensor_pv = args_manager.checkTensorMap(v);
        eval_fusion.bind(fusion_to_run->inputs()[i], tensor_pv);
      }
      for (auto v : fusion_to_run->outputs()) {
        auto result = eval_fusion.evaluate(v);
        group_runtime_outputs.push(result);
      }

    @zasdfgbnm
    Copy link
    Collaborator Author

    !test

    @zasdfgbnm zasdfgbnm marked this pull request as ready for review October 9, 2025 00:01
    @zasdfgbnm zasdfgbnm merged commit 395927b into eval-segment Oct 9, 2025
    7 of 9 checks passed
    @zasdfgbnm zasdfgbnm deleted the esdebug5 branch October 9, 2025 00:02
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Labels

    None yet

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    2 participants