@naoyam naoyam commented Oct 3, 2025

This was an oversight from when the greedy scheduler was extended with batching. The uninlinable IDs need to be loop IDs, but before this PR they were still taken from the logical domain. The generated code did not produce any errors because the affected loop IDs are parallelized with either TIDx or Group, but limiting the inlining position to the left of the constrained IDs is still the more sensible behavior.
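To make "limiting the inlining position to the left of constrained IDs" concrete, here is a minimal standalone sketch. It does not use nvFuser types: `LoopId` and `maxInlinePosition` are hypothetical names introduced only for illustration, and the assumption is that an inlining position `p` means loop IDs `[0, p)` are shared with the consumer.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a loop IterDomain; not an nvFuser type.
struct LoopId {
  std::string name;
  bool constrained;  // true if derived from a constrained logical ID
};

// Returns the largest inlining position that stays strictly to the left of
// the first constrained loop ID. Position p is assumed to mean that loop
// IDs [0, p) are shared with the consumer.
std::size_t maxInlinePosition(const std::vector<LoopId>& loop_domain) {
  std::size_t pos = 0;
  for (const auto& id : loop_domain) {
    if (id.constrained) {
      break;  // never inline into or past a constrained ID
    }
    ++pos;
  }
  return pos;
}
```

Under these assumptions, a loop domain of an unconstrained `BIDx` followed by constrained `TIDx` and `Group` IDs yields a maximum position of 1, so only the `BIDx` loop can be shared.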

naoyam commented Oct 3, 2025

!test

github-actions bot commented Oct 3, 2025

Description

  • Fix inlining logic for constrained ops in loop domain

  • Prevent inlining into constrained axes like argsort, scan, scatter

  • Update uninlinable IDs to use loop domain instead of logical domain

  • Add validation tests for computeAt positions in constrained ops


Changes walkthrough 📝

Relevant files

Bug fix
greedy.cpp: Update inlining restriction to loop domain
csrc/scheduler/greedy.cpp (+19/-8)

  • Remove old inlining restriction on logical domain IDs
  • Introduce dependency-based detection of constrained loop IDs
  • Mark constrained loop domain axes as uninlinable
  • Use getAllValsBetween to capture transitive dependencies

Tests
test_greedy.cpp: Add tests for constrained op scheduling
tests/cpp/test_greedy.cpp (+91/-0)

  • Add post-scheduling validation for argsort loop structure
  • Verify computeAt positions for argsort, scan, scatter ops
  • Check parallelization axes (BIDx, TIDx, Group) in loop domain
  • Confirm inputs are not inlined into constrained dimensions

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    🧪 PR contains tests
    ⚡ Recommended focus areas for review

    Possible Issue

    The logic for identifying constrained loop IDs and marking them as uninlinable may not correctly handle all dependency paths, especially when there are indirect dependencies in the loop domain. The use of DependencyCheck::getAllValsBetween should be validated to ensure it captures all relevant loop IDs that depend on constrained logical IDs.

    // Don't inline constrained IDs. For example, like reduction IDs,
    // argsort'ed IDs should never be inlined into its consumers.
    std::unordered_set<Val*> constrained_logical;
    for (const auto constrained_logical_id_offset :
         constrained_logical_id_offsets) {
      constrained_logical.insert(
          tv->getLogicalDomain().at(constrained_logical_id_offset));
    }
    
    auto all_constrained_ids = DependencyCheck::getAllValsBetween(
        constrained_logical,
        {tv->getLoopDomain().begin(), tv->getLoopDomain().end()});
    for (const auto loop_id : tv->getLoopDomain()) {
      if (std::ranges::find(all_constrained_ids, loop_id) !=
          all_constrained_ids.end()) {
        uninlinable_ids_.insert(loop_id);
      }
    }

    Performance Impact

    Inserting each loop ID into uninlinable_ids_ individually could lead to inefficiencies if the loop domain is large. Consider whether a bulk insertion or a more optimized lookup structure is needed to maintain performance scalability.

    for (const auto loop_id : tv->getLoopDomain()) {
      if (std::ranges::find(all_constrained_ids, loop_id) !=
          all_constrained_ids.end()) {
        uninlinable_ids_.insert(loop_id);
      }
    }
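One possible shape for the "more optimized lookup structure" mentioned above is to copy the constrained values into a hash set once and test membership against it. The following is a standalone sketch, not the PR's code: `Val` is an empty placeholder struct rather than nvFuser's class, and `collectUninlinable` is a hypothetical helper name. For the handful of IDs a loop domain usually holds, the linear scan in the PR is likely just as good in practice.

```cpp
#include <unordered_set>
#include <vector>

// Placeholder for nvFuser's Val; only pointer identity matters here.
struct Val {};

// Collects the loop IDs that appear among the constrained values. The hash
// set makes each membership test O(1) on average instead of a linear scan
// over all_constrained_ids.
std::unordered_set<Val*> collectUninlinable(
    const std::vector<Val*>& all_constrained_ids,
    const std::vector<Val*>& loop_domain) {
  const std::unordered_set<Val*> constrained(
      all_constrained_ids.begin(), all_constrained_ids.end());
  std::unordered_set<Val*> uninlinable;
  for (Val* loop_id : loop_domain) {
    if (constrained.count(loop_id) != 0) {
      uninlinable.insert(loop_id);
    }
  }
  return uninlinable;
}
```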

            {tv->getLoopDomain().begin(), tv->getLoopDomain().end()});
        for (const auto loop_id : tv->getLoopDomain()) {
          if (std::ranges::find(all_constrained_ids, loop_id) !=
              all_constrained_ids.end()) {
    Collaborator

    Why do we need to exclude constrained_logical?

    With manual scheduling, the logical domain and the loop domain could share IDs, and this would artificially exclude them.

    Collaborator Author

    Can you clarify your question? Exclude from inlining, or exclude from uninlinable_ids?

    They should be included in all_constrained_ids, so every constrained loop ID ends up in uninlinable_ids whether or not it is also a logical ID. In other words, all constrained loop IDs are excluded from inlining.

    Collaborator

    Ah, you are right. For some reason I read getAllValsBetween and assumed that the dependencies themselves wouldn't be included.

    // Grab all values that exist between and including provided
    // vals. Returned values are topologicaly ordered, and unique.
    NVF_API static std::vector<Val*> getAllValsBetween(
        const std::unordered_set<Val*>& dependencies,
        const std::vector<Val*>& of);
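The "between and including" wording is what resolves the question above. Here is a toy model of that behavior, under stated assumptions: values are plain integers, `Graph` maps each value to the values it is directly derived from, and `allValsBetween` is a hypothetical reimplementation for illustration only. Unlike the real function, it returns an unordered set rather than a topologically ordered vector.

```cpp
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Toy graph: each value maps to the values it is directly derived from.
using Graph = std::unordered_map<int, std::vector<int>>;

// Returns true if `val` is a dependency or is derived from one, recording
// every value found on such a path (endpoints included) in `between`.
bool dependsOn(
    const Graph& producers,
    const std::unordered_set<int>& dependencies,
    int val,
    std::unordered_set<int>& between) {
  bool on_path = dependencies.count(val) != 0;
  if (!on_path) {
    auto it = producers.find(val);
    if (it != producers.end()) {
      for (int p : it->second) {
        // Visit every producer (no short-circuit) so all paths are recorded.
        on_path = dependsOn(producers, dependencies, p, between) || on_path;
      }
    }
  }
  if (on_path) {
    between.insert(val);
  }
  return on_path;
}

// Toy counterpart of getAllValsBetween: all values on a path from any of
// `dependencies` to any of `of`, including both ends.
std::unordered_set<int> allValsBetween(
    const Graph& producers,
    const std::unordered_set<int>& dependencies,
    const std::vector<int>& of) {
  std::unordered_set<int> between;
  for (int v : of) {
    dependsOn(producers, dependencies, v, between);
  }
  return between;
}
```

Suppose logical ID 1 is constrained and is split into loop IDs 2 and 3, while logical ID 0 is untouched. Querying with loop domain {0, 2, 3} yields {1, 2, 3}. In a manually scheduled case where the loop domain is {0, 1} and shares ID 1 with the logical domain, the result is {1}, so the shared ID is still treated as constrained.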
    

    @naoyam naoyam merged commit 1a8f337 into main Oct 13, 2025
    54 of 55 checks passed
    @naoyam naoyam deleted the greedy_inlining branch October 13, 2025 20:22