Skip to content
7 changes: 6 additions & 1 deletion python/nvfuser_direct/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -366,10 +366,15 @@ def execute(
# A copy of fusion is created after construction FusionExecutorCache
# Delete the _fusion and reference the fusion inside FusionExecutorCache
del self._fusion

# Add "id_model" as a default enable option
default_enable_options = ["id_model"]
merged_enable_options = default_enable_options + _enable_options
Comment on lines +370 to +372
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical Logic Issue: Adding "id_model" to enable options without arguments has no effect.

The IdModel system requires explicit sub-options to be passed as arguments (e.g., "all", "index", "consumer_index", "producer_index", etc.). These are checked via hasEnableOptionArgument(EnableOption::IdModel, "arg") in csrc/id_model/utils.h:30-80.

When you add "id_model" here without arguments, the Python bindings call EnableOptionsGuard::getCurOptions().set(opt.value()) with an empty vector (see python/python_direct/runtime.cpp:284). This means no IdModel features are actually enabled.

Evidence from codebase:

  1. C++ tests use: EnableOption::IdModel, {"all"} (tests/cpp/utils.cpp)
  2. Python tests that need IdModel use: set_env(NVFUSER_ENABLE="id_model(all)") (tests/python/direct/test_with_id_model_indexer.py:183)

Proposed solution:
The Python _enable_options API doesn't support passing arguments. You need to either:

  1. Modify the Python bindings to support syntax like ["id_model:all"] or similar
  2. Use environment variable NVFUSER_ENABLE="id_model(all)" in test setup
  3. Add a new Python API parameter specifically for IdModel options

Without this fix, TensorIndexer will NOT be enabled with python_direct tests as the PR title claims.

Comment on lines +371 to +372
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] Potential duplicate options if user passes "id_model" in _enable_options. Consider checking if "id_model" is already in _enable_options before adding it, or use a set to avoid duplicates. While this doesn't cause functional issues (the second set() call just overwrites), it's cleaner to avoid duplicates.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +370 to +372
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] No opt-out mechanism provided. Users cannot disable id_model when using execute() even if they want to. Consider either: (1) checking if "id_model" is in _disable_options and skipping the default, or (2) providing a parameter to control default options. This limits flexibility for users who may need to test without id_model or work around potential id_model bugs.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +370 to +372
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding "id_model" as a default enable option means users cannot opt-out of this behavior through the Python API. Consider whether this is intentional or if there should be a mechanism to allow users to disable id_model for testing/debugging purposes.

Potential implications:

  • Users performing A/B testing between old and new indexing cannot easily do so
  • Debugging issues specific to id_model becomes harder without a way to disable it
  • This is a behavioral change that affects all python_direct users

If this is intentional (to force id_model adoption), consider documenting this breaking change clearly. If not, consider checking if "id_model" is already in _enable_options or providing an opt-out mechanism.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!


return self.fec.execute(
inputs,
device=self._get_device_index(device),
_enable_options=_enable_options,
_enable_options=merged_enable_options,
_disable_options=_disable_options,
)

Expand Down
5 changes: 3 additions & 2 deletions tests/cpp/test_indexing.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -864,8 +864,9 @@ TEST_F(IndexingTest, Reshape) {
// to provide the extent of the group. However, since everything
// should be deterministic, string match should also work.
return std::string(
"( ( ( ( ( i98 * 20 ) + ( ( i99 * 10 ) + i100 ) ) / 25 ) * 25 ) "
"+ ( ( ( i98 * 20 ) + ( ( i99 * 10 ) + i100 ) ) % 25 ) )");
"( ( ( ( ( i114 * 20 ) + ( ( i115 * 10 ) + i116 ) ) / 25 ) * 25 "
") "
"+ ( ( ( i114 * 20 ) + ( ( i115 * 10 ) + i116 ) ) % 25 ) )");
}
default:
return std::string();
Expand Down
10 changes: 4 additions & 6 deletions tests/python/direct/test_python_direct.py
Original file line number Diff line number Diff line change
Expand Up @@ -229,22 +229,20 @@ def test_fusion_execution_cache():
i2 = i0 / 8;
nvfuser_index_t i3;
i3 = i0 % 8;
nvfuser_index_t i4;
i4 = ((nvfuser_index_t)threadIdx.x) + (128 * ((nvfuser_index_t)blockIdx.x));
if ((i4 < 64)) {
if ((((nvfuser_index_t)threadIdx.x) < 64)) {
Array<float, 1, 1> T4;
T4[0] = 0;
T4[0]
= T1[((((T1.alloc_stride[0LL] * i1) + (T1.alloc_stride[1LL] * i2)) + (T1.alloc_stride[2LL] * i3)) + ((4 * T1.alloc_stride[0LL]) * ((nvfuser_index_t)blockIdx.x)))];
= T1[(((T1.alloc_stride[0LL] * i1) + (T1.alloc_stride[1LL] * i2)) + (T1.alloc_stride[2LL] * i3))];
Array<float, 1, 1> T3;
T3[0] = 0;
T3[0]
= T0[((((T0.alloc_stride[0LL] * i1) + (T0.alloc_stride[1LL] * i2)) + (T0.alloc_stride[2LL] * i3)) + ((4 * T0.alloc_stride[0LL]) * ((nvfuser_index_t)blockIdx.x)))];
= T0[(((T0.alloc_stride[0LL] * i1) + (T0.alloc_stride[1LL] * i2)) + (T0.alloc_stride[2LL] * i3))];
Array<float, 1, 1> T5;
T5[0]
= T3[0]
+ T4[0];
T2[i4]
T2[((nvfuser_index_t)threadIdx.x)]
= T5[0];
}
}\n"""
Expand Down