"torch_compile" still depends on Triton functionality #41

@Vectorrent

Description

I attempted to use linear_cross_entropy with my GTX 1070 (old, I know), but it fails with the errors below. These errors were unexpected, because the description of the "torch_compile" mode suggests that Triton support is not required:

"torch_compile = A highly optimized torch.compile implementation. This is typically the fastest but uses the most amount of memory. Good as a reference and for systems that don't support Triton."

I can run the same code on the CPU without any issues; the errors appear only when I try to use my GPU.

I'm mostly reporting this issue for posterity and not really expecting a solution, but if there is an easy fix, I would greatly appreciate one!

  File "/home/crow/repos/praxis/praxis/losses/cut_cross_entropy.py", line 15, in forward
    loss = linear_cross_entropy(
        embeddings,
    ...<5 lines>...
        reduction="mean",
    )
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/cut_cross_entropy/linear_cross_entropy.py", line 104, in linear_cross_entropy
    return torch_compile_linear_cross_entropy(
        e, c, targets, bias, ignore_index, softcap, reduction, shift
    )
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/cut_cross_entropy/torch_compile.py", line 66, in torch_compile_linear_cross_entropy
    loss = torch_compile_linear_cross_entropy_apply(
        e,
    ...<5 lines>...
        reduction=reduction,
    )
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_dynamo/eval_frame.py", line 574, in _fn
    return fn(*args, **kwargs)
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_dynamo/convert_frame.py", line 1380, in __call__
    return self._torchdynamo_orig_callable(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        frame, cache_entry, self.hooks, frame_state, skip=1
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_dynamo/convert_frame.py", line 547, in __call__
    return _compile(
        frame.f_code,
    ...<14 lines>...
        skip=skip + 1,
    )
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_dynamo/convert_frame.py", line 986, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_dynamo/convert_frame.py", line 715, in compile_inner
    return _compile_inner(code, one_graph, hooks, transform)
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_utils_internal.py", line 95, in wrapper_function
    return function(*args, **kwargs)
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_dynamo/convert_frame.py", line 750, in _compile_inner
    out_code = transform_code_object(code, transform)
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_dynamo/bytecode_transformation.py", line 1361, in transform_code_object
    transformations(instructions, code_options)
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_dynamo/convert_frame.py", line 231, in _fn
    return fn(*args, **kwargs)
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_dynamo/convert_frame.py", line 662, in transform
    tracer.run()
    ~~~~~~~~~~^^
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 2868, in run
    super().run()
    ~~~~~~~~~~~^^
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 1052, in run
    while self.step():
          ~~~~~~~~~^^
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 962, in step
    self.dispatch_table[inst.opcode](self, inst)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 3048, in RETURN_VALUE
    self._return(inst)
    ~~~~~~~~~~~~^^^^^^
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_dynamo/symbolic_convert.py", line 3033, in _return
    self.output.compile_subgraph(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        self,
        ^^^^^
    ...<2 lines>...
        ),
        ^^
    )
    ^
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_dynamo/output_graph.py", line 1101, in compile_subgraph
    self.compile_and_call_fx_graph(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        tx, list(reversed(stack_values)), root, output_replacements
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_dynamo/output_graph.py", line 1382, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_dynamo/output_graph.py", line 1432, in call_user_compiler
    return self._call_user_compiler(gm)
           ~~~~~~~~~~~~~~~~~~~~~~~~^^^^
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_dynamo/output_graph.py", line 1483, in _call_user_compiler
    raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
        e.__traceback__
    ) from None
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_dynamo/output_graph.py", line 1462, in _call_user_compiler
    compiled_fn = compiler_fn(gm, self.example_inputs())
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__
    compiled_gm = compiler_fn(gm, example_inputs)
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__
    compiled_gm = compiler_fn(gm, example_inputs)
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/__init__.py", line 2340, in __call__
    return compile_fx(model_, inputs_, config_patches=self.config)
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_inductor/compile_fx.py", line 1863, in compile_fx
    return aot_autograd(
           ~~~~~~~~~~~~~
    ...<6 lines>...
        cudagraphs=cudagraphs,
        ~~~~~~~~~~~~~~~~~~~~~~
    )(model_, example_inputs_)
    ~^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_dynamo/backends/common.py", line 83, in __call__
    cg = aot_module_simplified(gm, example_inputs, **self.kwargs)
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_functorch/aot_autograd.py", line 1155, in aot_module_simplified
    compiled_fn = dispatch_and_compile()
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_functorch/aot_autograd.py", line 1131, in dispatch_and_compile
    compiled_fn, _ = create_aot_dispatcher_function(
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        functional_call,
        ^^^^^^^^^^^^^^^^
    ...<3 lines>...
        shape_env,
        ^^^^^^^^^^
    )
    ^
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_functorch/aot_autograd.py", line 580, in create_aot_dispatcher_function
    return _create_aot_dispatcher_function(
        flat_fn, fake_flat_args, aot_config, fake_mode, shape_env
    )
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_functorch/aot_autograd.py", line 830, in _create_aot_dispatcher_function
    compiled_fn, fw_metadata = compiler_fn(
                               ~~~~~~~~~~~^
        flat_fn,
        ^^^^^^^^
    ...<2 lines>...
        fw_metadata=fw_metadata,
        ^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 678, in aot_dispatch_autograd
    compiled_fw_func = aot_config.fw_compiler(fw_module, adjusted_flat_args)
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_functorch/aot_autograd.py", line 489, in __call__
    return self.compiler_fn(gm, example_inputs)
           ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_inductor/compile_fx.py", line 1741, in fw_compiler_base
    return inner_compile(
        gm,
    ...<5 lines>...
        boxed_forward_device_index=forward_device,
    )
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_inductor/compile_fx.py", line 569, in compile_fx_inner
    return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        gm,
        ^^^
        example_inputs,
        ^^^^^^^^^^^^^^^
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_dynamo/repro/after_aot.py", line 102, in debug_wrapper
    inner_compiled_fn = compiler_fn(gm, example_inputs)
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_inductor/compile_fx.py", line 685, in _compile_fx_inner
    mb_compiled_graph = fx_codegen_and_compile(
        gm, example_inputs, inputs_to_check, **graph_kwargs
    )
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_inductor/compile_fx.py", line 1129, in fx_codegen_and_compile
    return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_inductor/compile_fx.py", line 1044, in codegen_and_compile
    compiled_fn = graph.compile_to_module().call
                  ~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_inductor/graph.py", line 2027, in compile_to_module
    return self._compile_to_module()
           ~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_inductor/graph.py", line 2033, in _compile_to_module
    self.codegen_with_cpp_wrapper() if self.cpp_wrapper else self.codegen()
                                                             ~~~~~~~~~~~~^^
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_inductor/graph.py", line 1964, in codegen
    self.scheduler = Scheduler(self.operations)
                     ~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_inductor/scheduler.py", line 1798, in __init__
    self._init(nodes)
    ~~~~~~~~~~^^^^^^^
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_inductor/scheduler.py", line 1816, in _init
    self.nodes = [self.create_scheduler_node(n) for n in nodes]
                  ~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_inductor/scheduler.py", line 1947, in create_scheduler_node
    return SchedulerNode(self, node)
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_inductor/scheduler.py", line 893, in __init__
    self._compute_attrs()
    ~~~~~~~~~~~~~~~~~~~^^
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_inductor/scheduler.py", line 907, in _compute_attrs
    group_fn = self.scheduler.get_backend(device).group_fn
               ~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_inductor/scheduler.py", line 3441, in get_backend
    self.backends[device] = self.create_backend(device)
                            ~~~~~~~~~~~~~~~~~~~^^^^^^^^
  File "/home/crow/repos/praxis/.venv/lib/python3.13/site-packages/torch/_inductor/scheduler.py", line 3428, in create_backend
    raise RuntimeError(
        f"Found {device_props.name} which is too old to be supported by the triton GPU compiler, which is used as the backend. Triton only supports devices of CUDA Capability >= 7.0, but your device is of CUDA capability {device_props.major}.{device_props.minor}"  # noqa: B950
    )
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
RuntimeError: Found NVIDIA GeForce GTX 1070 which is too old to be supported by the triton GPU compiler, which is used as the backend. Triton only supports devices of CUDA Capability >= 7.0, but your device is of CUDA capability 6.1

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
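As the traceback shows, the "torch_compile" mode hands the graph to Inductor, whose CUDA codegen goes through Triton, and Triton requires CUDA Compute Capability >= 7.0 (Volta or newer), while a GTX 1070 is capability 6.1. A possible workaround, sketched below, is to check the device capability up front and fall back to an eager (uncompiled) loss on older GPUs. The helper name `supports_triton_codegen` is hypothetical and not part of cut_cross_entropy or PyTorch:

```python
# Hypothetical helper: decide whether torch.compile's Inductor backend can
# target a given GPU. Inductor's CUDA codegen uses Triton, which requires
# CUDA Compute Capability >= 7.0; a GTX 1070 reports (6, 1) and fails.

def supports_triton_codegen(capability: tuple[int, int]) -> bool:
    """capability is a (major, minor) pair, as returned by
    torch.cuda.get_device_capability()."""
    # Tuple comparison: (6, 1) >= (7, 0) is False, (8, 6) >= (7, 0) is True.
    return capability >= (7, 0)

# In real code the tuple would come from PyTorch, roughly:
#   import torch
#   cap = torch.cuda.get_device_capability()  # e.g. (6, 1) on a GTX 1070
# When the check fails, skip the "torch_compile" implementation and compute
# the loss with plain eager ops (e.g. F.cross_entropy on the projected
# logits) instead of a compiled path.
```

This only sidesteps the crash; on pre-Volta hardware you lose the memory savings that make cut_cross_entropy attractive in the first place.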
