Error when loading an AWQ-quantized model with LLaMA-Factory on an NVIDIA GeForce RTX 5090 machine #9718

Description

@wangy2032

Reminder

  • I have read the above rules and searched the existing issues.

System Info

- torch 2.9.0+cu130
- transformers 4.57.3
- autoawq 0.2.9
- llmcompressor 0.9.0
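
These versions are well past what autoawq 0.2.9 supports: its own deprecation notice (quoted in the log below) says the last tested configuration was Torch 2.6.0 and Transformers 4.51.3. The failure in the traceback reduces to a single import that no longer resolves on transformers 4.57.3. A minimal check, assuming only the packages listed above (illustrative snippet, not from the report):

# Reproduces the incompatibility without LLaMA-Factory: autoawq 0.2.9
# imports PytorchGELUTanh at module load, but transformers 4.57.3 no
# longer exports it, so this single line raises the same ImportError
# seen in the traceback below.
from transformers.activations import PytorchGELUTanh  # noqa: F401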

Reproduction

[INFO|2026-01-05 09:34:30] llamafactory.model.model_utils.kv_cache:143 >> KV cache is disabled during training.
[WARNING|logging.py:328] 2026-01-05 09:34:30,985 >> `torch_dtype` is deprecated! Use `dtype` instead!
[INFO|auto.py:242] 2026-01-05 09:34:30,986 >>
[WARNING|quantizer_awq.py:102] 2026-01-05 09:34:30,986 >> `torch.bfloat16` is not supported for AWQ CUDA/XPU kernels yet. Casting to `torch.float16`.
[INFO|modeling_utils.py:1169] 2026-01-05 09:34:30,986 >> loading weights file /app/06-model/qwen3-8b-int4-awq/model.safetensors.index.json
[INFO|modeling_utils.py:2341] 2026-01-05 09:34:30,986 >> Instantiating Qwen3ForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:986] 2026-01-05 09:34:30,988 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "use_cache": false
}

/opt/conda/lib/python3.11/site-packages/awq/__init__.py:21: DeprecationWarning:
I have left this message as the final dev message to help you transition.

Important Notice:
- AutoAWQ is officially deprecated and will no longer be maintained.
- The last tested configuration used Torch 2.6.0 and Transformers 4.51.3.
- If future versions of Transformers break AutoAWQ compatibility, please report the issue to the Transformers project.

Alternative:
- AutoAWQ has been adopted by the vLLM Project: https://github.com/vllm-project/llm-compressor

For further inquiries, feel free to reach out:
- X: https://x.com/casper_hansen_
- LinkedIn: https://www.linkedin.com/in/casper-hansen-804005170/

  warnings.warn(_FINAL_DEV_MESSAGE, category=DeprecationWarning, stacklevel=1)
[rank0]: Traceback (most recent call last):
[rank0]:   File "/app/src/llamafactory/launcher.py", line 185, in <module>
[rank0]:     run_exp()
[rank0]:   File "/app/src/llamafactory/train/tuner.py", line 132, in run_exp
[rank0]:     _training_function(config={"args": args, "callbacks": callbacks})
[rank0]:   File "/app/src/llamafactory/train/tuner.py", line 93, in _training_function
[rank0]:     run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
[rank0]:   File "/app/src/llamafactory/train/sft/workflow.py", line 53, in run_sft
[rank0]:     model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train)
[rank0]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/app/src/llamafactory/model/loader.py", line 179, in load_model
[rank0]:     model = load_class.from_pretrained(**init_kwargs)
[rank0]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 604, in from_pretrained
[rank0]:     return model_class.from_pretrained(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/transformers/modeling_utils.py", line 277, in _wrapper
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4998, in from_pretrained
[rank0]:     hf_quantizer.preprocess_model(
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/transformers/quantizers/base.py", line 225, in preprocess_model
[rank0]:     return self._process_model_before_weight_loading(model, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/transformers/quantizers/quantizer_awq.py", line 119, in _process_model_before_weight_loading
[rank0]:     model, has_been_replaced = replace_with_awq_linear(
[rank0]:                                ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/transformers/integrations/awq.py", line 134, in replace_with_awq_linear
[rank0]:     from awq.modules.linear.gemm import WQLinear_GEMM
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/awq/__init__.py", line 24, in <module>
[rank0]:     from awq.models.auto import AutoAWQForCausalLM
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/awq/models/__init__.py", line 1, in <module>
[rank0]:     from .mpt import MptAWQForCausalLM
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/awq/models/mpt.py", line 1, in <module>
[rank0]:     from .base import BaseAWQForCausalLM
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/awq/models/base.py", line 49, in <module>
[rank0]:     from awq.quantize.quantizer import AwqQuantizer
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/awq/quantize/quantizer.py", line 11, in <module>
[rank0]:     from awq.quantize.scale import apply_scale, apply_clip
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/awq/quantize/scale.py", line 12, in <module>
[rank0]:     from transformers.activations import NewGELUActivation, PytorchGELUTanh, GELUActivation
[rank0]: ImportError: cannot import name 'PytorchGELUTanh' from 'transformers.activations' (/opt/conda/lib/python3.11/site-packages/transformers/activations.py)
[rank1]: (rank1 repeats the same warnings, DeprecationWarning, and traceback as rank0, ending in the identical ImportError)
[rank0]:[W105 09:34:31.743115696 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank0]:[W105 09:34:32.336205576 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
W0105 09:34:32.395000 366471 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 366491 closing signal SIGTERM
E0105 09:34:32.559000 366471 site-packages/torch/distributed/elastic/multiprocessing/api.py:882] failed (exitcode: 1) local_rank: 0 (pid: 366490) of binary: /opt/conda/bin/python
Traceback (most recent call last):
  File "/opt/conda/bin/torchrun", line 7, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 357, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/distributed/run.py", line 936, in main
    run(args)
  File "/opt/conda/lib/python3.11/site-packages/torch/distributed/run.py", line 927, in run
    elastic_launch(
  File "/opt/conda/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 156, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 293, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/app/src/llamafactory/launcher.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2026-01-05_09:34:32
  host      : 68d0f41c5a5f
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 366490)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
[W105 09:34:32.911273327 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
Traceback (most recent call last):
  File "/opt/conda/bin/llamafactory-cli", line 7, in <module>
    sys.exit(main())
             ^^^^^^
  File "/app/src/llamafactory/cli.py", line 24, in main
    launcher.launch()
  File "/app/src/llamafactory/launcher.py", line 115, in launch
    process = subprocess.run(
              ^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['torchrun', '--nnodes', '1', '--node_rank', '0', '--nproc_per_node', '2', '--master_addr', '127.0.0.1', '--master_port', '60195', '/app/src/llamafactory/launcher.py', '/app/self_made_train_yaml/gpu_dir/qwen3_8b_int4_awq_sft_train_v1_20260105.yaml']' returned non-zero exit status 1.
[W105 09:34:33.485959237 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
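
Until a migration to llm-compressor (the successor the AutoAWQ notice above points to), two workarounds seem plausible: pin transformers to 4.51.3, the last version AutoAWQ was tested against, or restore the removed activation class before anything imports awq. A minimal sketch of the second option; the shim is an assumption based on the old class being a thin wrapper over tanh-approximated GELU, not an official fix:

# Hypothetical compatibility shim: re-register PytorchGELUTanh on
# transformers.activations before awq is imported. The removed class
# computed GELU with the tanh approximation, which nn.GELU reproduces
# via approximate="tanh" (an assumed behavioral match, not verified
# against the 4.51.3 sources).
import torch.nn as nn
import transformers.activations as activations

if not hasattr(activations, "PytorchGELUTanh"):
    class PytorchGELUTanh(nn.GELU):
        def __init__(self):
            super().__init__(approximate="tanh")

    activations.PytorchGELUTanh = PytorchGELUTanh

# After this runs, "from transformers.activations import PytorchGELUTanh"
# succeeds and the awq import chain in the traceback above can continue.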

Others

No response

    Labels

bug (Something isn't working) · pending (This problem is yet to be addressed)
