Error when loading an AWQ-quantized model with LLaMA-Factory on an NVIDIA GeForce RTX 5090 machine #9718

Description

@wangy2032

Reminder

  • I have read the above rules and searched the existing issues.

System Info

- torch 2.9.0+cu130
- transformers 4.57.3
- autoawq 0.2.9
- llmcompressor 0.9.0
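
These versions are well past what autoawq 0.2.9 supports: its own deprecation notice (quoted in the log below) says the last tested configuration was Torch 2.6.0 and Transformers 4.51.3. The failure in the traceback reduces to a single import that no longer resolves on transformers 4.57.3. A minimal check, assuming only the packages listed above (illustrative snippet, not from the report):

# Reproduces the incompatibility without LLaMA-Factory: autoawq 0.2.9
# imports PytorchGELUTanh at module load, but transformers 4.57.3 no
# longer exports it, so this single line raises the same ImportError
# seen in the traceback below.
from transformers.activations import PytorchGELUTanh  # noqa: F401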

Reproduction

[INFO|2026-01-05 09:34:30] llamafactory.model.model_utils.kv_cache:143 >> KV cache is disabled during training.
[WARNING|logging.py:328] 2026-01-05 09:34:30,985 >> `torch_dtype` is deprecated! Use `dtype` instead!
[INFO|auto.py:242] 2026-01-05 09:34:30,986 >>
[WARNING|quantizer_awq.py:102] 2026-01-05 09:34:30,986 >> `torch.bfloat16` is not supported for AWQ CUDA/XPU kernels yet. Casting to `torch.float16`.
[INFO|modeling_utils.py:1169] 2026-01-05 09:34:30,986 >> loading weights file /app/06-model/qwen3-8b-int4-awq/model.safetensors.index.json
[INFO|modeling_utils.py:2341] 2026-01-05 09:34:30,986 >> Instantiating Qwen3ForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:986] 2026-01-05 09:34:30,988 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "use_cache": false
}

/opt/conda/lib/python3.11/site-packages/awq/__init__.py:21: DeprecationWarning:
I have left this message as the final dev message to help you transition.

Important Notice:
- AutoAWQ is officially deprecated and will no longer be maintained.
- The last tested configuration used Torch 2.6.0 and Transformers 4.51.3.
- If future versions of Transformers break AutoAWQ compatibility, please report the issue to the Transformers project.

Alternative:
- AutoAWQ has been adopted by the vLLM Project: https://github.com/vllm-project/llm-compressor

For further inquiries, feel free to reach out:
- X: https://x.com/casper_hansen_
- LinkedIn: https://www.linkedin.com/in/casper-hansen-804005170/

  warnings.warn(_FINAL_DEV_MESSAGE, category=DeprecationWarning, stacklevel=1)
[rank0]: Traceback (most recent call last):
[rank0]:   File "/app/src/llamafactory/launcher.py", line 185, in <module>
[rank0]:     run_exp()
[rank0]:   File "/app/src/llamafactory/train/tuner.py", line 132, in run_exp
[rank0]:     _training_function(config={"args": args, "callbacks": callbacks})
[rank0]:   File "/app/src/llamafactory/train/tuner.py", line 93, in _training_function
[rank0]:     run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
[rank0]:   File "/app/src/llamafactory/train/sft/workflow.py", line 53, in run_sft
[rank0]:     model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train)
[rank0]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/app/src/llamafactory/model/loader.py", line 179, in load_model
[rank0]:     model = load_class.from_pretrained(**init_kwargs)
[rank0]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 604, in from_pretrained
[rank0]:     return model_class.from_pretrained(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/transformers/modeling_utils.py", line 277, in _wrapper
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4998, in from_pretrained
[rank0]:     hf_quantizer.preprocess_model(
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/transformers/quantizers/base.py", line 225, in preprocess_model
[rank0]:     return self._process_model_before_weight_loading(model, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/transformers/quantizers/quantizer_awq.py", line 119, in _process_model_before_weight_loading
[rank0]:     model, has_been_replaced = replace_with_awq_linear(
[rank0]:                                ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/transformers/integrations/awq.py", line 134, in replace_with_awq_linear
[rank0]:     from awq.modules.linear.gemm import WQLinear_GEMM
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/awq/__init__.py", line 24, in <module>
[rank0]:     from awq.models.auto import AutoAWQForCausalLM
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/awq/models/__init__.py", line 1, in <module>
[rank0]:     from .mpt import MptAWQForCausalLM
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/awq/models/mpt.py", line 1, in <module>
[rank0]:     from .base import BaseAWQForCausalLM
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/awq/models/base.py", line 49, in <module>
[rank0]:     from awq.quantize.quantizer import AwqQuantizer
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/awq/quantize/quantizer.py", line 11, in <module>
[rank0]:     from awq.quantize.scale import apply_scale, apply_clip
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/awq/quantize/scale.py", line 12, in <module>
[rank0]:     from transformers.activations import NewGELUActivation, PytorchGELUTanh, GELUActivation
[rank0]: ImportError: cannot import name 'PytorchGELUTanh' from 'transformers.activations' (/opt/conda/lib/python3.11/site-packages/transformers/activations.py)
[rank1]: (rank1 repeats the same warnings, DeprecationWarning, and traceback as rank0, ending in the identical ImportError)
[rank0]:[W105 09:34:31.743115696 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank0]:[W105 09:34:32.336205576 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
W0105 09:34:32.395000 366471 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 366491 closing signal SIGTERM
E0105 09:34:32.559000 366471 site-packages/torch/distributed/elastic/multiprocessing/api.py:882] failed (exitcode: 1) local_rank: 0 (pid: 366490) of binary: /opt/conda/bin/python
Traceback (most recent call last):
  File "/opt/conda/bin/torchrun", line 7, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 357, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/distributed/run.py", line 936, in main
    run(args)
  File "/opt/conda/lib/python3.11/site-packages/torch/distributed/run.py", line 927, in run
    elastic_launch(
  File "/opt/conda/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 156, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 293, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/app/src/llamafactory/launcher.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2026-01-05_09:34:32
  host      : 68d0f41c5a5f
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 366490)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
[W105 09:34:32.911273327 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
Traceback (most recent call last):
  File "/opt/conda/bin/llamafactory-cli", line 7, in <module>
    sys.exit(main())
             ^^^^^^
  File "/app/src/llamafactory/cli.py", line 24, in main
    launcher.launch()
  File "/app/src/llamafactory/launcher.py", line 115, in launch
    process = subprocess.run(
              ^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['torchrun', '--nnodes', '1', '--node_rank', '0', '--nproc_per_node', '2', '--master_addr', '127.0.0.1', '--master_port', '60195', '/app/src/llamafactory/launcher.py', '/app/self_made_train_yaml/gpu_dir/qwen3_8b_int4_awq_sft_train_v1_20260105.yaml']' returned non-zero exit status 1.
[W105 09:34:33.485959237 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
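
Until a migration to llm-compressor (the successor the AutoAWQ notice above points to), two workarounds seem plausible: pin transformers to 4.51.3, the last version AutoAWQ was tested against, or restore the removed activation class before anything imports awq. A minimal sketch of the second option; the shim is an assumption based on the old class being a thin wrapper over tanh-approximated GELU, not an official fix:

# Hypothetical compatibility shim: re-register PytorchGELUTanh on
# transformers.activations before awq is imported. The removed class
# computed GELU with the tanh approximation, which nn.GELU reproduces
# via approximate="tanh" (an assumed behavioral match, not verified
# against the 4.51.3 sources).
import torch.nn as nn
import transformers.activations as activations

if not hasattr(activations, "PytorchGELUTanh"):
    class PytorchGELUTanh(nn.GELU):
        def __init__(self):
            super().__init__(approximate="tanh")

    activations.PytorchGELUTanh = PytorchGELUTanh

# After this runs, "from transformers.activations import PytorchGELUTanh"
# succeeds and the awq import chain in the traceback above can continue.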

Others

No response

    Labels

bug (Something isn't working) · pending (This problem is yet to be addressed)
