
Converting the official Qwen-xxB-Chat-Int4 to TRT, both using greedy search — is it normal for the TRT and torch results to differ? #57

Open
byjswr opened this issue Dec 29, 2023 · 9 comments

Comments

byjswr commented Dec 29, 2023

I converted the official Qwen-xxB-Chat-Int4 to TRT. Both sides use greedy search, yet the TRT and torch results differ — is that normal?
python build.py --hf_model_dir Qwen-7B-Chat-Int4/ \
    --quant_ckpt_path Qwen-7B-Chat-Int4/ \
    --dtype float16 \
    --remove_input_padding \
    --use_gpt_attention_plugin float16 \
    --enable_context_fmha \
    --use_gemm_plugin float16 \
    --use_weight_only \
    --weight_only_precision int4_gptq \
    --per_group \
    --world_size 1 \
    --tp_size 1 \
    --output_dir models/7B-int4/1_fp16-gpu
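With greedy search the two backends should emit the same tokens until accumulated numeric error (e.g. from int4 quantization) flips an argmax. A hypothetical helper (not part of this repo) to locate that first flip:

```python
# Minimal sketch: find the first position where two greedy-decoded token
# sequences diverge. Under greedy search, any divergence pinpoints where
# numeric error first changed the argmax.
def first_divergence(trt_tokens, torch_tokens):
    """Return the index of the first differing token, or -1 if identical."""
    for i, (a, b) in enumerate(zip(trt_tokens, torch_tokens)):
        if a != b:
            return i
    if len(trt_tokens) != len(torch_tokens):
        return min(len(trt_tokens), len(torch_tokens))
    return -1

print(first_divergence([1, 2, 3, 4], [1, 2, 5, 4]))  # -> 2
```

If the sequences match for many tokens and only drift late, small quantization error is the likely cause; an immediate mismatch points at differing inference parameters.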

Tlntin (Owner) commented Dec 29, 2023

Possibly normal. Can you give a concrete example? It may be caused by different inference parameters.

byjswr (Author) commented Dec 30, 2023

A question: I used 8x 40GB A100s to convert the 72B model to fp16, with rotary_base=1000000, max_input_len=12000, max_output_len=2048. Debugging shows that the output after gpt_attention has a large error relative to torch fp16 — what could cause this?
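One way to make "large error" concrete when comparing dumped intermediate tensors against the torch fp16 reference is to look at the maximum absolute and relative error. A minimal NumPy sketch (the helper name and tolerance are illustrative, not from the repo):

```python
import numpy as np

# Compare a tensor dumped from the TRT engine against the torch reference.
# Upcast to fp32 so the comparison itself adds no fp16 rounding.
def compare(trt_out, torch_out):
    trt_out = np.asarray(trt_out, dtype=np.float32)
    torch_out = np.asarray(torch_out, dtype=np.float32)
    abs_err = np.abs(trt_out - torch_out)
    rel_err = abs_err / (np.abs(torch_out) + 1e-6)  # avoid divide-by-zero
    return abs_err.max(), rel_err.max()

a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 2.01, 3.0])
print(compare(a, b))
```

For fp16 activations, a max relative error on the order of 1e-3 is typically benign rounding; errors near 1e-1, as in the screenshot, suggest a configuration mismatch rather than precision loss.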

byjswr (Author) commented Dec 30, 2023

I printed the values with register_network_output. The values before gpt_attention are identical, but after gpt_attention the error becomes fairly large.
[screenshot: 企业微信截图_17039285234089]

Tlntin (Owner) commented Dec 31, 2023

When debugging, check whether the seq_length passed into Attention is correct — it should be 32k.

byjswr (Author) commented Dec 31, 2023

I believe I am already using seq_length = 32768.
[screenshot: 企业微信截图_17039883174128]

byjswr (Author) commented Dec 31, 2023

To add: I tested the 7B model converted to fp16 and it is consistent — both the intermediate values and the final output match torch fp16.

byjswr (Author) commented Dec 31, 2023

Could this be caused by slight differences in the gpt_attention part between 7B and 72B?

Tlntin (Owner) commented Dec 31, 2023

> I believe I am already using seq_length = 32768.
> [screenshot: 企业微信截图_17039883174128]

When debugging, inspect the parameters passed into attention and check whether the configuration is correct.

@Hukongtao

> I printed the values with register_network_output. The values before gpt_attention are identical, but after gpt_attention the error becomes fairly large. [screenshot: 企业微信截图_17039285234089]

Is your problem solved? The Qwen 1.8B model also has this problem.
