
[Question]: Abnormal memory usage during tensor-parallel inference? #8656

Closed
zhaogf01 opened this issue Jun 25, 2024 · 6 comments
Assignees
Labels
question Further information is requested

Comments

@zhaogf01
Contributor

Please describe your question

When running inference with the qwen-1_8b model:
With 2-way tensor parallelism enabled, memory usage while loading the weights is about 10 GB.
With 4-way tensor parallelism enabled, memory usage while loading the weights is about 17 GB.
The difference between the two is exactly twice the size of the weight file.
So my question is: with 2-way tensor parallelism, does PaddleNLP first make 2 copies of the weights in host memory and only then split the tensors? If so, would 16-way tensor parallelism require 16 copies? For a hundred-billion-parameter model, the memory footprint would be far larger. Is this reasonable?
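
A quick back-of-the-envelope check of those numbers (editor's sketch; the parameter count and the assumption of a 16-bit on-disk checkpoint are not stated in the thread):

params = 1.8e9             # qwen-1_8b has roughly 1.8 billion parameters (assumption)
bytes_per_param = 2        # assuming the pdparams checkpoint is stored in a 16-bit dtype
weight_file_gb = params * bytes_per_param / 1e9
print(weight_file_gb)      # ~3.6 GB for one full copy of the weights
print(2 * weight_file_gb)  # ~7.2 GB, close to the observed 17 GB - 10 GB gap

This is consistent with each extra rank holding its own full copy of the checkpoint in host memory while loading.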

@zhaogf01 zhaogf01 added the question Further information is requested label Jun 25, 2024
@DesmonDay
Contributor

No, that is not reasonable behavior; we have optimization code for this that has not been merged yet. Could you tell us how you are running it? Please share a script.

@zhaogf01
Contributor Author

No, that is not reasonable behavior; we have optimization code for this that has not been merged yet. Could you tell us how you are running it? Please share a script.

The scripts are as follows:
1. test.sh

export CUDA_VISIBLE_DEVICES=3,4
PYTHONPATH=../../:$PYTHONPATH \
python -m paddle.distributed.launch \
    --devices "3,4" \
    test_qwen.py

2. test_qwen.py

from paddle.distributed import fleet
from paddlenlp.transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("qwen/qwen-1_8b")

# Initialize the hybrid-parallel environment with 2-way model (tensor) parallelism.
strategy = fleet.DistributedStrategy()
strategy.hybrid_configs = {
    "dp_degree": 1,
    "mp_degree": 2,
    "pp_degree": 1,
    "sharding_degree": 1,
}
fleet.init(is_collective=True, strategy=strategy)
hcg = fleet.get_hybrid_communicate_group()
tensor_parallel_rank = hcg.get_model_parallel_rank()

# Load the model with tensor-parallel slicing for this rank.
model = AutoModelForCausalLM.from_pretrained(
    "qwen/qwen-1_8b",
    tensor_parallel_degree=2,
    tensor_parallel_rank=tensor_parallel_rank,
    dtype="float32",
)

input_features = tokenizer("青岛推荐去哪玩", return_tensors="pd")
outputs = model.generate(**input_features, max_length=128)
print(tokenizer.batch_decode(outputs[0]))

@DesmonDay
Contributor

Are the qwen model weights in safetensors format or in pdparams format? If they are in safetensors format, this problem should not occur.

@zhaogf01
Contributor Author

Are the qwen model weights in safetensors format or in pdparams format? If they are in safetensors format, this problem should not occur.

They are in pdparams format; the weights were downloaded automatically.
Is there a workaround for now? Or is there a conversion script from pdparams to safetensors?

@DesmonDay
Contributor

After loading the model with from_pretrained, you can call save_pretrained with safe_serialization=True to save the weights in safetensors format.
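
A minimal sketch of that suggestion (editor's note: the output directory name is hypothetical; from_pretrained and save_pretrained with safe_serialization=True are as described above):

from paddlenlp.transformers import AutoModelForCausalLM, AutoTokenizer

# Load the automatically downloaded pdparams checkpoint once, without tensor parallelism.
model = AutoModelForCausalLM.from_pretrained("qwen/qwen-1_8b", dtype="float32")
tokenizer = AutoTokenizer.from_pretrained("qwen/qwen-1_8b")

# Re-save in safetensors format; "./qwen-1_8b-safetensors" is a hypothetical local path.
model.save_pretrained("./qwen-1_8b-safetensors", safe_serialization=True)
tokenizer.save_pretrained("./qwen-1_8b-safetensors")

Subsequent runs can then point from_pretrained at that local directory so the safetensors weights are used.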

@zhaogf01
Contributor Author

zhaogf01 commented Jul 9, 2024

I want to run TP inference. When calling from_pretrained, do I need to set the corresponding TP configuration first, and then call save_pretrained to save?
