[Bug]: llm merge_lora_params does not save the merged weights after merging #8575

Closed
1 task done
sanbuphy opened this issue Jun 10, 2024 · 6 comments
Labels
bug Something isn't working stale

Comments

sanbuphy commented Jun 10, 2024

Software environment

- paddlepaddle: 
- paddlepaddle-gpu: develop
- paddlenlp: latest 162d8d31c84f60b804a0abeee8f4f1e4b32308ef

Duplicate issues

  • I have searched the existing issues

Bug description

I used llm merge_lora_params.py to merge a model trained with QLoRA, but no merged model was produced: nothing appeared in the output folder.

Steps to reproduce & code

python merge_lora_params.py \
    --model_name_or_path FlagAlpha/Llama2-Chinese-7b-Chat \
    --lora_path /home/aistudio/data/checkpoints/llama_lora_ckpts/checkpoint-286 \
    --merge_lora_model_path /home/aistudio/data/llama_lora_merge \
    --device "gpu" \
    --low_gpu_mem True

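It seems to stay stuck at the loading stage, and after a while the process simply exits. (I suspect it ran out of memory, but that seems unlikely on an aistudio dev machine with a 32 GB V100.)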

However, the LoRA weights themselves are not the problem, because they can be loaded for inference in dynamic graph mode:

python predictor.py --model_name_or_path FlagAlpha/Llama2-Chinese-7b-Chat \
                    --data_file /home/aistudio/data/dummy/dev.json --dtype float16 \
                    --lora_path /home/aistudio/data/checkpoints/llama_lora_ckpts/checkpoint-286

sanbuphy added the bug label Jun 10, 2024
@DesmonDay
Contributor

DesmonDay commented Jun 12, 2024

Screenshot 2024-06-12 13 48 44

Could you isolate that code and run it on its own? If you only load the llama parameters with from_pretrained, does loading work normally?
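One way to run the isolated check suggested here is sketched below. It assumes that PaddleNLP is installed and that `AutoModelForCausalLM.from_pretrained` accepts a `dtype` argument. The helper name `load_base_model` is hypothetical and is not part of merge_lora_params.py.

```python
def load_base_model(model_name_or_path: str, dtype: str = "float16"):
    """Load only the base model, with no LoRA weights involved.

    Hypothetical helper for isolating the loading step; it is not taken
    from merge_lora_params.py.
    """
    # Imported lazily so the helper can be defined even when PaddleNLP
    # is not installed in the current environment.
    from paddlenlp.transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(model_name_or_path, dtype=dtype)


# Example call (downloads the weights on first use, so it is left commented out):
# model = load_base_model("FlagAlpha/Llama2-Chinese-7b-Chat")
# print(type(model).__name__)
```

If this call alone also ends with the process being killed, the problem is likely in loading the base model rather than in the merge step.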

@sanbuphy
Author

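> Screenshot 2024-06-12 13 48 44 Could you isolate that code and run it on its own? If you only load the llama parameters with from_pretrained, does loading work normally?

I'll give it a try, but I suspect the process was simply killed. Still, surely 32 GB should be enough? Very strange.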
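As a rough sanity check on the memory question, the size of the raw weights can be estimated from the parameter count and the dtype. The sketch below assumes roughly 7e9 parameters (inferred from the "7b" in the model name) and counts only the weights, ignoring activations, temporary copies made during merging, and framework overhead. Which dtype the merge script actually loads the weights in is not established in this thread.

```python
# Bytes used per parameter for a few common dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_gib(num_params: int, dtype: str) -> float:
    """Return the size of the raw weights in GiB for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Assumption: about 7 billion parameters for a "7b" model.
NUM_PARAMS = 7_000_000_000

for dtype in ("float16", "float32"):
    print(f"{dtype}: {weight_gib(NUM_PARAMS, dtype):.1f} GiB")
```

Under these assumptions the weights alone come to about 13 GiB in float16 and about 26 GiB in float32, so a single float32 copy would already be close to a 32 GB limit.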

@DesmonDay
Contributor

Right. If it was Killed, an environment problem can't be ruled out; someone else may also be using the machine.

@sanbuphy
Author

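> Right. If it was Killed, an environment problem can't be ruled out; someone else may also be using the machine.

I found where the problem is. When the model_name_or_path argument is passed, the result is not saved properly; only a JSON file is written.

After removing it, the merge works normally. QLoRA shouldn't have any effect on this, should it? It looks like the argument itself causes the problem.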
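To confirm which case a given run falls into, the output folder can be inspected after the script exits. The sketch below uses only the Python standard library. The helper name `summarize_output_dir` is hypothetical, and the weight-file suffixes `.pdparams` and `.safetensors` are assumptions about how the merged weights are named.

```python
from pathlib import Path

# Assumed suffixes for saved weight files; adjust if the script uses others.
WEIGHT_SUFFIXES = (".pdparams", ".safetensors")


def summarize_output_dir(path):
    """List the files in `path` and report whether any weight files exist."""
    files = sorted(p.name for p in Path(path).iterdir() if p.is_file())
    weights = [name for name in files if name.endswith(WEIGHT_SUFFIXES)]
    # True when the folder is non-empty but holds nothing except JSON files,
    # which matches the failure described above.
    only_json = bool(files) and all(name.endswith(".json") for name in files)
    return {"files": files, "weights": weights, "only_json": only_json}


# Example call, using the output path from the reproduction command:
# print(summarize_output_dir("/home/aistudio/data/llama_lora_merge"))
```

A result with `only_json` set to true and an empty `weights` list would correspond to the failing run, while a non-empty `weights` list would indicate that the merged parameters were written.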


This issue is stale because it has been open for 60 days with no activity.

github-actions bot added the stale label Aug 15, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions bot closed this as not planned Aug 29, 2024

2 participants