-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/support qwenvl glm4-v (tested) #4377
base: main
Are you sure you want to change the base?
Conversation
终于还差一个image的padding处理就能做好训练支持了。 |
@hiyouga 改的比较多捏,有空帮忙看看这个实现思路行不行。谢谢。 |
Image.fromarray(image).convert("RGB").save(image_path) | ||
messages[-1]["content"] = template.format_image.apply(content=os.fspath(image_path))[0] + messages[-1]["content"] | ||
elif image is not None and model_args.visual_inputs_type == "vision_message_embed": | ||
messages[-1]["content"] = template.format_image.apply()[0] + messages[-1]["content"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
如果不是内嵌在文本的image url默认放在最后一个的开头(Qwenvl如果不是开头效果不好)
if model_args.visual_inputs_type == "vision_message_embed": | ||
dataset = dataset.rename_column("image_inputs","images") | ||
print(dataset["images"]) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dataset.map不能重用删除的column_name
transforms.Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)), | ||
] | ||
) | ||
return transform(images[0]) if len(images) != 0 else transform(Image.new("RGB", (1120, 1120), (255, 255, 255))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
数据集加载
if model_args.visual_inputs and finetuning_args.freeze_vision_tower: | ||
target_modules = "^(?!.*vision_tower).*(?:{}).*".format("|".join(target_modules)) | ||
if model_args.visual_inputs and finetuning_args.freeze_vision: | ||
target_modules = f"^(?!.*{VISION_FREEZE_MAP[model_args.visual_inputs_type]})."+"*(?:{}).*".format("|".join(target_modules)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
其实还有点小问题,可能会把GLM4的视觉模块附加了
成功跑了训练。 |
What does this PR do?
Fixes #4375
Before submitting