Problem with the sft_packing implementation #2289

Open
dyh1996 opened this issue Jan 22, 2024 · 7 comments · Fixed by #4009 · May be fixed by #4224

Labels
pending This problem is yet to be addressed

Comments

@dyh1996

dyh1996 commented Jan 22, 2024

Reminder

  • I have read the README and searched the existing issues.

Reproduction

From what I can see, the current sft_packing implementation simply concatenates different single-turn SFT examples into one row and then computes the loss on the target portion of each:

```python
def preprocess_packed_supervised_dataset(
    examples: Dict[str, List[Any]],
    tokenizer: "PreTrainedTokenizer",
    template: "Template",
    data_args: "DataArguments",
) -> Dict[str, List[List[int]]]:
    # build inputs with format `<bos> X1 Y1 <eos> <bos> X2 Y2 <eos>`
    # and labels with format `<ignore> ... <ignore> Y1 <eos> <ignore> ... <ignore> Y2 <eos>`
    model_inputs = {"input_ids": [], "attention_mask": [], "labels": []}
```

Shouldn't the position_ids also be adjusted here, so that each single-turn SFT example is not affected by the other concatenated examples preceding it when its loss is computed?
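For illustration only, a minimal sketch of what restarting position_ids per packed example could look like (`pack_position_ids` is a hypothetical helper, not LLaMA-Factory code):

```python
from typing import List

def pack_position_ids(seq_lens: List[int]) -> List[int]:
    # Hypothetical helper: restart the position counter at 0 for every
    # packed sub-sequence instead of numbering the whole packed row.
    position_ids: List[int] = []
    for seq_len in seq_lens:
        position_ids.extend(range(seq_len))
    return position_ids

# Two examples of lengths 4 and 3 packed into one row:
# plain packing    -> [0, 1, 2, 3, 4, 5, 6]
# per-example ids  -> [0, 1, 2, 3, 0, 1, 2]
assert pack_position_ids([4, 3]) == [0, 1, 2, 3, 0, 1, 2]
```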

Expected behavior

No response

System Info

No response

Others

No response

@hiyouga hiyouga added the pending This problem is yet to be addressed label Jan 22, 2024
@muzhi1991

A quick question: with the packing approach (especially for SFT), besides the position ids mentioned above, shouldn't an appropriate attention mask also be set so that the different instances are isolated from each other?
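For illustration only, a minimal sketch of the block-diagonal mask such isolation would need (`block_diagonal_mask` is a hypothetical helper, not LLaMA-Factory code; causality would still be applied by the model's usual causal mask):

```python
import torch

def block_diagonal_mask(seq_lens: list) -> torch.Tensor:
    # Hypothetical helper: allow a token to attend only to tokens that
    # belong to the same packed sub-sequence.
    total_len = sum(seq_lens)
    mask = torch.zeros(total_len, total_len, dtype=torch.bool)
    start = 0
    for seq_len in seq_lens:
        mask[start:start + seq_len, start:start + seq_len] = True
        start += seq_len
    return mask

# Two packed examples of lengths 3 and 2 -> a 5x5 block-diagonal mask.
print(block_diagonal_mask([3, 2]).int())
```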

@DinhLuan14

@hiyouga
Has LLaMA-Factory implemented this 'https://github.com/MeetKai/functionary/tree/main/functionary/train/packing#assert-implementation' for packing yet? I did notice the 'preprocess_packed_supervised_dataset' part of the code in the repo.

@Ricardokevins

Any update on this issue? @hiyouga

@chiosChen

Llama 3 also modifies the attention mask, but doesn't mention position ids. Is it really necessary to modify the position ids? RoPE is itself a relative encoding.
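As a note on the RoPE point: the rotary score depends on positions only through their relative offset, since the rotation matrices satisfy

$$(R_m q)^\top (R_n k) = q^\top R_{n-m} k .$$

So restarting position ids at every packed example leaves all within-example offsets unchanged, while isolating the examples from one another still has to come from the attention mask.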

@lugimzzz

> A quick question: with the packing approach (especially for SFT), besides the position ids mentioned above, shouldn't an appropriate attention mask also be set so that the different instances are isolated from each other?

Same question here: why isn't the attention mask handled? If the examples are simply concatenated, what is the point of letting later examples see the earlier ones?

@letterk

letterk commented Jun 15, 2024

> @hiyouga Has LLaMA-Factory implemented this 'MeetKai/functionary@main/functionary/train/packing#assert-implementation' for Packing yet? I did notice the 'preprocess_packed_supervised_dataset' part of the code in the repo.

The function 'preprocess_packed_supervised_dataset' does not currently build an attention mask that separates the packed instances from one another.

@hiyouga, do you have any plans to add this feature in the future?
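For context, a minimal sketch of the cu_seqlens-style bookkeeping that varlen flash-attention kernels rely on to keep packed sequences separate (`get_cu_seqlens` is a hypothetical helper, not the code from #4224):

```python
import torch

def get_cu_seqlens(position_ids: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: recover cumulative sequence lengths from
    # per-example position ids that restart at 0 at every boundary.
    starts = torch.nonzero(position_ids == 0, as_tuple=True)[0]
    total = torch.tensor([position_ids.numel()], device=position_ids.device)
    return torch.cat([starts, total]).to(torch.int32)

# position_ids for two packed examples of lengths 4 and 3:
pos = torch.tensor([0, 1, 2, 3, 0, 1, 2])
print(get_cu_seqlens(pos))  # tensor([0, 4, 7], dtype=torch.int32)
```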

@hiyouga
Owner

hiyouga commented Jun 15, 2024

@letterk This will be fixed after #4224 is merged.
