Problem with the sft_packing implementation #2289

Open
dyh1996 opened this issue Jan 22, 2024 · 7 comments · Fixed by #4009 · May be fixed by #4224

Labels
pending This problem is yet to be addressed

Comments

@dyh1996

dyh1996 commented Jan 22, 2024

Reminder

  • I have read the README and searched the existing issues.

Reproduction

From what I can see, the current sft_packing implementation simply concatenates different single-turn SFT examples into one row and then computes the loss on the target portion of each:

```python
def preprocess_packed_supervised_dataset(
    examples: Dict[str, List[Any]],
    tokenizer: "PreTrainedTokenizer",
    template: "Template",
    data_args: "DataArguments",
) -> Dict[str, List[List[int]]]:
    # build inputs with format `<bos> X1 Y1 <eos> <bos> X2 Y2 <eos>`
    # and labels with format `<ignore> ... <ignore> Y1 <eos> <ignore> ... <ignore> Y2 <eos>`
    model_inputs = {"input_ids": [], "attention_mask": [], "labels": []}
```

Shouldn't the position_ids also be adjusted here, so that each single-turn SFT example is not affected by the other concatenated examples preceding it when its loss is computed?
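For illustration only, a minimal sketch of what restarting position_ids per packed example could look like (`pack_position_ids` is a hypothetical helper, not LLaMA-Factory code):

```python
from typing import List

def pack_position_ids(seq_lens: List[int]) -> List[int]:
    # Hypothetical helper: restart the position counter at 0 for every
    # packed sub-sequence instead of numbering the whole packed row.
    position_ids: List[int] = []
    for seq_len in seq_lens:
        position_ids.extend(range(seq_len))
    return position_ids

# Two examples of lengths 4 and 3 packed into one row:
# plain packing    -> [0, 1, 2, 3, 4, 5, 6]
# per-example ids  -> [0, 1, 2, 3, 0, 1, 2]
assert pack_position_ids([4, 3]) == [0, 1, 2, 3, 0, 1, 2]
```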

Expected behavior

No response

System Info

No response

Others

No response

@hiyouga hiyouga added the pending This problem is yet to be addressed label Jan 22, 2024
@muzhi1991

A quick question: with the packing approach (especially for SFT), besides the position ids mentioned above, shouldn't an appropriate attention mask also be set so that the different instances are isolated from each other?
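For illustration only, a minimal sketch of the block-diagonal mask such isolation would need (`block_diagonal_mask` is a hypothetical helper, not LLaMA-Factory code; causality would still be applied by the model's usual causal mask):

```python
import torch

def block_diagonal_mask(seq_lens: list) -> torch.Tensor:
    # Hypothetical helper: allow a token to attend only to tokens that
    # belong to the same packed sub-sequence.
    total_len = sum(seq_lens)
    mask = torch.zeros(total_len, total_len, dtype=torch.bool)
    start = 0
    for seq_len in seq_lens:
        mask[start:start + seq_len, start:start + seq_len] = True
        start += seq_len
    return mask

# Two packed examples of lengths 3 and 2 -> a 5x5 block-diagonal mask.
print(block_diagonal_mask([3, 2]).int())
```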

@DinhLuan14

@hiyouga
Has LLaMA-Factory implemented this 'https://github.com/MeetKai/functionary/tree/main/functionary/train/packing#assert-implementation' for packing yet? I did notice the 'preprocess_packed_supervised_dataset' part of the code in the repo.

@Ricardokevins

Any update on this issue? @hiyouga

@chiosChen

Llama 3 also modifies the attention mask, but doesn't mention position ids. Is it really necessary to modify the position ids? RoPE is itself a relative encoding.
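As a note on the RoPE point: the rotary score depends on positions only through their relative offset, since the rotation matrices satisfy

$$(R_m q)^\top (R_n k) = q^\top R_{n-m} k .$$

So restarting position ids at every packed example leaves all within-example offsets unchanged, while isolating the examples from one another still has to come from the attention mask.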

@lugimzzz

> A quick question: with the packing approach (especially for SFT), besides the position ids mentioned above, shouldn't an appropriate attention mask also be set so that the different instances are isolated from each other?

Same question here: why isn't the attention mask handled? If the examples are simply concatenated, what is the point of letting later examples see the earlier ones?

@letterk

letterk commented Jun 15, 2024

> @hiyouga Has LLaMA-Factory implemented this 'MeetKai/functionary@main/functionary/train/packing#assert-implementation' for Packing yet? I did notice the 'preprocess_packed_supervised_dataset' part of the code in the repo.

The function 'preprocess_packed_supervised_dataset' does not currently build an attention mask that separates the packed instances from one another.

@hiyouga, do you have any plans to add this feature in the future?
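For context, a minimal sketch of the cu_seqlens-style bookkeeping that varlen flash-attention kernels rely on to keep packed sequences separate (`get_cu_seqlens` is a hypothetical helper, not the code from #4224):

```python
import torch

def get_cu_seqlens(position_ids: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: recover cumulative sequence lengths from
    # per-example position ids that restart at 0 at every boundary.
    starts = torch.nonzero(position_ids == 0, as_tuple=True)[0]
    total = torch.tensor([position_ids.numel()], device=position_ids.device)
    return torch.cat([starts, total]).to(torch.int32)

# position_ids for two packed examples of lengths 4 and 3:
pos = torch.tensor([0, 1, 2, 3, 0, 1, 2])
print(get_cu_seqlens(pos))  # tensor([0, 4, 7], dtype=torch.int32)
```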

@hiyouga
Owner

hiyouga commented Jun 15, 2024

@letterk This will be fixed after #4224 is merged.
