Implement efficient packing without cross-contamination attention #4224

Open
chuan298 wants to merge 4 commits into base: main
Conversation

@chuan298 commented Jun 11, 2024

What does this PR do?

Update 15/6/2024: Added packing support for eager and sdpa attention.


Fixes #2289

Implement efficient packing without cross-contamination attention
Taking inspiration from repositories such as axolotl and functionary, I implemented sequence packing more efficiently, so that the model learns from each sample without attending to other samples within the same pack. For now, this implementation only supports SFT with flash_attention_2.
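
For eager and sdpa attention (added in the 15/6/2024 update), cross-contamination within a pack can be prevented with a block-diagonal causal mask, so that each token only attends to earlier tokens of its own sample. Below is a minimal sketch of that idea, assuming the packed attention mask stores per-sample indices; the function name and shapes are illustrative, not the PR's actual code.

```python
# Illustrative sketch: build a block-diagonal causal mask from a packed
# attention mask whose entries are per-sample indices (1, 2, 3, ...) with 0
# for padding, so samples in the same pack cannot attend to each other.
import torch

def block_diagonal_causal_mask(sample_ids: torch.Tensor) -> torch.Tensor:
    # sample_ids: (batch, seq_len), e.g. [[1, 1, 1, 2, 2, 3, 0]]
    seq_len = sample_ids.size(1)
    same_sample = sample_ids.unsqueeze(1) == sample_ids.unsqueeze(2)  # (b, s, s)
    not_padding = (sample_ids > 0).unsqueeze(1)                       # (b, 1, s)
    causal = torch.tril(
        torch.ones(seq_len, seq_len, dtype=torch.bool, device=sample_ids.device)
    )
    # a position may be attended to only if it belongs to the same sample,
    # is not padding, and does not lie in the future
    return (same_sample & not_padding & causal).unsqueeze(1)          # (b, 1, s, s)

ids = torch.tensor([[1, 1, 1, 2, 2, 3, 0]])
print(block_diagonal_causal_mask(ids)[0, 0].int())
```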

Example training config:

### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
flash_attn: fa2

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
efficient_packing: true

### output
output_dir: saves/llama3-8b/lora/sft
logging_steps: 1
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
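
Assuming the config above is saved to a YAML file, training can then be launched with the standard LLaMA-Factory CLI, e.g. llamafactory-cli train path/to/sft_packing.yaml (the file path is illustrative).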

Before submitting

@hiyouga added the "pending (This problem is yet to be addressed)" label on Jun 12, 2024
@hiyouga mentioned this pull request on Jun 15, 2024
@AlongWY (Contributor) commented Jun 20, 2024

Should we consider implementing this with varlen_flash_atten?

@@ -33,6 +33,9 @@ def run_sft(
    dataset = get_dataset(model_args, data_args, training_args, stage="sft", **tokenizer_module)
    model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train)

    if data_args.efficient_packing:
        configure_packing(model.config, model_args)
@hiyouga (Owner) replied:

could we do configure_packing in llamafactory.model.patcher?

@chuan298 (Author) replied:

Sure, I just edited it.
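
For readers following along, a rough sketch of what moving the hook into llamafactory.model.patcher could look like; the patch_model signature and the body of configure_packing below are hypothetical placeholders, not the PR's actual code.

```python
# Hypothetical sketch: wire the packing hook through the model patcher
# rather than calling it directly inside run_sft().
def configure_packing(config, model_args) -> None:
    # Swap in packing-aware attention-mask handling for supported model classes
    # so that packed samples are treated as independent sequences.
    ...

def patch_model(model, tokenizer, model_args, data_args) -> None:
    if getattr(data_args, "efficient_packing", False):
        configure_packing(model.config, model_args)
```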

@@ -66,6 +66,21 @@

SUPPORTED_CLASS_FOR_S2ATTN = {"llama"}

SUPPORTED_CLASS_FOR_MULTIPACK = [
@hiyouga (Owner) replied:

is it "efficient_packing" rather than "multipack"?

@chuan298 (Author) replied:

Yes, I just fixed it.

@chuan298 (Author) replied:

> Should we consider implementing this with varlen_flash_atten?

Hi @AlongWY, the models in transformers already use flash_attn_varlen_func by default when an attention_mask is passed. I just made a slight change to the attention_mask when packing sequences, and returned the indices, cu_seqlens, and max_seqlen_in_batch corresponding to the modified attention_mask.
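
To make that concrete, here is a minimal, hypothetical sketch of the conversion described above: a packed attention mask holding per-sample indices is turned into the indices, cu_seqlens, and max_seqlen_in_batch that flash_attn_varlen_func consumes. It is modeled on the shape of transformers' _get_unpad_data helper, but the code below is illustrative rather than the PR's actual implementation.

```python
# Illustrative sketch: derive flash_attn_varlen_func inputs from a packed
# attention mask that stores per-sample indices (1, 2, 3, ...) and 0 for padding.
import torch
import torch.nn.functional as F

def get_unpad_data_for_packing(attention_mask: torch.Tensor):
    # attention_mask: (batch, seq_len), e.g. [[1, 1, 1, 2, 2, 3, 0]]
    bsz = attention_mask.size(0)
    max_num = int(attention_mask.max())

    # count the tokens of each packed sample, row by row
    counts = torch.zeros((bsz, max_num), dtype=torch.int32, device=attention_mask.device)
    for i in range(max_num):
        counts[:, i] = (attention_mask == i + 1).sum(dim=-1)
    seqlens_in_batch = counts.flatten()
    seqlens_in_batch = seqlens_in_batch[seqlens_in_batch > 0]  # drop empty slots

    # flat positions of all non-padding tokens
    indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
    max_seqlen_in_batch = int(seqlens_in_batch.max())
    # cumulative sequence lengths, prefixed with 0, as expected by flash-attn
    cu_seqlens = F.pad(torch.cumsum(seqlens_in_batch, dim=0, dtype=torch.int32), (1, 0))
    return indices, cu_seqlens, max_seqlen_in_batch

mask = torch.tensor([[1, 1, 1, 2, 2, 3, 0]])
print(get_unpad_data_for_packing(mask))
# cu_seqlens -> tensor([0, 3, 5, 6]), max_seqlen_in_batch -> 3
```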

Successfully merging this pull request may close the following issue: sft_packing实现的问题 (problem with the sft_packing implementation)

3 participants