Does this work for Llama2 - Fine-tune Falcon 180B with DeepSpeed ZeRO, LoRA & Flash Attention? #37

Open
ibicdev opened this issue Sep 21, 2023 · 11 comments

Comments

@ibicdev commented Sep 21, 2023

Thanks Phil for the great post "Fine-tune Falcon 180B with DeepSpeed ZeRO, LoRA & Flash Attention". When I tried to change Falcon to Llama 2 (I tried all three model sizes), I always got "CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)". Are more changes than just the model name needed to make it work? Or will you have a follow-up post about fine-tuning Llama 2 with DeepSpeed + LoRA?

@philschmid (Owner)

Seems to be a hardware or environment issue unrelated to the code. I used CUDA 11.8.

@ibicdev (Author) commented Sep 22, 2023

I am also using CUDA 11.8 and PyTorch 2.0.1 built for CUDA 11.8. I also tried the PyTorch nightly and got the same error. Passing --use_flash_attn False didn't make a difference either. The error is RuntimeError: CUDA error: device-side assert triggered, followed by about a hundred lines of

../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [313,0,0], thread: [64,0,0] Assertion srcIndex < srcSelectDimSize failed.

This error looks similar to lm-sys/FastChat#199; I tried their suggestions and none worked. One explanation on that thread is a vocab-size mismatch causing an out-of-bounds embedding lookup, though the vocab size appears to already be fixed in Llama 2.
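One way to rule that out is to compare the tokenizer's vocab against the model's embedding table size (a minimal sketch, assuming access to the meta-llama checkpoint):

```python
from transformers import AutoConfig, AutoTokenizer

model_id = "meta-llama/Llama-2-70b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
config = AutoConfig.from_pretrained(model_id)

# A token id >= the embedding table size would trigger the
# indexSelectLargeIndex device-side assert during the embedding lookup,
# so these two numbers should match (32000 for Llama 2).
print("tokenizer vocab size:", len(tokenizer))
print("model vocab size:    ", config.vocab_size)
```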

@philschmid (Owner)

Does the example work without code changes?

@ibicdev (Author) commented Sep 23, 2023

Yes, it worked well without any code changes.

@philschmid (Owner)

What change did you make?

@ibicdev (Author) commented Sep 25, 2023

The only change I made was --model_id, from tiiuae/falcon-180B to meta-llama/Llama-2-70b-hf. The full command is:

```bash
torchrun --nproc_per_node 8 run_ds_lora.py \
  --model_id meta-llama/Llama-2-70b-hf \
  --dataset_path dolly-processed \
  --output_dir falcon-180b-lora-fa \
  --num_train_epochs 3 \
  --per_device_train_batch_size 1 \
  --learning_rate 4e-3 \
  --gradient_checkpointing True \
  --gradient_accumulation_steps 8 \
  --bf16 True \
  --tf32 True \
  --use_flash_attn True \
  --lr_scheduler_type "constant_with_warmup" \
  --logging_steps 25 \
  --save_steps 100 \
  --save_total_limit 3 \
  --deepspeed configs/ds_falcon_180b_z3.json
```

@philschmid (Owner)

Did you make changes to the flash attention patch? The example only works with Falcon, since it uses a custom patch to enable flash attention.
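Conceptually, the patch replaces Falcon's attention forward, so it would need a guard before being applied to another architecture (a rough sketch, not the actual repo code; patch_falcon_flash_attention is a placeholder name):

```python
from transformers import AutoConfig

def apply_flash_attn_patch_if_supported(model_id: str) -> None:
    # Sketch only: the patch targets Falcon's attention module, so it
    # must not be applied to other architectures such as Llama.
    config = AutoConfig.from_pretrained(model_id)
    if config.model_type != "falcon":
        raise ValueError(
            f"The flash attention patch only supports Falcon, got '{config.model_type}'."
        )
    patch_falcon_flash_attention()  # placeholder for the repo's actual patch function
```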

@ibicdev (Author) commented Sep 25, 2023

Ah, I didn't. I saw your code at https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/utils/peft_utils.py#L38-L41 and thought it was already taken care of.

Also, even when I used --use_flash_attn False I still got the same error.

@ibicdev (Author) commented Sep 26, 2023

Excited to see flash-attn 2 natively supported in transformers! Do you plan to update this post to work with the new feature?
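For reference, with the native support, loading the model should look roughly like this (a sketch against the transformers main branch at the time of writing; the exact flag may change by the official release):

```python
import torch
from transformers import AutoModelForCausalLM

# Sketch only: with native flash-attn 2 support, no monkey patch is needed.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",
    torch_dtype=torch.bfloat16,
    use_flash_attention_2=True,
)
```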

@philschmid (Owner)

Yes! 👍🏻 I plan to update all my posts and remove those patches once there is an official release.

@ibicdev (Author) commented Sep 26, 2023

Great! Looking forward to the updates.
