Falcon-180B "forward() got an unexpected keyword argument 'position_ids'" #38

Open
aittalam opened this issue Sep 22, 2023 · 0 comments

aittalam commented Sep 22, 2023

Hi Phil, and many thanks for the new Falcon-180B tutorial!

I am trying to replicate your experiment, but when I run it I get the error forward() got an unexpected keyword argument 'position_ids'. I noticed that, while llama_patch takes position_ids as a parameter, falcon_patch redefines forward() without it. Since the patch is only applied when running with flash attention, I tried disabling it, but I am not sure the code is supposed to run without it (flash attention is in the post's title 😅, and I also get a Signal 7 (SIGBUS), though I am not sure that is related). What are your thoughts on this?
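
In case it helps, here is a minimal sketch of what I think is happening (illustrative function names, not the repo's actual code): the Trainer seems to pass position_ids as a keyword argument into the attention forward, so a patched forward() that does not declare it raises exactly this TypeError, while declaring (and ignoring) the argument, as llama_patch does, would not:

```python
# Illustrative only: toy functions standing in for the patched attention forward().

def forward_without_position_ids(hidden_states, attention_mask=None):
    # Mimics the falcon_patch signature: no position_ids parameter.
    return hidden_states

def forward_with_position_ids(hidden_states, attention_mask=None,
                              position_ids=None, **kwargs):
    # Mimics the llama_patch signature: accepts (and can ignore) position_ids.
    return hidden_states

try:
    forward_without_position_ids("x", attention_mask=None, position_ids=[0, 1])
except TypeError as err:
    print(err)  # ... got an unexpected keyword argument 'position_ids'

forward_with_position_ids("x", attention_mask=None, position_ids=[0, 1])  # runs fine
```
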

Ah, I forgot to say: I am running it from the command line as shown in the Jupyter notebook:

torchrun --nproc_per_node 8 run_ds_lora.py \
  --model_id tiiuae/falcon-180B \
  --dataset_path dolly-processed \
  --output_dir falcon-180b-lora-fa \
  --num_train_epochs 3 \
  --per_device_train_batch_size 1 \
  --learning_rate 4e-3 \
  --gradient_checkpointing True \
  --gradient_accumulation_steps 8 \
  --bf16 True \
  --tf32 True \
  --use_flash_attn True \
  --lr_scheduler_type "constant_with_warmup" \
  --logging_steps 25 \
  --save_steps 100 \
  --save_total_limit 3 \
  --deepspeed configs/ds_falcon_180b_z3.json