Currently, the tutorial calls `neuron_parallel_compile` inside the bash script. Because `neuron_parallel_compile` is the tool responsible for setting `$NEURON_EXTRACT_GRAPHS_ONLY`, that variable is still unset when the script evaluates the check below, so `MAX_STEPS` falls back to -1 and compilation runs for more than an hour.
if [ "$NEURON_EXTRACT_GRAPHS_ONLY" = "1" ]; then
MAX_STEPS=$((LOGGING_STEPS + 5))
else
MAX_STEPS=-1
fi
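To make the failure mode concrete, here is an illustrative sketch (not part of the tutorial) of why the `MAX_STEPS=-1` branch is the one taken during precompilation: the `if` runs in the outer shell, before `neuron_parallel_compile` has had a chance to export the variable for the command it wraps.

```bash
#!/bin/bash
# Illustrative only: the launching shell never has NEURON_EXTRACT_GRAPHS_ONLY set;
# neuron_parallel_compile exports it for the command it wraps, which runs later.
LOGGING_STEPS=1

if [ "$NEURON_EXTRACT_GRAPHS_ONLY" = "1" ]; then
    MAX_STEPS=$((LOGGING_STEPS + 5))    # intended precompilation branch, never reached here
else
    MAX_STEPS=-1                        # always taken, so graph extraction runs the full job
fi

echo "MAX_STEPS=$MAX_STEPS"             # prints MAX_STEPS=-1
```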
optimum-neuron/docs/source/training_tutorials/sft_lora_finetune_llm.mdx, lines 215 to 262 in 3748a06:
```bash
#!/bin/bash
set -ex

export NEURON_FUSE_SOFTMAX=1
export NEURON_RT_ASYNC_EXEC_MAX_INFLIGHT_REQUESTS=3
export MALLOC_ARENA_MAX=64
export NEURON_CC_FLAGS="--model-type=transformer --distribution-strategy=llm-training --enable-saturate-infinity --cache_dir=/home/ubuntu/cache_dir_neuron/"

PROCESSES_PER_NODE=8
NUM_EPOCHS=1
TP_DEGREE=2
PP_DEGREE=1
BS=1
GRADIENT_ACCUMULATION_STEPS=8
LOGGING_STEPS=1
MODEL_NAME="meta-llama/Meta-Llama-3-8B"
OUTPUT_DIR=output-$SLURM_JOB_ID

if [ "$NEURON_EXTRACT_GRAPHS_ONLY" = "1" ]; then
    MAX_STEPS=$((LOGGING_STEPS + 5))
else
    MAX_STEPS=-1
fi

XLA_USE_BF16=1 neuron_parallel_compile torchrun --nproc_per_node $PROCESSES_PER_NODE docs/source/training_tutorials/sft_lora_finetune_llm.py \
  --model_id $MODEL_NAME \
  --num_train_epochs $NUM_EPOCHS \
  --do_train \
  --learning_rate 5e-5 \
  --warmup_ratio 0.03 \
  --max_steps $MAX_STEPS \
  --per_device_train_batch_size $BS \
  --per_device_eval_batch_size $BS \
  --gradient_accumulation_steps $GRADIENT_ACCUMULATION_STEPS \
  --gradient_checkpointing true \
  --bf16 \
  --zero_1 false \
  --tensor_parallel_size $TP_DEGREE \
  --pipeline_parallel_size $PP_DEGREE \
  --logging_steps $LOGGING_STEPS \
  --save_total_limit 1 \
  --output_dir $OUTPUT_DIR \
  --lr_scheduler_type "constant" \
  --overwrite_output_dir
```
We need to refactor the tutorial so that `neuron_parallel_compile` is called on the training script itself rather than from inside the bash script.
An example can be found here:
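Independently of whatever that example shows, one possible shape of the refactor, sketched under the assumption that the launch script above is saved as `sft_lora_finetune_llm.sh` (the file name is hypothetical): remove `neuron_parallel_compile` from the `torchrun` line inside the script and wrap the whole script when precompiling, so the `if` check actually sees the variable.

```bash
# Inside sft_lora_finetune_llm.sh: launch torchrun directly, without the wrapper:
#   XLA_USE_BF16=1 torchrun --nproc_per_node $PROCESSES_PER_NODE docs/source/training_tutorials/sft_lora_finetune_llm.py ...

# Precompilation: neuron_parallel_compile sets NEURON_EXTRACT_GRAPHS_ONLY=1 for the
# whole script, so the check picks MAX_STEPS=$((LOGGING_STEPS + 5)).
neuron_parallel_compile bash sft_lora_finetune_llm.sh

# Actual training: the variable is unset, so the script falls through to MAX_STEPS=-1.
bash sft_lora_finetune_llm.sh
```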