Add run_all_benchmark.sh for benchmarking
- add scripts/run_all_benchmark.sh to run the full benchmark with a Hugging Face
model name or a local model path
- modify scripts/run_benchmark.sh to work with scripts/run_all_benchmark.sh
- add a new section to README.md
2003pro committed May 14, 2023
1 parent 905a778 commit f05f444
Showing 3 changed files with 63 additions and 6 deletions.
13 changes: 13 additions & 0 deletions README.md
@@ -299,6 +299,19 @@ Those scripts invoke the examples `examples/*.py` built based on our APIs. For
more API-related examples, one may refer to the methods in the unittest
`tests`.

### 3.3 Run Benchmark Evaluation

One can directly run the grouped evaluation to reproduce the evaluation results for the
[LLM comparison](https://docs.google.com/spreadsheets/d/1JYh4_pxNzmNA9I0YM2epgRA7VXBIeIGS64gPJBg5NHA/edit?usp=sharing).
For example, to evaluate GPT-2 XL, execute the following (`--model_name_or_path` is required;
it accepts either a Hugging Face model name or a local model path):
```sh
./scripts/run_all_benchmark.sh --model_name_or_path gpt2-xl
```

To check the evaluation results, see `benchmark.log` under `./output_dir/gpt2-xl_lmflow_chat_nll_eval`,
`./output_dir/gpt2-xl_all_nll_eval`, and `./output_dir/gpt2-xl_commonsense_qa_eval`.
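Assuming the default `output_dir` layout above, the three result logs can be listed (or tailed) with a short loop; the paths are illustrative and exist only after the benchmark has finished:

```sh
#!/bin/bash
# Illustrative only: these directories are created by the benchmark run.
model_name="gpt2-xl"
for suite in lmflow_chat_nll_eval all_nll_eval commonsense_qa_eval; do
  log="./output_dir/${model_name}_${suite}/benchmark.log"
  echo "${log}"           # print the path of each result log
  # tail -n 20 "${log}"   # uncomment to view the tail of each result
done
```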

## 4. Additional Notes
### 4.1 LLaMA Checkpoint

49 changes: 49 additions & 0 deletions scripts/run_all_benchmark.sh
@@ -0,0 +1,49 @@
#!/bin/bash

help_message="./$(basename "$0")"
help_message+=" --model_name_or_path MODEL_NAME_OR_PATH"

if [ $# -ge 1 ]; then
  extra_args="$@"
fi

model_name_or_path=""
while [[ $# -ge 1 ]]; do
  key="$1"
  case ${key} in
    -h|--help)
      printf "%s\n" "${help_message}" 1>&2
      exit 0
      ;;
    --model_name_or_path)
      model_name_or_path="$2"
      shift
      ;;
    *)
      # Ignores unknown options
      ;;
  esac
  shift
done

# Replace slashes in a Hugging Face repo id (e.g. "org/model") with "--" so the
# name can serve as a single output-directory component.
model_name=$(echo "${model_name_or_path}" | sed "s/\//--/g")
echo "${model_name}"

if [[ "${model_name}" = "" ]]; then
  echo "no model name specified" 1>&2
  exit 1
fi

log_dir=output_dir/${model_name}_lmflow_chat_nll_eval
mkdir -p ${log_dir}
echo "[Evaluating] Evaluate on LMFlow_chat"
./scripts/run_benchmark.sh ${extra_args} --dataset_name lmflow_chat_nll_eval 2> ${log_dir}/benchmark.err | tee ${log_dir}/benchmark.log

log_dir=output_dir/${model_name}_all_nll_eval
mkdir -p ${log_dir}
echo "[Evaluating] Evaluate on [commonsense, wiki, instruction_following (gpt4)] nll evaluation"
./scripts/run_benchmark.sh ${extra_args} --dataset_name all_nll_eval 2> ${log_dir}/benchmark.err | tee ${log_dir}/benchmark.log

log_dir=output_dir/${model_name}_commonsense_qa_eval
mkdir -p ${log_dir}
echo "[Evaluating] Evaluate on commonsense QA Accuracy evaluation"
./scripts/run_benchmark.sh ${extra_args} --dataset_name commonsense_qa_eval 2> ${log_dir}/benchmark.err | tee ${log_dir}/benchmark.log
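The `sed` substitution in the script above flattens a Hugging Face repo id into a single directory-name component. A standalone sketch of that mapping (the function name is hypothetical, introduced here only for illustration):

```sh
#!/bin/bash
# Sketch of the model-name sanitization used when building log directories:
# slashes in a repo id like "org/model" would otherwise create nested paths.
sanitize_model_name() {
  echo "$1" | sed "s/\//--/g"
}

sanitize_model_name "gpt2-xl"                  # → gpt2-xl
sanitize_model_name "EleutherAI/gpt-neo-1.3B"  # → EleutherAI--gpt-neo-1.3B
```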
7 changes: 1 addition & 6 deletions scripts/run_benchmark.sh
@@ -8,21 +8,16 @@ if [ "$1" == "-h" -o "$1" == "--help" ]; then
exit 1
fi

exp_id=benchmarking
extra_args="--dataset_name gpt4_en_eval --model_name_or_path gpt2"
if [ $# -ge 1 ]; then
extra_args="$@"
fi
log_dir=output_dir/${exp_id}_nll

mkdir -p ${log_dir}

CUDA_VISIBLE_DEVICES=0 \
deepspeed --master_port 11001 examples/benchmarking.py \
--use_ram_optimized_load 0 \
--deepspeed examples/ds_config.json \
--metric nll \
--prompt_structure "###Human: {input}###Assistant:" \
${extra_args} \
| tee ${log_dir}/benchmark.log \
2> ${log_dir}/benchmark.err
${extra_args}
