1 file changed (+6, -10 lines)
@@ -148,16 +148,12 @@ python show_result.py \
## Pairwise win-rate compared with GPT-3.5-davinci-003
| Model | Win | Loss | Tie | Win Rate | Loss Rate | Win Rate Adjusted |
- | :---------------------------------------------------------| ----:| -----:| ----:| ---------:| ----------:| ------------------:|
- | llm-jp--llm-jp-13b-instruct-lora-jaster-dolly-oasst-v1.0 | 65 | 146 | 26 | 0.274262 | 0.616034 | 0.329114 |
- | rinna--japanese-gpt-neox-3.6b-instruction-ppo | 8 | 62 | 10 | 0.100000 | 0.775000 | 0.162500 |
- | rinna--japanese-gpt-neox-3.6b-instruction-sft-v2 | 7 | 65 | 8 | 0.087500 | 0.812500 | 0.137500 |
- | cyberagent--calm2-7b-chat | 6 | 68 | 7 | 0.074074 | 0.839506 | 0.117284 |
- | llm-jp--llm-jp-13b-instruct-full-jaster-dolly-oasst-v1.0 | 5 | 66 | 8 | 0.063291 | 0.835443 | 0.113924 |
-
-
-
-
+ | ----------------------------------------------------------| -----| ------| -----| ----------| -----------| -------------------|
+ | llm-jp--llm-jp-13b-instruct-lora-jaster-dolly-oasst-v1.0 | 22 | 48 | 10 | 0.2750 | 0.6000 | 0.33750 |
+ | rinna--japanese-gpt-neox-3.6b-instruction-ppo | 10 | 61 | 9 | 0.1250 | 0.7625 | 0.18125 |
+ | llm-jp--llm-jp-13b-instruct-full-jaster-dolly-oasst-v1.0 | 7 | 65 | 8 | 0.0875 | 0.8125 | 0.13750 |
+ | rinna--japanese-gpt-neox-3.6b-instruction-sft-v2 | 8 | 69 | 3 | 0.1000 | 0.8625 | 0.11875 |
+ | cyberagent--calm2-7b-chat | 5 | 67 | 8 | 0.0625 | 0.8375 | 0.11250 |
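The rate columns in both versions of the table are consistent with simple arithmetic over the Win/Loss/Tie counts: each rate divides by the total number of comparisons for that model (e.g. 22 + 48 + 10 = 80 for the first updated row), and the adjusted win rate appears to count a tie as half a win ((22 + 0.5 × 10) / 80 = 0.33750). The sketch below recomputes the columns under that assumption; the names `pairwise_rates` and `Rates` are hypothetical and are not taken from `show_result.py`.

```python
from typing import NamedTuple


class Rates(NamedTuple):
    win_rate: float
    loss_rate: float
    win_rate_adjusted: float


def pairwise_rates(win: int, loss: int, tie: int) -> Rates:
    """Recompute the table's rate columns from raw counts.

    Assumption (inferred from the table, not from show_result.py):
    a tie counts as half a win in the adjusted rate.
    """
    total = win + loss + tie
    if total == 0:
        raise ValueError("no comparisons to score")
    return Rates(
        win_rate=win / total,
        loss_rate=loss / total,
        win_rate_adjusted=(win + 0.5 * tie) / total,
    )


if __name__ == "__main__":
    # First row of the updated table: 22 wins, 48 losses, 10 ties.
    r = pairwise_rates(22, 48, 10)
    print(f"{r.win_rate:.4f} {r.loss_rate:.4f} {r.win_rate_adjusted:.5f}")
    # prints: 0.2750 0.6000 0.33750
```

Running the same function on the removed rows (e.g. 65 / 146 / 26 gives 0.274262 / 0.616034 / 0.329114) reproduces the old figures too, so the change in this commit comes from the counts, not from the formula.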
The GPT-4 judgments are stored in `data/jp_bench/model_judgment/gpt-4_pair.jsonl`.
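Since the judgments file uses the `.jsonl` (JSON Lines) extension, one JSON object per line, it can be inspected with the standard library alone. Its record schema is not described here, so the minimal sketch below only parses the file and tallies records by a caller-chosen key; the helper names `iter_jsonl` and `count_by` and the field name `"model_2"` are hypothetical and should be checked against an actual record before use.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed object per non-blank JSON Lines record."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON") from exc


def count_by(records: Iterable[dict], key: str) -> Counter:
    """Count records by the value of `key`; a missing key counts as None."""
    return Counter(rec.get(key) for rec in records)


if __name__ == "__main__":
    path = Path("data/jp_bench/model_judgment/gpt-4_pair.jsonl")
    if path.exists():
        with path.open(encoding="utf-8") as f:
            # "model_2" is a hypothetical field name; inspect one record first.
            print(count_by(iter_jsonl(f), "model_2").most_common())
```

Parsing line by line keeps memory flat for large judgment files and reports the offending line number if a record is malformed.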