More models and more tests to include? #1753

BradKML · 2024-12-11T06:56:12Z


Sorry for the somewhat off-topic note, but there are a lot of competitive fine-tunes out there, and a more diverse set of benchmarks would help in testing them. I think OpenCompass could be useful for telling overtrained models apart from "accidental" general-purpose models. Related: LiveBench/LiveBench#95

Some questions I would like to ask:

Which benchmarks are most likely to correlate with one another in terms of ranked performance?
Are fine-tuned models more likely to generalize across multiple tasks, or are there degradations?
Are abliterated (or ablated) models necessarily worse than the foundation models they are based on?
Are franken-merges like the ones from Sakana AI useful?
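
For the first question, one way to check this might be to compute the Spearman rank correlation between every pair of benchmarks over the same set of models. Below is a rough sketch, not anything OpenCompass provides: the helper names (`ranks`, `spearman`, `pairwise_rank_correlation`), the benchmark names, and the scores are all hypothetical placeholders, not real results.

```python
from itertools import combinations
from math import sqrt


def ranks(values):
    """Rank values from 1 (lowest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one benchmark has all-equal scores
    return cov / (sx * sy)


def pairwise_rank_correlation(scores):
    """Map each pair of benchmark names to the Spearman correlation of their scores."""
    return {
        (a, b): spearman(scores[a], scores[b])
        for a, b in combinations(sorted(scores), 2)
    }


# Placeholder scores (made up for illustration): one list per benchmark,
# with the same model order in every list.
scores = {
    "bench_a": [10, 20, 30, 40],
    "bench_b": [12, 25, 31, 50],
    "bench_c": [40, 30, 20, 10],
}

for pair, rho in pairwise_rank_correlation(scores).items():
    print(pair, round(rho, 3))
```

Pairs with a correlation near 1 rank the models almost identically, so one of the two benchmarks adds little ranking information; pairs near 0 or below would be the more interesting ones for spotting overtrained models.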
