Sorry to drop in a note here, but there are a lot of competitive fine-tunes out there, and a more diverse set of benchmarks to test against would help. I think OpenCompass might be useful for telling overtrained models apart from "accidental" general-purpose models. See LiveBench/LiveBench#95.
Some questions I would like to ask:
Which benchmarks are more likely to correlate with one another when it comes to ranked performance?
Are fine-tuned models more likely to generalize across multiple tasks, or are there degradations?
Are abliterated (or ablated) models necessarily worse than the foundation models they are based on?
Are franken-merges like the ones from Sakana AI useful?
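
For the first question, one way to check whether two benchmarks rank models similarly is a Spearman rank correlation over per-model scores. The following is only a rough sketch: the benchmark names and scores are made up for illustration, the helper names are hypothetical, and this is not how OpenCompass or LiveBench compute anything.

```python
from itertools import combinations


def ranks(scores):
    """Return average ranks (1 = lowest score); tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def pairwise_spearman(table):
    """Map each benchmark pair to its rank correlation.

    `table` maps a benchmark name to a list of scores, one per model,
    with models in the same order for every benchmark.
    """
    return {
        (a, b): spearman(table[a], table[b])
        for a, b in combinations(sorted(table), 2)
    }


# Made-up scores for four hypothetical models on three hypothetical benchmarks.
example_scores = {
    "bench_a": [1, 2, 3, 4],
    "bench_b": [10, 20, 30, 40],
    "bench_c": [4, 3, 2, 1],
}
correlations = pairwise_spearman(example_scores)
```

A value near 1 for a pair means the two benchmarks order the models almost identically, and a value near -1 means they order them in reverse. With only a handful of models the estimate is noisy, so it would be a starting point rather than an answer.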