
Methodological error in zero cost, zero time, zero shot notebook #511

Open
stephantul opened this issue Apr 20, 2024 · 3 comments


stephantul commented Apr 20, 2024

Hi,

I was looking at the zero cost, zero time, zero shot notebook for financial sentiment analysis (i.e., this one), and discovered a methodological error that invalidates the conclusions of the distillation section.

What happens is that the train and test dataframes, i.e., the CSV files loaded from Moritz Laurer's blog, are created by splitting the train split of the dataset (the dataset doesn't have a test split). Later on, when distilling, the authors of the blog post reload the entire train split of the dataset and use it to distill the MLP. This means that the test data is also used to distill the model, which leads to a large overestimation of performance.
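To make the leak concrete, here is a minimal sketch of the data flow described above, with hypothetical dataframe names (the notebook's actual variable names may differ). The fix is simply to drop every test row from the reloaded split before distilling:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for the dataset's single "train" split (hypothetical toy data).
full_train = pd.DataFrame({
    "text": [f"sentence {i}" for i in range(20)],
    "label": [i % 3 for i in range(20)],
})

# The blog's CSVs are produced by splitting this single split.
train_df, test_df = train_test_split(full_train, test_size=0.25, random_state=42)

# The bug: the distillation step reloads the *entire* split,
# so the test rows leak into the distillation data.
leaky_distill_df = full_train

# The fix: exclude every row that appears in the test set.
distill_df = full_train[~full_train["text"].isin(test_df["text"])]

# Sanity check: no overlap remains between distillation and test data.
assert distill_df["text"].isin(test_df["text"]).sum() == 0
```

With this filter in place, the distilled MLP is evaluated on genuinely unseen data, which is what produces the lower scores reported below.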

In my experiments, the original PRF score I got was:

```
(array([0.85507246, 0.97348485, 0.94166667]),
 array([0.96721311, 0.96981132, 0.88976378]),
 array([0.90769231, 0.97164461, 0.91497976]),
 array([ 61, 265, 127]))
```

This is close to the score reported in the article.
If I instead remove the test data from the data used to distill the MLP, I get much lower scores:

```
(array([0.76785714, 0.87632509, 0.78947368]),
 array([0.70491803, 0.93584906, 0.70866142]),
 array([0.73504274, 0.90510949, 0.74688797]),
 array([ 61, 265, 127]))
```

These scores are much lower than the reported scores, and also much lower than the LLM scores, which invalidates the conclusion of the notebook and article. Note that these scores are still a bit higher than the scores you would get when just directly optimizing cross entropy, so you could argue that the point still makes sense.
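For readers unfamiliar with the tuple format quoted above: it matches what scikit-learn's `precision_recall_fscore_support` returns with `average=None`, i.e. per-class precision, recall, F1, and support arrays (the final `(61, 265, 127)` being the per-class test counts). A small illustrative example with made-up labels:

```python
from sklearn.metrics import precision_recall_fscore_support

# Toy 3-class labels, purely for illustration of the output shape.
y_true = [0, 0, 1, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 1, 2, 0]

# With average=None, each returned array has one entry per class,
# in the same order as the arrays quoted in this issue.
p, r, f, support = precision_recall_fscore_support(y_true, y_pred, average=None)
print(support)  # per-class counts, analogous to (61, 265, 127) above
```

The identical support array in both runs confirms the evaluation set itself was unchanged; only the distillation data differed.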

If you want I can do a PR on the notebook.

@tomaarsen
Member

@MosheWasserb

@MosheWasserb
Collaborator

Hi @tomaarsen, sorry I missed your message :(
Great catch.
Yes, go ahead and open a PR.

@stephantul
Author

Hey @MosheWasserb ,

Thanks for replying, really appreciated.

Before I submit a PR, could we discuss what the final conclusion of the article should look like? The part after the dataset is reloaded no longer holds. Should I just remove those parts?
