
Understanding gap between cross-validated F1 and holdout F1 #26

Open
emilycantrell opened this issue Jun 12, 2024 · 0 comments

Comments

@emilycantrell
Collaborator

Holdout F1 tends to be substantially lower than cross-validated F1, and we want to understand why so that we can build a model that generalizes better to the holdout set.

Things we might try:

  • Split off an internal test set from the PreFer training data, and examine how F1 changes between the CV score and the test set score. Alternatively, look at the consistency of F1 across CV folds (a rough sketch of these checks follows this list).
  • Study the consistency of feature selection across folds. Should we restrict the number of features in order to have more consistency in F1 score from CV to holdout?
  • Should we restrict the model's flexibility in order to have more consistency in F1 score from CV to holdout?
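
A minimal sketch of the first check, assuming a scikit-learn workflow (this is not our actual pipeline; the file name `prefer_training_data.csv`, the outcome column `new_child`, and the gradient-boosting model are placeholders):

```python
# Sketch: split an internal test set off the training data, compare
# cross-validated F1 to internal-test F1, and inspect per-fold F1 spread.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

df = pd.read_csv("prefer_training_data.csv")   # placeholder path
X = df.drop(columns=["new_child"])             # placeholder outcome column
y = df["new_child"]

# Hold out an internal test set before any tuning or feature selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = GradientBoostingClassifier(random_state=0)

# Cross-validated F1 on the training portion, reported per fold so we can see the spread.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_f1 = cross_val_score(model, X_train, y_train, cv=cv, scoring="f1")
print("CV F1 per fold:", np.round(fold_f1, 3))
print("CV F1 mean ± sd: %.3f ± %.3f" % (fold_f1.mean(), fold_f1.std()))

# Fit on the full training portion and score the internal test set.
model.fit(X_train, y_train)
test_f1 = f1_score(y_test, model.predict(X_test))
print("Internal test F1: %.3f" % test_f1)
print("Gap (CV mean - test): %.3f" % (fold_f1.mean() - test_f1))
```

If the per-fold scores already vary a lot, the CV-to-holdout gap may largely reflect variance rather than overfitting; if they are tight but the internal test F1 is much lower, that points more toward leakage in the CV setup (e.g., feature selection done outside the folds).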

Once we've done some internal experimentation, we will take up Lisa and Gert's offer to let us test a simple model on the holdout set.

We are going to focus on the monkeys paper for now. We will return to these ideas later and decide whether and how far to pursue them, depending on how much time we have available.
