Ability to specify fold indices for k-folds cross-validation #102
hihosilvers
started this conversation in
Ideas
Hi, with time series classification or regression (particularly regression) you often want to train your model on windows taken from a longer time series. In this case, consecutive examples are shifted by one time unit and share most of their timestamps.
E.g. Example 1 might have timestamps:
[1, 2, 3, 4, 5]
and Example 2 might have timestamps:
[2, 3, 4, 5, 6]
In these cases, it is important to ensure that Examples 1 and 2 are either both used for training or both used for validation in every cross-validation split. If one is used for training and the other for validation, the model is evaluated on data that largely overlaps its training data, which can easily inflate the cross-validation results.
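To make the overlap concrete, here is a minimal sketch in plain Python. The helper `sliding_windows` is a hypothetical name used only for illustration and is not part of HyperTS; it builds windows shifted by one time unit and measures how much two neighbouring examples share.

```python
def sliding_windows(series, width):
    """Return every contiguous window of `width` values, shifted by one step."""
    return [series[i:i + width] for i in range(len(series) - width + 1)]


# Illustrative timestamps 1..10 (made-up data, matching the example above).
timestamps = list(range(1, 11))
windows = sliding_windows(timestamps, 5)

example_1, example_2 = windows[0], windows[1]

# Timestamps that appear in both neighbouring examples.
shared = sorted(set(example_1) & set(example_2))
overlap_fraction = len(shared) / len(example_1)

print(example_1)         # [1, 2, 3, 4, 5]
print(example_2)         # [2, 3, 4, 5, 6]
print(shared)            # [2, 3, 4, 5]
print(overlap_fraction)  # 0.8
```

With a window of 5, neighbouring examples share 4 of their 5 timestamps, so a random split that separates them leaks most of the validation example into training.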
To solve this problem, we could assign Examples 1 and 2 to the same cross-validation fold. In practice this means splitting folds along some fundamental grouping of the samples (e.g. by patient in an ECG dataset). This is currently not possible with HyperTS's make_experiment function: you can specify the number of folds, but not which examples belong to each fold, and in the common real-world scenario the folds need to have different lengths.
This can be remedied by allowing the user to specify the fold for each training example by passing an array of length n (where n is the number of examples in the dataset) to the make_experiment function.
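As a rough sketch of the behaviour I have in mind, the snippet below turns a per-example fold array into train/validation index lists. The function name `predefined_fold_splits` and the sample `fold_ids` are assumptions made up for illustration; nothing here is an existing HyperTS API. For comparison, scikit-learn's `PredefinedSplit` and `GroupKFold` provide similar splitting behaviour.

```python
from collections import defaultdict


def predefined_fold_splits(fold_ids):
    """Yield (fold, train_indices, validation_indices) for each distinct fold label.

    `fold_ids` has one entry per example; examples with the same label are
    always held out together, so folds may have different sizes.
    """
    members = defaultdict(list)
    for index, fold in enumerate(fold_ids):
        members[fold].append(index)

    for fold in sorted(members):
        validation = members[fold]
        train = [i for i in range(len(fold_ids)) if fold_ids[i] != fold]
        yield fold, train, validation


# Hypothetical dataset: six windows from three patients, unequal fold sizes.
fold_ids = [0, 0, 0, 1, 1, 2]
splits = list(predefined_fold_splits(fold_ids))

for fold, train, validation in splits:
    print(f"fold {fold}: train={train} validation={validation}")
```

Because each example's fold is given explicitly, overlapping windows from the same patient or the same stretch of the series can never end up on opposite sides of a split.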