Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potato/inner/calibration split #483

Merged
merged 11 commits into from
May 23, 2024
Merged

Potato/inner/calibration split #483

merged 11 commits into from
May 23, 2024

Conversation

hfrick
Copy link
Member

@hfrick hfrick commented May 21, 2024

First pass at adding those split methods with a few examples. I'll leave (self-)review comments for discussion.

TODO (to follow in separate PRs)

  • (initial) validation split
  • bootstrap
  • slider variants

Not planned

  • LOO: tune doesn't want to do this (and there is no intention to do so)
  • nested resamples: tune currently doesn't do this so we can wait with that one too
  • permutations: there is no assessment set so it doesn't work in tune either
  • manual resamples: If someone manually crafted their resampling, it most likely is nothing of what rsample already contains so it does not make sense to me to fall back on any of our already implemented option.

Additional notes on things to still add and where:

  • Whether or not the inner split is stratified on the outcome should be addressed in tune.
  • Similarly, what to do when there is no sensible inner split option should probably also be in tune, rsample will error as usual for now.

#' @return An `rsplit` object.
#' @keywords internal
#' @export
inner_split <- function(x, ...) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we happy with that name? We can also go back and change it later (like container -> tailor).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do kind of appreciate that inner_split() gives the vibe that these methods are for internal use. inner_split() feels good to me! I'm open to other options but would give preference to names that hint these are for expert use only.

R/inner_split.R Show resolved Hide resolved
R/inner_split.R Show resolved Hide resolved

analysis_set <- analysis(x)

# TODO: reduce the number of clusters by 1 in tune?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the basic idea of clustering_cv() is to use one cluster as the assessment set, I would reduce v by one for the inner split, so that the cluster left out for the inner split is more likely to be similar to one of the original clusters. If we use the same v, the inner clustering is likely to break up the v-1 clusters in this (outer) analysis set.

I would put that into tune though, not here.

test_that("mc_split", {
set.seed(11)
r_set <- mc_cv(warpbreaks)
split_args <- get_split_args(r_set)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect this or something similar to be the pattern in tune. If so, we can export this helper function, maybe as .get_split_args().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this would be awesome to have access to :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming-wise, noting the existence of get_rsplit() might make get_rsplit_args() (or prefixed with dot) a friend

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just discovered tune::pull_rset_attributes(). Possibly a similar idea?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tune function pulls a bigger set of attributes with the ones we're interested in here nested in the output of the tune function. I'm open to trying to make that one function but am also fine with them co-existing.

Re name: That pair of names is a nice idea. Would it be confusing if it didn't always return the exact arguments of the rsplit? That can happen for e.g. vfold_cv where it also return the repeats argument which is only relevant for the rset but not the rsplit inside. The object for an initial_validation_split is not an rsplit in the first place since it's a threeway_split. Is that too pedantic? 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, looks like tune::pull_rset_attributes does provide a good bit more info. Since we were able to make do in tidymodels/tune#894 with that just function, no pressure from me to export or otherwise mobilize further here. :)

)
split_inner <- split_inner$splits[[1]]

class_inner <- paste0(class(x)[1], "_inner")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not to be pedantic, but wouldn't class_inner by definition be "mc_split_inner"? same for other class_inner. While i appreciate the same code being used across, I think we could just note the class directly

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that in reference to this particular method or for all the methods?

Since I had just come across #478, I opted for constructing the class rather than writing it out manually here to make sure it would always stay in sync with the class of the input x. In terms of readability, I would say that the class of that one is fairly easy to see from the S3 dispatch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what i meant, is that each paste0(class(x)[1], "_inner") in this file could be swapped with a mc_split_inner, apparent_split_inner, etc etc as they are called inside s3 methods, on the object that drives the s3 dispatch

R/inner_split.R Outdated Show resolved Hide resolved
Copy link
Contributor

@simonpcouch simonpcouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to go AFK for a couple hours so going to go ahead and send off comments from my first pass through! I think this is in a really good spot already.

#' @return An `rsplit` object.
#' @keywords internal
#' @export
inner_split <- function(x, ...) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do kind of appreciate that inner_split() gives the vibe that these methods are for internal use. inner_split() feels good to me! I'm open to other options but would give preference to names that hint these are for expert use only.

R/inner_split.R Outdated Show resolved Hide resolved
R/inner_split.R Show resolved Hide resolved
R/inner_split.R Show resolved Hide resolved
test_that("mc_split", {
set.seed(11)
r_set <- mc_cv(warpbreaks)
split_args <- get_split_args(r_set)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this would be awesome to have access to :)

test_that("mc_split", {
set.seed(11)
r_set <- mc_cv(warpbreaks)
split_args <- get_split_args(r_set)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming-wise, noting the existence of get_rsplit() might make get_rsplit_args() (or prefixed with dot) a friend

@hfrick hfrick marked this pull request as ready for review May 23, 2024 10:29
@hfrick hfrick merged commit 776d46f into main May 23, 2024
12 checks passed
@hfrick hfrick deleted the inner_split branch May 23, 2024 10:59
Copy link

github-actions bot commented Jun 7, 2024

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Jun 7, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants