Potato/inner/calibration split #483

hfrick · 2024-05-21T13:59:50Z

First pass at adding those split methods with a few examples. I'll leave (self-)review comments for discussion.

TODO (to follow in separate PRs)

(initial) validation split
bootstrap
slider variants

Not planned

LOO: tune doesn't want to support this (and there is no intention to change that)
nested resamples: tune currently doesn't do this either, so we can hold off on that one too
permutations: there is no assessment set, so it doesn't work in tune either
manual resamples: If someone crafted their resampling manually, it most likely doesn't match anything rsample already provides, so it doesn't make sense to me to fall back on any of our existing options.

Additional notes on things to still add and where:

Whether or not the inner split is stratified on the outcome should be addressed in tune.
Similarly, what to do when there is no sensible inner split option should probably also be handled in tune; rsample will error as usual for now.

hfrick · 2024-05-21T17:13:00Z

R/inner_split.R

+#' @return An `rsplit` object.
+#' @keywords internal
+#' @export
+inner_split <- function(x, ...) {


Are we happy with that name? We can also go back and change it later (like container -> tailor).

I do kind of appreciate that inner_split() gives the vibe that these methods are for internal use. inner_split() feels good to me! I'm open to other options but would give preference to names that hint these are for expert use only.
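The generic above is where the per-class behavior comes from: each resampling class supplies its own rule for re-splitting the outer analysis set, and classes without a sensible rule fall through to an error (the "rsample will error as usual" case). A minimal sketch of that dispatch pattern, using hypothetical `toy_*` names and plain lists of row indices rather than real rsplit objects:

```r
# Hypothetical sketch only: a toy generic in the spirit of inner_split(),
# not rsample's implementation. "toy_*" objects are plain lists of row indices.
toy_inner_split <- function(x, ...) {
  UseMethod("toy_inner_split")
}

# Classes without a sensible inner split (e.g. manual resamples) just error.
toy_inner_split.default <- function(x, ...) {
  stop("No inner split method for objects of class '", class(x)[1], "'.",
       call. = FALSE)
}

# Monte Carlo-style method: re-split the outer *analysis* rows by `prop`;
# the outer assessment rows are never touched.
toy_inner_split.toy_mc_split <- function(x, prop = 3/4, ...) {
  rows <- x$analysis
  n_fit <- floor(length(rows) * prop)
  fit_rows <- rows[sample.int(length(rows), n_fit)]
  list(
    analysis = sort(fit_rows),
    assessment = setdiff(rows, fit_rows)
  )
}

set.seed(1)
outer <- structure(
  list(analysis = 1:40, assessment = 41:50),
  class = c("toy_mc_split", "toy_split")
)
inner <- toy_inner_split(outer, prop = 0.75)
lengths(inner)  # analysis: 30, assessment: 10
```

Keeping the re-split logic in per-class methods means adding, say, bootstrap support later is a matter of writing one more method.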

hfrick · 2024-05-21T17:24:32Z

R/inner_split.R

+
+  analysis_set <- analysis(x)
+
+  # TODO: reduce the number of clusters by 1 in tune?


Given that the basic idea of clustering_cv() is to use one cluster as the assessment set, I would reduce v by one for the inner split, so that the cluster left out for the inner split is more likely to be similar to one of the original clusters. If we use the same v, the inner clustering is likely to break up the v-1 clusters in this (outer) analysis set.

I would put that into tune though, not here.
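To make the v - 1 suggestion concrete, here is a rough sketch of an inner split for a clustering resample. It assumes the outer analysis set is a data frame, uses `stats::kmeans()` as a stand-in clustering step, and `cluster_inner_split()` is a hypothetical name, not an rsample function:

```r
# Hypothetical sketch, not rsample's clustering_cv() code: stats::kmeans()
# stands in for whatever clustering the outer resample used.
cluster_inner_split <- function(analysis_set, v, vars) {
  # The outer analysis set holds roughly v - 1 of the original clusters,
  # so re-cluster it into v - 1 groups rather than v.
  v_inner <- v - 1
  stopifnot(v_inner >= 2)
  km <- stats::kmeans(analysis_set[, vars, drop = FALSE], centers = v_inner)
  # Hold out one inner cluster, mirroring one-cluster-per-fold assessment.
  held_out <- sample.int(v_inner, 1)
  in_assessment <- km$cluster == held_out
  list(
    analysis = analysis_set[!in_assessment, , drop = FALSE],
    assessment = analysis_set[in_assessment, , drop = FALSE]
  )
}

set.seed(42)
# Pretend the first 26 rows of mtcars are an outer analysis set from v = 5.
inner <- cluster_inner_split(mtcars[1:26, ], v = 5, vars = c("mpg", "disp"))
```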

hfrick · 2024-05-21T17:26:33Z

tests/testthat/test-inner_split.R

+test_that("mc_split", {
+  set.seed(11)
+  r_set <- mc_cv(warpbreaks)
+  split_args <- get_split_args(r_set)


I would expect this or something similar to be the pattern in tune. If so, we can export this helper function, maybe as .get_split_args().

Yes, this would be awesome to have access to :)

Naming-wise, given that get_rsplit() exists, get_rsplit_args() (or a dot-prefixed version) might make a good companion.

I just discovered tune::pull_rset_attributes(). Possibly a similar idea?

The tune function pulls a bigger set of attributes with the ones we're interested in here nested in the output of the tune function. I'm open to trying to make that one function but am also fine with them co-existing.

Re name: That pair of names is a nice idea. Would it be confusing if it didn't always return exactly the arguments of the rsplit? That can happen, e.g., for vfold_cv, where it also returns the repeats argument, which is only relevant for the rset and not for the rsplit inside it. Also, the object for an initial_validation_split is not an rsplit in the first place, since it's a threeway_split. Is that too pedantic? 😄

Yup, looks like tune::pull_rset_attributes() does provide a good bit more info. Since we were able to make do with just that function in tidymodels/tune#894, no pressure from me to export anything or otherwise mobilize further here. :)
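A sketch of the kind of helper discussed in this thread, assuming the resampling arguments are stored as attributes on the rset. `toy_get_split_args()` and the `toy_vfold_cv` object are hypothetical; the point is only to show dropping arguments such as `repeats` that describe the rset rather than an individual split:

```r
# Hypothetical helper, not rsample's exported function: collect the resampling
# arguments stored as attributes on an rset-like object, dropping the ones that
# only describe the rset as a whole (such as `repeats` for V-fold CV).
toy_get_split_args <- function(rset, rset_only = "repeats") {
  args <- attributes(rset)
  # Standard R bookkeeping attributes are not resampling arguments.
  args <- args[setdiff(names(args), c("names", "row.names", "class"))]
  args[setdiff(names(args), rset_only)]
}

# An illustrative rset-like object; real rset attributes may differ.
toy_folds <- structure(
  list(splits = list()),
  v = 10, repeats = 3, strata = "outcome",
  class = c("toy_vfold_cv", "toy_rset")
)
str(toy_get_split_args(toy_folds))
```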

EmilHvitfeldt · 2024-05-21T18:02:48Z

R/inner_split.R

+  )
+  split_inner <- split_inner$splits[[1]]
+
+  class_inner <- paste0(class(x)[1], "_inner")


Not to be pedantic, but wouldn't class_inner by definition be "mc_split_inner"? Same for the other class_inner values. While I appreciate the same code being used across methods, I think we could just write the class out directly.

Do you mean that in reference to this particular method or for all the methods?

Since I had just come across #478, I opted for constructing the class rather than writing it out manually here to make sure it would always stay in sync with the class of the input x. In terms of readability, I would say that the class of that one is fairly easy to see from the S3 dispatch.

What I meant is that each paste0(class(x)[1], "_inner") in this file could be swapped for mc_split_inner, apparent_split_inner, etc., since they are called inside S3 methods on the object that drives the S3 dispatch.
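The two approaches give the same class as long as the object's first class is the one the method was written for. They can diverge when a subclass without its own method inherits the method, as in this sketch with hypothetical generics and an illustrative `group_mc_split` class:

```r
# Hypothetical toy generics comparing the two ways of naming the inner class.
inner_class_pasted <- function(x) UseMethod("inner_class_pasted")
inner_class_pasted.mc_split <- function(x) paste0(class(x)[1], "_inner")

inner_class_literal <- function(x) UseMethod("inner_class_literal")
inner_class_literal.mc_split <- function(x) "mc_split_inner"

plain <- structure(list(), class = c("mc_split", "rsplit"))
# A subclass with no methods of its own, so it inherits the mc_split methods.
grouped <- structure(list(), class = c("group_mc_split", "mc_split", "rsplit"))

inner_class_pasted(plain)     # "mc_split_inner"
inner_class_literal(plain)    # "mc_split_inner"
inner_class_pasted(grouped)   # "group_mc_split_inner"
inner_class_literal(grouped)  # "mc_split_inner"
```

With the pasted version, an inherited method names the inner class after the subclass; with the literal version, it always uses the name written in the method.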

simonpcouch

Need to go AFK for a couple hours so going to go ahead and send off comments from my first pass through! I think this is in a really good spot already.

simonpcouch · 2024-05-22T13:48:03Z

R/inner_split.R

+#' @return An `rsplit` object.
+#' @keywords internal
+#' @export
+inner_split <- function(x, ...) {


I do kind of appreciate that inner_split() gives the vibe that these methods are for internal use. inner_split() feels good to me! I'm open to other options but would give preference to names that hint these are for expert use only.

simonpcouch · 2024-05-22T14:17:57Z

tests/testthat/test-inner_split.R

+test_that("mc_split", {
+  set.seed(11)
+  r_set <- mc_cv(warpbreaks)
+  split_args <- get_split_args(r_set)


Yes, this would be awesome to have access to :)

simonpcouch · 2024-05-22T14:20:27Z

tests/testthat/test-inner_split.R

+test_that("mc_split", {
+  set.seed(11)
+  r_set <- mc_cv(warpbreaks)
+  split_args <- get_split_args(r_set)


Naming-wise, given that get_rsplit() exists, get_rsplit_args() (or a dot-prefixed version) might make a good companion.

github-actions · 2024-06-07T01:26:08Z

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

hfrick added 5 commits May 21, 2024 14:43

extract helper function

7586614

inner_split() generic + MC methods

7bbc732

add inner_split() method for vfold cv

c966979

add inner_split() method for clustering cv

d306c59

add inner_split() method for apparent split

d668b75

hfrick commented May 21, 2024

hfrick requested review from topepo, EmilHvitfeldt and simonpcouch May 21, 2024 17:34

EmilHvitfeldt approved these changes May 21, 2024

simonpcouch reviewed May 22, 2024

hfrick mentioned this pull request May 22, 2024

inner_split(): keep everything inside of split_args or not? #487

Open

hfrick added 6 commits May 22, 2024 21:06

a little metaprogramming

893b071

to force errors for unused arguments if necessary, i.e., not swallow them silently.

export renamed helper for use in tune

f42e8fd

Merged origin/main into inner_split

c582aaf

update to new class for grouped MC

171c83f

update NEWS

3a6030a

add more documentation

f915a34
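The commit note about forcing errors for unused arguments relies on a base R behavior: when split arguments are passed with `do.call()` to a function whose signature has no `...`, an argument it does not know about is an error rather than being silently dropped. A sketch with hypothetical splitter functions (not rsample code):

```r
# Hypothetical splitter: note there is no `...` in the signature.
toy_mc_splitter <- function(data, prop = 3/4) {
  n_fit <- floor(nrow(data) * prop)
  idx <- sample.int(nrow(data), n_fit)
  list(
    analysis = data[idx, , drop = FALSE],
    assessment = data[-idx, , drop = FALSE]
  )
}

# Same splitter behind a `...`: any extra arguments vanish without a trace.
toy_mc_splitter_dots <- function(data, prop = 3/4, ...) {
  toy_mc_splitter(data, prop)
}

set.seed(11)
split_args <- list(prop = 0.8, times = 25)  # `times` only matters for the rset

res <- tryCatch(
  do.call(toy_mc_splitter, c(list(data = mtcars), split_args)),
  error = function(e) conditionMessage(e)
)
res  # an error message naming the unused `times` argument

quiet <- do.call(toy_mc_splitter_dots, c(list(data = mtcars), split_args))
```

Dropping `...` from the function that ultimately receives `split_args` is what turns a stray argument into an error instead of a silently ignored setting.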

hfrick marked this pull request as ready for review May 23, 2024 10:29

hfrick merged commit 776d46f into main May 23, 2024
12 checks passed

hfrick deleted the inner_split branch May 23, 2024 10:59

This was referenced May 23, 2024

Add inner_split() methods for bootstrap #488

Merged

Don't export .get_split_args() #495

Closed

Manual classes for inner_split() splits #497

Merged

github-actions bot locked and limited conversation to collaborators Jun 7, 2024
