-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Potato/inner/calibration split #483
Conversation
#' @return An `rsplit` object. | ||
#' @keywords internal | ||
#' @export | ||
inner_split <- function(x, ...) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we happy with that name? We can also go back and change it later (like container -> tailor).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do kind of appreciate that inner_split()
gives the vibe that these methods are for internal use. inner_split()
feels good to me! I'm open to other options but would give preference to names that hint these are for expert use only.
|
||
analysis_set <- analysis(x) | ||
|
||
# TODO: reduce the number of clusters by 1 in tune? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the basic idea of clustering_cv()
is to use one cluster as the assessment set, I would reduce v
by one for the inner split, so that the cluster left out for the inner split is more likely to be similar to one of the original clusters. If we use the same v
, the inner clustering is likely to break up the v-1
clusters in this (outer) analysis set.
I would put that into tune though, not here.
tests/testthat/test-inner_split.R
Outdated
test_that("mc_split", { | ||
set.seed(11) | ||
r_set <- mc_cv(warpbreaks) | ||
split_args <- get_split_args(r_set) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect this or something similar to be the pattern in tune. If so, we can export this helper function, maybe as .get_split_args()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this would be awesome to have access to :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming-wise, noting the existence of get_rsplit()
might make get_rsplit_args()
(or prefixed with dot) a friend
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just discovered tune::pull_rset_attributes()
. Possibly a similar idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tune function pulls a bigger set of attributes with the ones we're interested in here nested in the output of the tune function. I'm open to trying to make that one function but am also fine with them co-existing.
Re name: That pair of names is a nice idea. Would it be confusing if it didn't always return the exact arguments of the rsplit
? That can happen for e.g. vfold_cv where it also return the repeats
argument which is only relevant for the rset
but not the rsplit
inside. The object for an initial_validation_split
is not an rsplit
in the first place since it's a threeway_split
. Is that too pedantic? 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, looks like tune::pull_rset_attributes
does provide a good bit more info. Since we were able to make do in tidymodels/tune#894 with that just function, no pressure from me to export or otherwise mobilize further here. :)
) | ||
split_inner <- split_inner$splits[[1]] | ||
|
||
class_inner <- paste0(class(x)[1], "_inner") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not to be pedantic, but wouldn't class_inner
by definition be "mc_split_inner"
? same for other class_inner
. While i appreciate the same code being used across, I think we could just note the class directly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean that in reference to this particular method or for all the methods?
Since I had just come across #478, I opted for constructing the class rather than writing it out manually here to make sure it would always stay in sync with the class of the input x
. In terms of readability, I would say that the class of that one is fairly easy to see from the S3 dispatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what i meant, is that each paste0(class(x)[1], "_inner")
in this file could be swapped with a mc_split_inner
, apparent_split_inner
, etc etc as they are called inside s3 methods, on the object that drives the s3 dispatch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to go AFK for a couple hours so going to go ahead and send off comments from my first pass through! I think this is in a really good spot already.
#' @return An `rsplit` object. | ||
#' @keywords internal | ||
#' @export | ||
inner_split <- function(x, ...) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do kind of appreciate that inner_split()
gives the vibe that these methods are for internal use. inner_split()
feels good to me! I'm open to other options but would give preference to names that hint these are for expert use only.
tests/testthat/test-inner_split.R
Outdated
test_that("mc_split", { | ||
set.seed(11) | ||
r_set <- mc_cv(warpbreaks) | ||
split_args <- get_split_args(r_set) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this would be awesome to have access to :)
tests/testthat/test-inner_split.R
Outdated
test_that("mc_split", { | ||
set.seed(11) | ||
r_set <- mc_cv(warpbreaks) | ||
split_args <- get_split_args(r_set) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming-wise, noting the existence of get_rsplit()
might make get_rsplit_args()
(or prefixed with dot) a friend
to force errors for usused arguments if necessary, i.e. not swallow them silently.
This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue. |
First pass at adding those split methods with a few examples. I'll leave (self-)review comments for discussion.
TODO (to follow in separate PRs)
Not planned
Additional notes on things to still add and where: