Expand grouping variables for bootstrap intervals #465

Merged 9 commits on Sep 12, 2024

topepo (Member) commented Jan 19, 2024

For some tune internals, it would be helpful to be able to make intervals grouped by an extended set of columns (as opposed to just term). See tidymodels/tune#818.

These changes propose expanding the grouping variables to include columns whose names start with a period. We can discuss it, and I can create more unit tests if we're good with this.

Here's an example:

library(tidymodels)
tidymodels_prefer()
theme_set(theme_bw())
options(pillar.advice = FALSE, pillar.min_title_chars = Inf)
# Get regression estimates for each house type
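lm_est <- function(split, ...) {
  analysis(split) %>%
    # One row per house type, with that type's rows nested in `data`
    tidyr::nest(.by = c(type)) %>%
    mutate(
      betas = purrr::map(data, ~ lm(log10(price) ~ sqft, data = .x) %>% tidy())
    ) %>%
    # The leading period marks `.type` as an extra grouping column
    rename(.type = type) %>%
    select(.type, betas) %>%
    unnest(cols = betas)
}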

set.seed(52156)
house_rs <-
  bootstraps(Sacramento, 1000, apparent = TRUE) %>%
  mutate(results = map(splits, lm_est))

int_pctl(house_rs, results)
#> # A tibble: 6 × 7
#>   term        .type           .lower .estimate   .upper .alpha .method   
#>   <chr>       <fct>            <dbl>     <dbl>    <dbl>  <dbl> <chr>     
#> 1 (Intercept) Condo         4.45     4.59      4.72       0.05 percentile
#> 2 (Intercept) Multi_Family  4.74     5.25      5.71       0.05 percentile
#> 3 (Intercept) Residential   4.93     4.96      4.99       0.05 percentile
#> 4 sqft        Condo         0.000412 0.000520  0.000659   0.05 percentile
#> 5 sqft        Multi_Family -0.000197 0.0000344 0.000277   0.05 percentile
#> 6 sqft        Residential   0.000211 0.000225  0.000240   0.05 percentile
int_t(house_rs, results)
#> # A tibble: 6 × 7
#>   term        .type           .lower .estimate   .upper .alpha .method  
#>   <chr>       <fct>            <dbl>     <dbl>    <dbl>  <dbl> <chr>    
#> 1 (Intercept) Condo         4.47     4.59      4.73       0.05 student-t
#> 2 (Intercept) Multi_Family  4.81     5.25      5.78       0.05 student-t
#> 3 (Intercept) Residential   4.93     4.96      4.99       0.05 student-t
#> 4 sqft        Condo         0.000386 0.000520  0.000621   0.05 student-t
#> 5 sqft        Multi_Family -0.000193 0.0000344 0.000223   0.05 student-t
#> 6 sqft        Residential   0.000210 0.000225  0.000239   0.05 student-t
int_bca(house_rs, results, .fn = lm_est)
#> # A tibble: 6 × 7
#>   term        .type           .lower .estimate   .upper .alpha .method
#>   <chr>       <fct>            <dbl>     <dbl>    <dbl>  <dbl> <chr>  
#> 1 (Intercept) Residential   4.94     4.96      4.99       0.05 BCa    
#> 2 sqft        Residential   0.000210 0.000225  0.000239   0.05 BCa    
#> 3 (Intercept) Condo         4.47     4.59      4.74       0.05 BCa    
#> 4 sqft        Condo         0.000395 0.000520  0.000638   0.05 BCa    
#> 5 (Intercept) Multi_Family  4.64     5.25      5.62       0.05 BCa    
#> 6 sqft        Multi_Family -0.000156 0.0000344 0.000330   0.05 BCa

Created on 2024-01-19 with reprex v2.0.2
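
To make the grouping idea concrete, here is a minimal base-R sketch of a percentile interval computed per combination of `term` and any column whose name starts with a period. It is not the rsample implementation: `pctl_by_group()` and the `toy` data are hypothetical names made up for illustration, and using the mean of the bootstrap estimates as `.estimate` is an assumption.

```r
# Hypothetical helper, for illustration only (not rsample's code).
# `stats` is assumed to have a `term` column, an `estimate` column, and
# optionally extra grouping columns whose names start with a period.
pctl_by_group <- function(stats, alpha = 0.05) {
  # Grouping keys: `term` plus every column whose name starts with "."
  dot_cols <- grep("^\\.", names(stats), value = TRUE)
  keys <- c("term", dot_cols)

  # One piece per observed combination of the grouping keys
  pieces <- split(stats, stats[keys], drop = TRUE)

  res <- lapply(pieces, function(d) {
    q <- quantile(d$estimate, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
    data.frame(
      d[1, keys, drop = FALSE],
      .lower = q[1],
      .estimate = mean(d$estimate),  # assumption: mean of the bootstrap estimates
      .upper = q[2],
      .alpha = alpha,
      .method = "percentile",
      check.names = FALSE,
      row.names = NULL
    )
  })

  out <- do.call(rbind, res)
  rownames(out) <- NULL
  out
}

# Simulated "bootstrap" estimates: 2 terms x 2 house types x 100 draws each
set.seed(1)
toy <- data.frame(
  term = rep(c("(Intercept)", "sqft"), each = 200),
  .type = rep(rep(c("Condo", "Residential"), each = 100), times = 2),
  estimate = rnorm(400)
)

out <- pctl_by_group(toy)
out
```

With the toy data this yields one interval per `term` and `.type` pair, which mirrors the shape of the `int_pctl()` output above. If `stats` has no period-prefixed columns, the sketch falls back to grouping by `term` alone.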

topepo requested a review from hfrick January 19, 2024 17:02
topepo marked this pull request as ready for review January 19, 2024 17:02

hfrick (Member) left a comment

Generally on board with this, go ahead with adding more unit tests 👍

R/bootci.R (resolved review thread)
topepo added a commit to tidymodels/tune that referenced this pull request Jan 23, 2024

topepo (Member, Author) commented Jan 24, 2024

This is ready for final review.

I've set up the int_pctl() S3 method for tune_results objects to work with the current interval methods in rsample and with this change.

topepo added a commit to tidymodels/tune that referenced this pull request Jan 24, 2024
* add maybe_choose_eval_time

* re-write survival bits

* update/fix tests

* nocov

* added additional test

* version bump

* changes based on reviewer feedback

* updates to work with tidymodels/rsample#465

* update snapshots

---------

Co-authored-by: ‘topepo’ <‘[email protected]’>
hfrick merged commit 617b619 into main Sep 12, 2024
12 checks passed
hfrick deleted the expand-group-intervals branch September 12, 2024 12:44

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Sep 27, 2024