Skip to content

Rework vec_ptype() into a generic for S3 types #1322

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Mar 11, 2021

Conversation

DavisVaughan
Copy link
Member

@DavisVaughan DavisVaughan commented Feb 6, 2021

Demonstrated benefits of a static vec-ptype:

(It is a bit oversimplified for year-month-day, but the performance improvements remains valid)

It would be up to the developer to ensure that their vec-ptype method returns a "valid" ptype. I think this is a reasonable requirement, since we essentially require that their vec-ptype2 and vec-cast methods are also valid.

library(clock)
library(vctrs)

x <- year_month_day(2019, 1, 1)
xs <- rep(list(x), 1e5)

bench::mark(vec_ptype(x), iterations = 100000)
#> # A tibble: 1 x 6
#>   expression        min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>   <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_ptype(x)   14.3µs   15.2µs    62544.    29.8KB     21.9

bench::mark(vec_ptype_common(!!!xs))
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 1 x 6
#>   expression                   min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>              <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_ptype_common(!!!xs)    7.79s    7.79s     0.128      32KB     24.5

empty_year_month_day <- year_month_day(integer(), integer(), integer())

vec_ptype.clock_year_month_day <- function(x, ..., x_arg = "") {
  empty_year_month_day
}

bench::mark(vec_ptype(x), iterations = 100000)
#> # A tibble: 1 x 6
#>   expression        min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>   <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_ptype(x)   1.83µs    2.4µs   388969.        0B     27.2

bench::mark(vec_ptype_common(!!!xs))
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 1 x 6
#>   expression                   min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>              <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_ptype_common(!!!xs)    1.29s    1.29s     0.773        0B     27.8

Created on 2021-02-24 by the reprex package (v1.0.0)

@DavisVaughan DavisVaughan force-pushed the feature/vec-ptype-generic branch from b149397 to 14b3c2c Compare February 25, 2021 13:24
@DavisVaughan
Copy link
Member Author

Possible benefits for integer64, which could be added in a followup PR

Compare against #1323 (comment)

  • Nearly 3x faster for common type determination
  • Which translates to roughly 40% increase for vec_c().
  • Still much slower than native types, but better than it used to be
library(vctrs, warn.conflicts = FALSE)

vctrs_empty_integer64 <- NULL

vec_ptype.integer64 <- function(x, ...) {
  if (is.null(vctrs_empty_integer64)) {
    vctrs_empty_integer64 <<- bit64::integer64()
  }
  
  out <- vctrs_empty_integer64
  dim <- dim(x)
  
  if (is.null(dim)) {
    return(out)
  }
  
  if (length(dim) == 0L) {
    abort("Corrupt `dim` attribute. `dim` must have length >=1.")
  }
  
  dim[[1]] <- 0L
  dim(out) <- dim
  
  out
}

library(dplyr, warn.conflicts = FALSE)
suppressPackageStartupMessages(library(bit64))
#> Warning: package 'bit64' was built under R version 4.0.2
#> Warning: package 'bit' was built under R version 4.0.2

df <- tibble(x = 1, x64 = integer64(1), xstr = "1", y = sample(10000, 1000000, replace = T)) %>% group_by(y)

first_i64 <- group_map(df, ~.x$x64[1])
first_int <- group_map(df, ~.x$x[1])

type_common <- function(x) vec_ptype_common(!!!x)
vec_c2 <- function(x) vec_c(!!!x)

bench::mark(
  type_common(first_int),
  vec_c2(first_int),
  type_common(first_i64),
  vec_c2(first_i64),
  
  check = FALSE,
  iterations = 20
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 4 x 6
#>   expression                  min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>             <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 type_common(first_int)  606.1µs 680.31µs  1402.       4.02KB      0  
#> 2 vec_c2(first_int)        1.88ms   2.27ms   458.     195.45KB      0  
#> 3 type_common(first_i64) 175.62ms 201.31ms     4.88    42.09KB     21.0
#> 4 vec_c2(first_i64)         1.19s    1.26s     0.795  764.06MB     12.2

@DavisVaughan DavisVaughan marked this pull request as ready for review February 25, 2021 13:48
@DavisVaughan DavisVaughan requested a review from lionel- February 25, 2021 13:50
R/type.R Outdated
@@ -39,6 +39,12 @@
#' See [internal-faq-ptype2-identity] for more information about
#' identity values.
#'
#' For performance, when developing a new S3 class you might want to override
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main consequence of the "static prototype" argument is that names are no longer part of the prototype. This is something that matches my intuition and longer-term plans: #1020

For the patch release we could merge this to unblock Davis (performance of clock classes). Then at the next non-patch release we'll unname inputs before slicing in the default vec_ptype() method. Does that sound good @hadley?

@DavisVaughan DavisVaughan force-pushed the feature/vec-ptype-generic branch from abea16f to b999f25 Compare March 11, 2021 14:56
@DavisVaughan DavisVaughan merged commit c41295a into r-lib:master Mar 11, 2021
@DavisVaughan DavisVaughan deleted the feature/vec-ptype-generic branch March 11, 2021 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants