Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vectorise replacement function #462

Open
wants to merge 14 commits into
base: main
Choose a base branch
from
4 changes: 4 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,10 @@

* `regex()` and friends now generate class names with `stringr_` prefix (#384).

* In `str_replace_all()`, a `replacement` function now receives all values in
a single vector. This radically improves performance at the cost of breaking
some existing uses (#462).

## New features

* `str_view()` will use ANSI colouring if available (#370). This works in more
Expand Down
52 changes: 42 additions & 10 deletions R/replace.R
Original file line number Diff line number Diff line change
Expand Up @@ -21,8 +21,8 @@
#' To perform multiple replacements in each element of `string`,
#' pass a named vector (`c(pattern1 = replacement1)`) to
#' `str_replace_all`. Alternatively, pass a function (or formula) to
#' `replacement`: it will be called once for each match (from right to left)
#' and its return value will be used to replace the match.
#' `replacement`: it will be passed a single character vector and should
#' return a character vector of the same length.
#'
#' To replace the complete string with `NA`, use
#' `replacement = NA_character_`.
Expand Down Expand Up @@ -55,7 +55,7 @@
#' colours <- str_c("\\b", colors(), "\\b", collapse="|")
#' col2hex <- function(col) {
#' rgb <- col2rgb(col)
#' rgb(rgb["red", ], rgb["green", ], rgb["blue", ], max = 255)
#' rgb(rgb["red", ], rgb["green", ], rgb["blue", ], maxColorValue = 255)
#' }
#'
#' x <- c(
Expand Down Expand Up @@ -179,18 +179,50 @@ str_replace_na <- function(string, replacement = "NA") {

str_transform <- function(string, pattern, replacement) {
loc <- str_locate(string, pattern)
str_sub(string, loc, omit_na = TRUE) <- replacement(str_sub(string, loc))
new <- replacement(str_sub(string, loc))
str_sub(string, loc, omit_na = TRUE) <- new
string
}
str_transform_all <- function(string, pattern, replacement) {

str_transform_all <- function(string, pattern, replacement, error_call = caller_env()) {
locs <- str_locate_all(string, pattern)

for (i in seq_along(string)) {
for (j in rev(seq_len(nrow(locs[[i]])))) {
loc <- locs[[i]]
str_sub(string[[i]], loc[j, 1], loc[j, 2]) <- replacement(str_sub(string[[i]], loc[j, 1], loc[j, 2]))
}
old <- str_sub_all(string, locs)
idx <- chop_index(old)

# unchop list into a vector, apply replacement(), and then rechop back into
# a list
old_flat <- vctrs::vec_unchop(old, idx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on how much you value readability over performance, you don't actually need to pass idx here since you are unchopping in the order the input was provided

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In dev vctrs:

xs <- list("a", c("b", "c"), c("d", "e", "f"))

x <- list_unchop(xs)
x <- toupper(x)
x
#> [1] "A" "B" "C" "D" "E" "F"

sizes <- list_sizes(xs)
vec_partition(x, sizes)

if (length(old_flat) == 0) {
# minor optimisation to avoid problems with the many replacement
# functions that use paste
new_flat <- character()
Comment on lines +199 to +201
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💀

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternate idea is to actually apply replacement() and then check if length(new_flat) == 1 && length(old_flat) == 0. In that special case you could issue a warning (possibly even mentioning paste0()) and "fix" it for them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we could do that, I just don't think it's worth the effort since we do know what the correct answer is in the 0 element case.

} else {
new_flat <- replacement(old_flat)
}

if (!is.character(new_flat)) {
cli::cli_abort(
"{.fn replacement} must return a character vector, not {.obj_type_friendly {new_flat}}.",
call = error_call
)
}
if (length(new_flat) != length(old_flat)) {
cli::cli_abort(
"{.fn replacement} must return a vector the same length as the input ({length(old_flat)}), not length {length(new_flat)}.",
call = error_call
)
}

new <- vctrs::vec_chop(new_flat, idx)

stringi::stri_sub_all(string, locs) <- new
string
}

chop_index <- function(x) {
ls <- lengths(x)
start <- cumsum(c(1, ls[-length(ls)]))
end <- start + ls - 1
hadley marked this conversation as resolved.
Show resolved Hide resolved
lapply(seq_along(ls), function(i) seq2(start[[i]], end[[i]]))
}
16 changes: 13 additions & 3 deletions R/view.R
Original file line number Diff line number Diff line change
Expand Up @@ -107,9 +107,13 @@ str_view_filter <- function(x, pattern, match) {

str_view_highlighter <- function(html = TRUE) {
if (html) {
function(x) paste0("<span class='match'>", x, "</span>")
function(x) str_c("<span class='match'>", x, "</span>")
} else {
function(x) {
if (length(x) == 0) {
return(character())
}
hadley marked this conversation as resolved.
Show resolved Hide resolved

out <- cli::col_cyan("<", x, ">")

# Ensure styling is starts and ends within each line
Expand All @@ -123,9 +127,15 @@ str_view_highlighter <- function(html = TRUE) {

str_view_special <- function(x, html = TRUE) {
if (html) {
replace <- function(x) paste0("<span class='special'>", x, "</span>")
replace <- function(x) str_c("<span class='special'>", x, "</span>")
} else {
replace <- function(x) cli::col_cyan("{", stri_escape_unicode(x), "}")
replace <- function(x) {
if (length(x) == 0) {
return(character())
}

cli::col_cyan("{", stri_escape_unicode(x), "}")
}
}

# Highlight any non-standard whitespace characters
Expand Down
6 changes: 3 additions & 3 deletions man/str_replace.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

12 changes: 10 additions & 2 deletions revdep/README.md
Original file line number Diff line number Diff line change
@@ -1,18 +1,26 @@
# Revdeps

## New problems (13)
## New problems (21)

|package |version |error |warning |note |
|:-------------|:--------|:------|:-------|:----|
|[arrow](problems.md#arrow)|9.0.0.2 |__+1__ | |2 |
|[autostats](problems.md#autostats)|0.3.1 |__+1__ |__+1__ | |
|[bdpar](problems.md#bdpar)|3.0.3 |__+1__ | |1 |
|[cmcR](problems.md#cmcr)|0.1.9 |__+1__ | | |
|[crispRdesignR](problems.md#crisprdesignr)|1.1.6 |__+1__ | |2 |
|[doseminer](problems.md#doseminer)|0.1.2 |__+1__ |__+1__ | |
|[flair](problems.md#flair)|0.0.2 |__+2__ |__+1__ |2 |
|[gginnards](problems.md#gginnards)|0.1.0-1 |__+1__ |__+1__ | |
|[glmmPen](problems.md#glmmpen)|1.5.1.10 |__+1__ | |2 |
|[gmgm](problems.md#gmgm)|1.1.2 |__+1__ | | |
|[hmer](problems.md#hmer)|1.0.1 |__+1__ | | |
|[hockeyR](problems.md#hockeyr)|1.2.0 | |__+1__ | |
|[huxtable](problems.md#huxtable)|5.5.0 |__+1__ |1 |1 |
|[latex2exp](problems.md#latex2exp)|0.9.5 |__+1__ |__+1__ | |
|[priceR](problems.md#pricer)|0.1.67 |__+1__ | |2 |
|[repr](problems.md#repr)|1.1.4 |__+1__ | |2 |
|[rtiddlywiki](problems.md#rtiddlywiki)|0.1.0 |__+1__ | | |
|[salty](problems.md#salty)|0.1.0 |__+2__ | |1 |
|[strex](problems.md#strex)|1.4.3 |__+1__ |__+1__ | |
|[TiPS](problems.md#tips)|1.2.1 |__+1__ |__+1__ |1 |
|[wordpredictor](problems.md#wordpredictor)|0.0.3 |__+2__ |__+1__ | |
Expand Down
33 changes: 30 additions & 3 deletions revdep/cran.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
## revdepcheck results

We checked 17 reverse dependencies, comparing R CMD check results across CRAN and dev versions of this package.
We checked 36 reverse dependencies, comparing R CMD check results across CRAN and dev versions of this package.

* We saw 13 new problems
* We saw 21 new problems
* We failed to check 0 packages

Issues with CRAN packages are summarised below.
Expand All @@ -17,12 +17,16 @@ Issues with CRAN packages are summarised below.
checking examples ... ERROR
checking re-building of vignette outputs ... WARNING

* bdpar
* cmcR
checking tests ... ERROR

* crispRdesignR
checking examples ... ERROR

* doseminer
checking examples ... ERROR
checking re-building of vignette outputs ... WARNING

* flair
checking examples ... ERROR
checking tests ... ERROR
Expand All @@ -38,9 +42,32 @@ Issues with CRAN packages are summarised below.
* gmgm
checking tests ... ERROR

* hmer
checking tests ... ERROR

* hockeyR
checking re-building of vignette outputs ... WARNING

* huxtable
checking examples ... ERROR

* latex2exp
checking tests ... ERROR
checking re-building of vignette outputs ... WARNING

* priceR
checking examples ... ERROR

* repr
checking tests ... ERROR

* rtiddlywiki
checking tests ... ERROR

* salty
checking examples ... ERROR
checking tests ... ERROR

* strex
checking tests ... ERROR
checking re-building of vignette outputs ... WARNING
Expand Down
Loading