Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vectorise replacement function #462

Open
wants to merge 14 commits into
base: main
Choose a base branch
from
4 changes: 4 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,9 @@
# stringr (development version)

* In `str_replace_all()`, a `replacement` function now receives all values in
a single vector. This radically improves performance at the cost of breaking
some existing uses (#462).

* `str_trunc()` now correctly truncates strings when `side` is `"left"` or
`"center"` (@UchidaMizuki, #512).

Expand Down
56 changes: 45 additions & 11 deletions R/replace.R
Original file line number Diff line number Diff line change
Expand Up @@ -22,9 +22,11 @@
#' References of the form `\1`, `\2`, etc will be replaced with
#' the contents of the respective matched group (created by `()`).
#'
#' Alternatively, supply a function, which will be called once for each
#' match (from right to left) and its return value will be used to replace
#' the match.
#' Alternatively, supply a function (or formula): it will be passed a single
#' character vector and should return a character vector of the same length.
#'
#' To replace the complete string with `NA`, use
#' `replacement = NA_character_`.
#' @return A character vector the same length as
#' `string`/`pattern`/`replacement`.
#' @seealso [str_replace_na()] to turn missing values into "NA";
Expand Down Expand Up @@ -55,7 +57,7 @@
#' colours <- str_c("\\b", colors(), "\\b", collapse="|")
#' col2hex <- function(col) {
#' rgb <- col2rgb(col)
#' rgb(rgb["red", ], rgb["green", ], rgb["blue", ], max = 255)
#' rgb(rgb["red", ], rgb["green", ], rgb["blue", ], maxColorValue = 255)
#' }
#'
#' x <- c(
Expand Down Expand Up @@ -179,18 +181,50 @@ str_replace_na <- function(string, replacement = "NA") {

str_transform <- function(string, pattern, replacement) {
loc <- str_locate(string, pattern)
str_sub(string, loc, omit_na = TRUE) <- replacement(str_sub(string, loc))
new <- replacement(str_sub(string, loc))
str_sub(string, loc, omit_na = TRUE) <- new
string
}
str_transform_all <- function(string, pattern, replacement) {

str_transform_all <- function(string, pattern, replacement, error_call = caller_env()) {
locs <- str_locate_all(string, pattern)

for (i in seq_along(string)) {
for (j in rev(seq_len(nrow(locs[[i]])))) {
loc <- locs[[i]]
str_sub(string[[i]], loc[j, 1], loc[j, 2]) <- replacement(str_sub(string[[i]], loc[j, 1], loc[j, 2]))
}
old <- str_sub_all(string, locs)
idx <- chop_index(old)

# unchop list into a vector, apply replacement(), and then rechop back into
# a list
old_flat <- vctrs::list_unchop(old, indices = idx)
if (length(old_flat) == 0) {
# minor optimisation to avoid problems with the many replacement
# functions that use paste
new_flat <- character()
Comment on lines +199 to +201
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💀

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternate idea is to actually apply replacement() and then check if length(new_flat) == 1 && length(old_flat) == 0. In that special case you could issue a warning (possibly even mentioning paste0()) and "fix" it for them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we could do that, I just don't think it's worth the effort since we do know what the correct answer is in the 0 element case.

} else {
new_flat <- replacement(old_flat)
}

if (!is.character(new_flat)) {
cli::cli_abort(
"Function {.arg replacement} must return a character vector, not {.obj_type_friendly {new_flat}}.",
call = error_call
)
}
if (length(new_flat) != length(old_flat)) {
cli::cli_abort(
"Function {.arg replacement} must return a vector the same length as the input ({length(old_flat)}), not length {length(new_flat)}.",
call = error_call
)
}

new <- vctrs::vec_chop(new_flat, idx)

stringi::stri_sub_all(string, locs) <- new
string
}

chop_index <- function(x) {
ls <- lengths(x)
start <- cumsum(c(1L, ls[-length(ls)]))
end <- start + ls - 1L
lapply(seq_along(ls), function(i) seq2(start[[i]], end[[i]]))
}
12 changes: 9 additions & 3 deletions R/view.R
Original file line number Diff line number Diff line change
Expand Up @@ -109,7 +109,7 @@ str_view_filter <- function(x, pattern, match) {

str_view_highlighter <- function(html = TRUE) {
if (html) {
function(x) paste0("<span class='match'>", x, "</span>")
function(x) str_c("<span class='match'>", x, "</span>")
} else {
function(x) {
out <- cli::col_cyan("<", x, ">")
Expand All @@ -125,9 +125,15 @@ str_view_highlighter <- function(html = TRUE) {

str_view_special <- function(x, html = TRUE) {
if (html) {
replace <- function(x) paste0("<span class='special'>", x, "</span>")
replace <- function(x) str_c("<span class='special'>", x, "</span>")
} else {
replace <- function(x) cli::col_cyan("{", stri_escape_unicode(x), "}")
replace <- function(x) {
if (length(x) == 0) {
return(character())
}

cli::col_cyan("{", stri_escape_unicode(x), "}")
}
}

# Highlight any non-standard whitespace characters
Expand Down
10 changes: 6 additions & 4 deletions man/str_replace.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

13 changes: 13 additions & 0 deletions tests/testthat/_snaps/replace.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,19 @@
Error in `str_replace_all()`:
! `pattern` can't be a boundary.

# replacement function must return correct type/length

Code
str_replace_all("x", "x", ~1)
Condition
Error in `str_replace_all()`:
! Function `replacement` must return a character vector, not a number.
Code
str_replace_all("x", "x", ~ c("a", "b"))
Condition
Error in `str_replace_all()`:
! Function `replacement` must return a vector the same length as the input (1), not length 2.

# backrefs are correctly translated

Code
Expand Down
35 changes: 30 additions & 5 deletions tests/testthat/test-replace.R
Original file line number Diff line number Diff line change
Expand Up @@ -56,23 +56,48 @@ test_that("can't replace empty/boundary", {

# functions ---------------------------------------------------------------

test_that("can supply replacement function", {
test_that("can replace multiple values", {
expect_equal(str_replace("abc", "a|c", toupper), "Abc")
expect_equal(str_replace_all("abc", "a|c", toupper), "AbC")
})

test_that("can use formula", {
expect_equal(str_replace("abc", "b", ~ "x"), "axc")
expect_equal(str_replace_all("abc", "b", ~ "x"), "axc")
})

test_that("replacement can be different length", {
double <- function(x) str_dup(x, 2)
expect_equal(str_replace_all("abc", "a|c", double), "aabcc")
})

test_that("replacement with NA works", {
test_that("replacement is vectorised", {
x <- c("", "a", "b", "ab", "abc", "cba")
expect_equal(
str_replace_all(x, "a|c", ~ toupper(str_dup(.x, 2))),
c("", "AA", "b", "AAb", "AAbCC", "CCbAA")
)
})

test_that("is forgiving of 0 matches with paste", {
x <- c("a", "b", "c")
expect_equal(str_replace_all(x, "d", ~ paste("x", .x)), x)
})

test_that("works with no match", {
expect_equal(str_replace("abc", "z", toupper), "abc")
})

test_that("can use formula", {
expect_equal(str_replace("abc", "b", ~ "x"), "axc")
expect_equal(str_replace_all("abc", "b", ~ "x"), "axc")
test_that("works with zero length match", {
expect_equal(str_replace("abc", "$", toupper), "abc")
expect_equal(str_replace_all("abc", "$|^", ~ rep("X", length(.x))), "XabcX")
})

test_that("replacement function must return correct type/length", {
expect_snapshot(error = TRUE, {
str_replace_all("x", "x", ~ 1)
str_replace_all("x", "x", ~ c("a", "b"))
})
})

# fix_replacement ---------------------------------------------------------
Expand Down
Loading