Vectorise replacement function (#462)
This is a breaking change but radically improves performance so I think it's worth it.
hadley authored Jul 16, 2024
1 parent 6f08d86 commit 5a96ff0
Showing 10 changed files with 2,163 additions and 1,381 deletions.
4 changes: 4 additions & 0 deletions NEWS.md
@@ -1,5 +1,9 @@
# stringr (development version)

* In `str_replace_all()`, a `replacement` function now receives all values in
a single vector. This radically improves performance at the cost of breaking
some existing uses (#462).

# stringr 1.5.1

* Some minor documentation improvements.
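The NEWS entry for `str_replace_all()` above implies that a replacement function written for one match at a time may stop working. The following is a minimal base-R sketch of the difference, not code from this commit: `shout_one()`, `shout_all()` and `matches` are hypothetical names, and `matches` only stands in for the single vector that the new `str_replace_all()` is described as passing.

```r
# Hypothetical replacement functions, for illustration only.

# Written for the old contract: assumes `x` is a single match.
# With length(x) > 1 the `if` condition is not length one, which is an
# error in R 4.2.0 and later.
shout_one <- function(x) {
  if (x == "a") "A" else x
}

# Written for the new contract: handles every match in one call and
# returns a character vector of the same length.
shout_all <- function(x) {
  ifelse(x == "a", "A", x)
}

# Stand-in for the vector of all matches passed in a single call.
matches <- c("a", "b", "a")
result <- shout_all(matches)

# With the development version one would presumably write:
#   stringr::str_replace_all(c("ab", "a"), "[ab]", shout_all)
```

Functions built from vectorised primitives such as `toupper()` or `paste0()` should need no change; functions that branch on a single value are the ones most likely to break.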
68 changes: 57 additions & 11 deletions R/replace.R
@@ -23,9 +23,11 @@
#' References of the form `\1`, `\2`, etc will be replaced with
#' the contents of the respective matched group (created by `()`).
#'
#' Alternatively, supply a function, which will be called once for each
#' match (from right to left) and its return value will be used to replace
#' the match.
#' Alternatively, supply a function (or formula): it will be passed a single
#' character vector and should return a character vector of the same length.
#'
#' To replace the complete string with `NA`, use
#' `replacement = NA_character_`.
#' @return A character vector the same length as
#' `string`/`pattern`/`replacement`.
#' @seealso [str_replace_na()] to turn missing values into "NA";
@@ -56,7 +58,7 @@
#' colours <- str_c("\\b", colors(), "\\b", collapse="|")
#' col2hex <- function(col) {
#' rgb <- col2rgb(col)
#' rgb(rgb["red", ], rgb["green", ], rgb["blue", ], max = 255)
#' rgb(rgb["red", ], rgb["green", ], rgb["blue", ], maxColorValue = 255)
#' }
#'
#' x <- c(
@@ -180,18 +182,62 @@ str_replace_na <- function(string, replacement = "NA") {

str_transform <- function(string, pattern, replacement) {
loc <- str_locate(string, pattern)
str_sub(string, loc, omit_na = TRUE) <- replacement(str_sub(string, loc))
new <- replacement(str_sub(string, loc))
str_sub(string, loc, omit_na = TRUE) <- new
string
}
str_transform_all <- function(string, pattern, replacement) {

str_transform_all <- function(string, pattern, replacement, error_call = caller_env()) {
locs <- str_locate_all(string, pattern)

for (i in seq_along(string)) {
for (j in rev(seq_len(nrow(locs[[i]])))) {
loc <- locs[[i]]
str_sub(string[[i]], loc[j, 1], loc[j, 2]) <- replacement(str_sub(string[[i]], loc[j, 1], loc[j, 2]))
}
  old <- str_sub_all(string, locs)

  # unchop list into a vector, apply replacement(), and then rechop back into
  # a list
  old_flat <- vctrs::list_unchop(old)
  if (length(old_flat) == 0) {
    # minor optimisation to avoid problems with the many replacement
    # functions that use paste
    new_flat <- character()
  } else {
    withCallingHandlers(
      new_flat <- replacement(old_flat),
      error = function(cnd) {
        cli::cli_abort(
          c(
            "Failed to apply {.arg replacement} function.",
            i = "It must accept a character vector of any length."
          ),
          parent = cnd,
          call = error_call
        )
      }
    )
  }

  if (!is.character(new_flat)) {
    cli::cli_abort(
      "Function {.arg replacement} must return a character vector, not {.obj_type_friendly {new_flat}}.",
      call = error_call
    )
  }
  if (length(new_flat) != length(old_flat)) {
    cli::cli_abort(
      "Function {.arg replacement} must return a vector the same length as the input ({length(old_flat)}), not length {length(new_flat)}.",
      call = error_call
    )
  }

  idx <- chop_index(old)
  new <- vctrs::vec_chop(new_flat, idx)

  stringi::stri_sub_all(string, locs) <- new
  string
}

chop_index <- function(x) {
  ls <- lengths(x)
  start <- cumsum(c(1L, ls[-length(ls)]))
  end <- start + ls - 1L
  lapply(seq_along(ls), function(i) seq2(start[[i]], end[[i]]))
}
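The "unchop, apply, rechop" step in `str_transform_all()` can be sketched with base R alone. This is an illustration under stated assumptions rather than the commit's implementation: it uses `unlist()` and `split()` in place of the vctrs helpers, `toupper()` as a stand-in replacement function, and a made-up `old` list of per-string matches. The `start`/`end` arithmetic mirrors `chop_index()`.

```r
# Matches found in three input strings; the second string had no matches.
old <- list(c("a", "b"), character(), "c")

# 1. Flatten the list into a single character vector.
old_flat <- unlist(old, use.names = FALSE)
if (is.null(old_flat)) old_flat <- character()

# 2. Call the replacement function once, on every match at the same time.
new_flat <- toupper(old_flat)

# 3. Work out which slice of the flat vector belongs to each input string,
#    using the same start/end arithmetic as chop_index().
sizes <- lengths(old)
start <- cumsum(c(1L, sizes[-length(sizes)]))
end <- start + sizes - 1L

# 4. Split the flat result back into a list shaped like `old`. Using a
#    factor with explicit levels keeps the empty element for string 2.
group <- factor(rep(seq_along(old), times = sizes), levels = seq_along(old))
new <- unname(split(new_flat, group))
```

An input with no matches yields `end < start` for its slice, which is why the original uses `seq2()` (empty when the end precedes the start) rather than `seq()` or `:`.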
12 changes: 9 additions & 3 deletions R/view.R
@@ -107,7 +107,7 @@ str_view_filter <- function(x, pattern, match) {

str_view_highlighter <- function(html = TRUE) {
if (html) {
function(x) paste0("<span class='match'>", x, "</span>")
function(x) str_c("<span class='match'>", x, "</span>")
} else {
function(x) {
out <- cli::col_cyan("<", x, ">")
@@ -123,9 +123,15 @@ str_view_highlighter <- function(html = TRUE) {

str_view_special <- function(x, html = TRUE) {
if (html) {
replace <- function(x) paste0("<span class='special'>", x, "</span>")
replace <- function(x) str_c("<span class='special'>", x, "</span>")
} else {
replace <- function(x) cli::col_cyan("{", stri_escape_unicode(x), "}")
replace <- function(x) {
if (length(x) == 0) {
return(character())
}

cli::col_cyan("{", stri_escape_unicode(x), "}")
}
}

# Highlight any non-standard whitespace characters
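The changes to the highlighter functions above presumably guard against zero-length input, which a vectorised caller can now pass when nothing matched. The base-R sketch below shows the underlying `paste0()` behaviour and the same style of guard added to `str_view_special()`; `wrap()` and `no_matches` are hypothetical names used only for this illustration.

```r
no_matches <- character()

# paste0() treats a zero-length argument as "", so the result has
# length 1 even though there were no matches to wrap.
wrapped <- paste0("<span class='match'>", no_matches, "</span>")
n_wrapped <- length(wrapped)

# An explicit early return keeps the output the same length as the input,
# which is what the length check in str_transform_all() requires.
wrap <- function(x) {
  if (length(x) == 0) {
    return(character())
  }
  paste0("<span class='match'>", x, "</span>")
}

n_guarded <- length(wrap(no_matches))
```

`str_transform_all()` also skips the call entirely when there are no matches, so the guard mainly matters for replacement functions that are called directly or from other code paths.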
10 changes: 6 additions & 4 deletions man/str_replace.Rd

Some generated files are not rendered by default.

89 changes: 47 additions & 42 deletions revdep/README.md
@@ -1,48 +1,53 @@
# Revdeps

## Failed to check (24)
## Failed to check (34)

|package |version |error |warning |note |
|:-------------|:-------|:-----|:-------|:----|
|admiral |0.8.4 |1 | | |
|admiralonco |0.1.0 |1 | | |
|NA |? | | | |
|NA |? | | | |
|NA |? | | | |
|NA |? | | | |
|NA |? | | | |
|genekitr |? | | | |
|ggPMX |? | | | |
|NA |? | | | |
|NA |? | | | |
|NA |? | | | |
|MARVEL |? | | | |
|numbat |? | | | |
|OlinkAnalyze |? | | | |
|Platypus |? | | | |
|RVA |? | | | |
|NA |? | | | |
|NA |? | | | |
|tidySEM |? | | | |
|NA |? | | | |
|tinyarray |? | | | |
|NA |? | | | |
|xpose.nlmixr2 |? | | | |
|package |version |error |warning |note |
|:--------------|:----------|:-----|:-------|:----|
|AovBay |0.1.0 |1 | | |
|apollo |0.3.3 |1 | | |
|arealDB |0.6.3 |1 | | |
|BayesFactor |0.9.12-4.7 |1 | | |
|CGPfunctions |0.6.3 |1 | | |
|covid19india |0.1.4 |1 | | |
|CSCNet |0.1.2 |1 | | |
|dataone |2.2.2 |1 | | |
|FAIRmaterials |0.4.2.1 |1 | | |
|gmoTree |? | | | |
|Greg |2.0.2 |1 | | |
|interactionRCS |0.1.1 |1 | | |
|iNZightPlots |2.15.3 |1 | | |
|metajam |0.3.0 |1 | | |
|miceafter |0.5.0 |1 | | |
|Multiaovbay |0.1.0 |1 | | |
|ontologics |0.7.0 |1 | | |
|popstudy |1.0.1 |1 | | |
|pre |1.0.7 |1 | | |
|psfmi |1.4.0 |1 | | |
|qris |1.1.1 |1 | | |
|QTOCen |0.1.1 |1 | | |
|quantoptr |0.1.3 |1 | | |
|quid |0.0.1 |1 | | |
|rlme |0.5 |1 | | |
|scaper |0.1.0 |1 | | |
|scCustomize |2.1.2 |1 | |1 |
|scpi |2.2.5 |1 | | |
|scRNAstat |0.1.1 |1 | | |
|SEERaBomb |2019.2 |1 | | |
|seqimpute |2.0.0 |1 | | |
|SimplyAgree |0.2.0 |1 | | |
|tidyseurat |0.8.0 |1 | | |
|weightQuant |1.0.1 |1 | | |

## New problems (12)
## New problems (7)

|package |version |error |warning |note |
|:-------------|:--------|:------|:-------|:----|
|[cmcR](problems.md#cmcr)|0.1.9 |__+1__ | | |
|[crispRdesignR](problems.md#crisprdesignr)|1.1.6 |__+1__ | |2 |
|[cspp](problems.md#cspp)|0.3.2 |__+1__ |__+1__ | |
|[flair](problems.md#flair)|0.0.2 |__+2__ |__+1__ |2 |
|[GetLattesData](problems.md#getlattesdata)|1.4.1 |__+2__ |__+1__ | |
|[glmmPen](problems.md#glmmpen)|1.5.1.10 |__+1__ | |2 |
|[mpwR](problems.md#mpwr)|0.1.0 |__+2__ |__+1__ | |
|[postpack](problems.md#postpack)|0.5.3 |__+1__ |__+1__ |1 |
|[repr](problems.md#repr)|1.1.4 |__+1__ | |2 |
|[tidyfst](problems.md#tidyfst)|1.7.5 |__+1__ | |1 |
|[tidyft](problems.md#tidyft)|0.4.5 |__+1__ | |2 |
|[zipangu](problems.md#zipangu)|0.3.1 |__+2__ | |1 |
|package |version |error |warning |note |
|:-----------|:-------|:--------|:-------|:------|
|[huxtable](problems.md#huxtable)|5.5.6 |1 __+1__ | |2 |
|[latex2exp](problems.md#latex2exp)|0.9.6 |__+2__ | |__+1__ |
|[phenofit](problems.md#phenofit)|0.3.9 |__+2__ | |1 |
|[priceR](problems.md#pricer)|1.0.1 |__+1__ | |1 |
|[PubChemR](problems.md#pubchemr)|2.0 |__+2__ | | |
|[rtiddlywiki](problems.md#rtiddlywiki)|0.1.0 |__+1__ | | |
|[salty](problems.md#salty)|0.1.0 |__+2__ | |1 |

92 changes: 47 additions & 45 deletions revdep/cran.md
@@ -1,71 +1,73 @@
## revdepcheck results

We checked 1675 reverse dependencies (1663 from CRAN + 12 from Bioconductor), comparing R CMD check results across CRAN and dev versions of this package.
We checked 2023 reverse dependencies (2022 from CRAN + 1 from Bioconductor), comparing R CMD check results across CRAN and dev versions of this package.

* We saw 12 new problems
* We failed to check 12 packages
* We saw 7 new problems
* We failed to check 33 packages

Issues with CRAN packages are summarised below.

### New problems
(This reports the first line of each new failure)

* cmcR
checking tests ... ERROR

* crispRdesignR
checking examples ... ERROR

* cspp
checking examples ... ERROR
checking re-building of vignette outputs ... WARNING

* flair
checking examples ... ERROR
checking tests ... ERROR
checking re-building of vignette outputs ... WARNING

* GetLattesData
* huxtable
checking examples ... ERROR
checking tests ... ERROR
checking re-building of vignette outputs ... WARNING

* glmmPen
* latex2exp
checking tests ... ERROR
checking running R code from vignettes ... ERROR
checking re-building of vignette outputs ... NOTE

* mpwR
* phenofit
checking examples ... ERROR
checking tests ... ERROR
checking re-building of vignette outputs ... WARNING

* postpack
* priceR
checking examples ... ERROR
checking re-building of vignette outputs ... WARNING

* repr
* PubChemR
checking tests ... ERROR
checking running R code from vignettes ... ERROR

* tidyfst
checking examples ... ERROR

* tidyft
checking examples ... ERROR
* rtiddlywiki
checking tests ... ERROR

* zipangu
* salty
checking examples ... ERROR
checking tests ... ERROR

### Failed to check

* admiral (NA)
* admiralonco (NA)
* genekitr (NA)
* ggPMX (NA)
* MARVEL (NA)
* numbat (NA)
* OlinkAnalyze (NA)
* Platypus (NA)
* RVA (NA)
* tidySEM (NA)
* tinyarray (NA)
* xpose.nlmixr2 (NA)
* AovBay (NA)
* apollo (NA)
* arealDB (NA)
* BayesFactor (NA)
* CGPfunctions (NA)
* covid19india (NA)
* CSCNet (NA)
* dataone (NA)
* FAIRmaterials (NA)
* Greg (NA)
* interactionRCS (NA)
* iNZightPlots (NA)
* metajam (NA)
* miceafter (NA)
* Multiaovbay (NA)
* ontologics (NA)
* popstudy (NA)
* pre (NA)
* psfmi (NA)
* qris (NA)
* QTOCen (NA)
* quantoptr (NA)
* quid (NA)
* rlme (NA)
* scaper (NA)
* scCustomize (NA)
* scpi (NA)
* scRNAstat (NA)
* SEERaBomb (NA)
* seqimpute (NA)
* SimplyAgree (NA)
* tidyseurat (NA)
* weightQuant (NA)