Description
An original design choice we made in Tplyr was to run all of the data processing within the table or layer environments themselves. At the time this felt like it made sense (and there may have been a slight obsession with metaprogramming that led us there), but it produced a fairly complicated code architecture riddled with unexpected side effects. Evaluating code within the object environments themselves means that anything that happens inside the function call effectively impacts the Tplyr object itself.
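To make that concrete, here is a minimal, generic sketch of the side effect this causes (plain base R on a throwaway environment, not actual Tplyr code): every assignment made inside `evalq()` lands directly in the supplied environment and stays there unless it is manually removed.

```r
# Sketch of the problem: evalq() evaluates its body *inside* the object's
# environment, so temporaries become part of the object itself.
obj <- new.env()
obj$target <- data.frame(x = 1:3)

evalq({
  scratch <- nrow(target)   # intended only as a temporary value
  result  <- target$x * 2   # intended output
}, envir = obj)

ls(obj)
#> [1] "result"  "scratch" "target"
# `scratch` now lives on the object unless rm() is called, and a name
# collision here would silently overwrite an existing field.
```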
Before: evalq() Pattern
Previously, internal functions executed entire function bodies within table/layer environments using evalq():
```r
treatment_group_build <- function(table) {
  output <- evalq({
    # Entire function body executes in table environment
    built_target <- clean_attr(target)
    fct_levels <- unique(...)  # Pollutes table environment
    # ... more code ...
    rm(grp_i, i, fct_levels)   # Manual cleanup required
  }, envir = table)
  invisible(table)
}
```

After: Extract-Process-Bind Pattern
Now, functions explicitly extract, process, and bind:
```r
treatment_group_build <- function(table) {
  # EXTRACT: Explicitly get what we need
  target <- table$target
  treat_var <- table$treat_var

  # PROCESS: Work in function environment
  built_target <- clean_attr(target)
  fct_levels <- unique(...)  # Local variable

  # BIND: Explicitly write results back
  table$built_target <- built_target
  table$built_pop_data <- built_pop_data

  invisible(table)
}
```

This isn't really a performance-focused issue; rather, the goal is to make the code base slightly more intuitive and to use a design pattern that is:
a) More digestible for new developers trying to update the code,
b) More intentional in its impact on the Tplyr objects, and
c) More predictable in the effects of updates to the code.
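To contrast with the `evalq()` sketch above, here is an equally minimal, generic version of the extract-process-bind shape (again plain base R, not the actual Tplyr internals): temporaries stay local to the function and are discarded on return, and only the bindings written back on purpose ever touch the object.

```r
# Sketch of extract-process-bind on a plain environment (not Tplyr code).
process_obj <- function(obj) {
  # EXTRACT: pull out only what this step needs
  target <- obj$target

  # PROCESS: temporaries live in the function environment and vanish on return
  scratch <- nrow(target)
  result  <- target$x * 2

  # BIND: write back only the intended result
  obj$result <- result
  invisible(obj)
}

obj <- new.env()
obj$target <- data.frame(x = 1:3)
process_obj(obj)

ls(obj)
#> [1] "result" "target"
# No stray `scratch`, and no manual rm() bookkeeping required.
```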