
[Feature request] Optional caching of tests #1703

Closed · wurli opened this issue Oct 7, 2022 · 7 comments

@wurli commented Oct 7, 2022

Why cache tests?

If you're working on a big package with a test suite that takes a long time to run, an option to run only the tests that call functions (directly or indirectly) whose code has changed could bring lengthy calls to test() down to a few seconds.

How would this work?

You'd need to check whether any functions that are directly or indirectly called during a test have changed since the last time it was run. I can think of two possible implementations:

Method 1: Use code parsing

You would need to parse each function called during the test to get a dependency tree, then inspect this for changes. This opens a can of worms because extracting function calls from R code is hard. E.g. how do you generalise the detection of the function foo in lapply(1:10, foo), where foo is passed as a value rather than called directly?
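
For illustration, here is a minimal sketch (not part of the proposal itself) of what existing static analysis offers. It uses codetools, a recommended package bundled with R; findGlobals() lists the global symbols a function references, so it catches foo even when it only appears as a value, but it still misses fully dynamic calls such as do.call("foo", ...) or get("foo")().

    library(codetools)

    foo <- function(x) x + 1
    bar <- function(xs) lapply(xs, foo)

    # Globals referenced by `bar()`: includes both "lapply" and "foo", even though
    # `foo` never appears in call position.
    findGlobals(bar)
    findGlobals(bar, merge = FALSE)  # split into $functions and $variables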

Method 2: Modify functions before running test_that()

This would work something like this:

  1. When test_that() gets run, every function in the package is modified so that it records its (original) body whenever it gets called:

    # 1. `foo()` is defined somewhere in myPackage
    foo <- function(x) x + 1

    # 2. At the start of `test_that()`, a new environment is set up to record the definitions of
    #    any functions which are called during the test
    called_funs <- new.env()

    # 3. Something like this happens for each function in myPackage - the function is modified so
    #    that, when called, it first saves its (original, deparsed) body to the `called_funs`
    #    environment and then runs as normal
    original_body <- body(foo)
    body(foo) <- bquote({
      called_funs$foo <- .(paste(deparse(original_body), collapse = "\n"))
      .(original_body)
    })
  2. At the end of the test, called_funs will contain the bodies of all the functions in myPackage which have been called, directly or indirectly, during the test. (NB, you'd also want to record function arguments but that's omitted for the sake of brevity).

  3. This list of function definitions can then be checked for changes each time the test gets called, e.g. using snapshotting. If the snapshot passes (and the code for the test itself hasn't changed), the full test doesn't need to be run. A rough sketch of this check follows the list.
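
To make item 3 concrete, here's a minimal, hypothetical sketch of the cache check. None of these names (test_is_stale, cache_path, myPackage) are real testthat API; they only illustrate comparing the bodies recorded on the previous run against the current definitions.

    # Hypothetical helper: TRUE if the test needs a full re-run, i.e. if any function
    # recorded on the previous run has since changed (or nothing was ever recorded).
    test_is_stale <- function(cache_path, pkg = "myPackage") {
      if (!file.exists(cache_path)) return(TRUE)
      recorded <- readRDS(cache_path)  # named list: function name -> deparsed body
      ns <- asNamespace(pkg)
      for (fun in names(recorded)) {
        if (!exists(fun, envir = ns, mode = "function", inherits = FALSE)) return(TRUE)
        current <- paste(deparse(body(get(fun, envir = ns))), collapse = "\n")
        if (!identical(current, recorded[[fun]])) return(TRUE)
      }
      FALSE
    }

    # After a full (instrumented) run, persist what was recorded:
    # saveRDS(as.list(called_funs), cache_path)

In practice you'd also need to invalidate on changes to the test code itself, compiled code, and fixtures, which is where the issues below come in.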

Possible issues

I can see two potential issues here:

  1. Modifying the definition for foo as outlined above breaks something. I can't think of any reason why this would happen, but there's a lot of weird R code out there. This could be handled by making the caching opt-in.

  2. A function's R source doesn't change, but some dependency, e.g. underlying C++ code, does. Again, I think this could be handled by making the caching opt-in, or maybe opt-out for specific tests.

NB, this issue was prompted by a discussion on Twitter with @moodymudskipper.

As ever, thanks for the hard work on this amazing package.

@hadley (Member) commented Oct 7, 2022

Our general advice for this is to break your tests up into files corresponding to your R code, and then just run the test file related to the R file you're currently working on using devtools::test_active_file().
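
To illustrate that convention (file names here are just examples):

    # R/summarise.R  <->  tests/testthat/test-summarise.R
    # R/aggregate.R  <->  tests/testthat/test-aggregate.R
    usethis::use_test("summarise")   # creates the matching test file

    # With R/summarise.R or tests/testthat/test-summarise.R open in the editor:
    devtools::test_active_file()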

@wurli (Author) commented Oct 8, 2022

Thanks for your response. My feeling is that, while doing things as you advise is ideal (not least because it forces you to structure your code thoughtfully), there's a set of problems which caching would solve, mainly because a lot of code isn't structured well and doesn't lend itself to the ideal approach.

E.g. someone inheriting a big, poorly written package might find that caching would substantially reduce the cognitive load of refactoring the codebase, particularly if many tests are lengthy and many functions need to be moved to new scripts. I can think of a few examples in my own job where this would be the case. This basically boils down to the fact that, if you're working on a change which affects more than one file, running test_active_file() repeatedly can quickly become a bit cumbersome.

Another nice thing about caching would be that test(cache = TRUE) should theoretically do the same thing as test_active_file() if you've kept your changes minimal. If tests from other files get run, it's a nice, early indication that you've changed something you didn't mean to.

@hadley (Member) commented Oct 10, 2022

If your tests take too long to run, I think you are better off refactoring them to make them faster. IMO doing anything automatically with caching is extremely likely to be a net negative, because you will get intermittent problems due to not getting the cache invalidation 100% correct.

@moodymudskipper commented Oct 10, 2022

FWIW I use the workflow you recommend, and I was using it at the exact time I wrote down this idea. devtools::test_active_file() and testing with a cache are two workarounds that might produce false negatives (they make you feel it's green when it's not).
The latter would lead to fewer of those though, and it could come with a message or warning. I would not want the cache used by default. The workflow I have in mind, for projects with long tests, is: test with the cache before a commit (or in between), and test without the cache for a PR (or let CI do those runs).
Refactoring tests can be time consuming, and if you have 200 tests ranging from 0.1 to 3 seconds, it's hard to find the right ones to tackle and to get motivated to attempt the refactoring.
A consequence is that users might refrain from writing tests, or might not test their changes as often, which has a cost too.

Regarding implementation, Jim shared on Twitter that {covr} might already do a fair chunk of the work: https://covr.r-lib.org/reference/covr.record_tests.html
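
A hedged sketch of that {covr} route (the option name comes from the linked page; see those docs for how to read the recorded data back out):

    # With this option set, covr records which tests evaluated each trace while
    # measuring coverage - roughly the test-to-function mapping a cache would need.
    options(covr.record_tests = TRUE)
    cov <- covr::package_coverage()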

@hadley (Member) commented Oct 10, 2022

Sorry for not being clearer: I really don't think this is a good idea, and it's not something that I think is a good fit for testthat. I'd suggest implementing it in another package or explicitly caching your tests yourself.

@hadley closed this as completed Oct 10, 2022
@wurli (Author) commented Oct 10, 2022

Absolutely no worries. Thanks for your comments/consideration 🙂

@krlmlr (Member) commented Mar 23, 2023

I agree with the general sentiment that each *.R source file should have a corresponding test-*.R test file; this makes development easier in the long run. I have a snippet that creates boilerplate test files that follow this structure: https://gist.github.com/krlmlr/262b7df52839de6f0dbad33a69824247 . Eventually, this could be upgraded to keep source files and tests in sync as new functions are added.
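
A rough sketch in the spirit of that snippet (this is not the gist itself, just an illustration with invented file contents):

    # Create a placeholder tests/testthat/test-<name>.R for every R/<name>.R that lacks one
    r_files <- list.files("R", pattern = "\\.R$")
    for (f in r_files) {
      test_file <- file.path("tests", "testthat", paste0("test-", f))
      if (!file.exists(test_file)) {
        writeLines(
          c(
            sprintf('test_that("%s has tests", {', sub("\\.R$", "", f)),
            '  skip("boilerplate only - no tests written yet")',
            '})'
          ),
          test_file
        )
      }
    }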

r-lib/pillar@19cf68b from r-lib/pillar#640 shows this snippet in action. The tests fail initially because most functions require arguments. Now I "only" need to go through all the new test files and either find existing good tests or write new ones to match this structure.

For projects where this isn't feasible right now, I have a small package that reruns only failing tests: https://github.com/krlmlr/lazytest.
