
[Feature request] Optional caching of tests #1703

Closed · wurli opened this issue Oct 7, 2022 · 7 comments

@wurli commented Oct 7, 2022

Why cache tests?

If you're working on a big package with a test suite that takes a long time to run, an option to run only the tests that call functions (directly or indirectly) whose code has changed could bring lengthy calls to test() down to a few seconds.

How would this work?

You'd need to check whether any functions that are directly or indirectly called during a test have changed since the last time it was run. I can think of two possible implementations:

Method 1: Use code parsing

You would need to parse each function called during the test to get a dependency tree, then inspect this for changes. This opens a can of worms because extracting function calls from R code is hard. E.g. how do you generalise the detection of the function foo in lapply(1:10, foo), where foo is passed as a value rather than called directly?
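
For illustration, here is a minimal sketch (not part of the proposal itself) of what existing static analysis offers. It uses codetools, a recommended package bundled with R; findGlobals() lists the global symbols a function references, so it catches foo even when it only appears as a value, but it still misses fully dynamic calls such as do.call("foo", ...) or get("foo")().

    library(codetools)

    foo <- function(x) x + 1
    bar <- function(xs) lapply(xs, foo)

    # Globals referenced by `bar()`: includes both "lapply" and "foo", even though
    # `foo` never appears in call position.
    findGlobals(bar)
    findGlobals(bar, merge = FALSE)  # split into $functions and $variables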

Method 2: Modify functions before running test_that()

This would work something like this:

  1. When test_that() gets run, every function in the package is modified so that it records its (original) body whenever it gets called:

    # 1. `foo()` is defined somewhere in myPackage
    foo <- function(x) x + 1

    # 2. At the start of `test_that()`, a new environment is set up to record the definitions of
    #    any functions which are called during the test
    called_funs <- new.env()

    # 3. Something like this happens for each function in myPackage - the function is modified so
    #    that, when called, it first saves its (original, deparsed) body to the `called_funs`
    #    environment and then runs as normal
    original_body <- body(foo)
    body(foo) <- bquote({
      called_funs$foo <- .(paste(deparse(original_body), collapse = "\n"))
      .(original_body)
    })
  2. At the end of the test, called_funs will contain the bodies of all the functions in myPackage which have been called, directly or indirectly, during the test. (NB, you'd also want to record function arguments but that's omitted for the sake of brevity).

  3. This list of function definitions can then be checked for changes each time the test gets called, e.g. using snapshotting. If the snapshot passes (and the code for the test itself hasn't changed), the full test doesn't need to be run. A rough sketch of this check follows the list.
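
To make item 3 concrete, here's a minimal, hypothetical sketch of the cache check. None of these names (test_is_stale, cache_path, myPackage) are real testthat API; they only illustrate comparing the bodies recorded on the previous run against the current definitions.

    # Hypothetical helper: TRUE if the test needs a full re-run, i.e. if any function
    # recorded on the previous run has since changed (or nothing was ever recorded).
    test_is_stale <- function(cache_path, pkg = "myPackage") {
      if (!file.exists(cache_path)) return(TRUE)
      recorded <- readRDS(cache_path)  # named list: function name -> deparsed body
      ns <- asNamespace(pkg)
      for (fun in names(recorded)) {
        if (!exists(fun, envir = ns, mode = "function", inherits = FALSE)) return(TRUE)
        current <- paste(deparse(body(get(fun, envir = ns))), collapse = "\n")
        if (!identical(current, recorded[[fun]])) return(TRUE)
      }
      FALSE
    }

    # After a full (instrumented) run, persist what was recorded:
    # saveRDS(as.list(called_funs), cache_path)

In practice you'd also need to invalidate on changes to the test code itself, compiled code, and fixtures, which is where the issues below come in.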

Possible issues

I can see two potential issues here:

  1. Modifying the definition for foo as outlined above breaks something. I can't think of any reason why this would happen, but there's a lot of weird R code out there. This could be handled by making the caching opt-in.

  2. A function's R source doesn't change, but some dependency, e.g. underlying C++ code, does. Again, I think this could be handled by making the caching opt-in, or maybe opt-out for specific tests.

NB, this issue was prompted by a discussion on Twitter with @moodymudskipper.

As ever, thanks for the hard work on this amazing package.

@hadley (Member) commented Oct 7, 2022

Our general advice for this is to break your tests up into files corresponding to your R code, and then just run the test file related to the R file you're currently working on using devtools::test_active_file().
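
To illustrate that convention (file names here are just examples):

    # R/summarise.R  <->  tests/testthat/test-summarise.R
    # R/aggregate.R  <->  tests/testthat/test-aggregate.R
    usethis::use_test("summarise")   # creates the matching test file

    # With R/summarise.R or tests/testthat/test-summarise.R open in the editor:
    devtools::test_active_file()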

@wurli (Author) commented Oct 8, 2022

Thanks for your response. My feeling is that, while doing things as you advise is ideal (not least because it forces you to structure your code thoughtfully), there's a set of problems which caching would solve, mainly because a lot of code isn't structured well and doesn't lend itself to the ideal approach.

E.g. someone inheriting a big, poorly written package might find that caching would substantially reduce the cognitive load of refactoring the codebase, particularly if many tests are lengthy and many functions need to be moved to new scripts. I can think of a few examples in my own job where this would be the case. This basically boils down to the fact that, if you're working on a change which affects more than one file, running test_active_file() repeatedly can quickly become a bit cumbersome.

Another nice thing about caching would be that test(cache = TRUE) should theoretically do the same thing as test_active_file() if you've kept your changes minimal. If tests from other files get run, it's a nice, early indication that you've changed something you didn't mean to.

@hadley (Member) commented Oct 10, 2022

If your tests take too long to run, I think you are better off refactoring them to make them faster. IMO doing anything automatically with caching is extremely likely to be a net negative, because you will get intermittent problems due to not getting the cache invalidation 100% correct.

@moodymudskipper commented Oct 10, 2022

FWIW I use the workflow you recommend, and I was using it at the exact time I wrote down this idea. devtools::test_active_file() and testing with a cache are two workarounds that might produce false negatives (they make you feel it's green when it's not).
The latter would lead to fewer of those though, and it could come with a message or warning. I would not want the cache used by default. The workflow I have in mind, for projects with long tests, is: test with the cache before a commit (or in between), and test without the cache for a PR (or let CI do those runs).
Refactoring tests can be time consuming, and if you have 200 tests ranging from 0.1 to 3 seconds, it's hard to find the right ones to tackle and to get motivated to attempt the refactoring.
A consequence is that users might refrain from writing tests, or might not test their changes as often, which has a cost too.

Regarding implementation, Jim shared on Twitter that {covr} might already do a fair chunk of the work: https://covr.r-lib.org/reference/covr.record_tests.html
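
A hedged sketch of that {covr} route (the option name comes from the linked page; see those docs for how to read the recorded data back out):

    # With this option set, covr records which tests evaluated each trace while
    # measuring coverage - roughly the test-to-function mapping a cache would need.
    options(covr.record_tests = TRUE)
    cov <- covr::package_coverage()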

@hadley (Member) commented Oct 10, 2022

Sorry for not being clearer: I really don't think this is a good idea, and it's not something that I think is a good fit for testthat. I'd suggest implementing it in another package or explicitly caching your tests yourself.

@hadley closed this as completed Oct 10, 2022
@wurli (Author) commented Oct 10, 2022

Absolutely no worries. Thanks for your comments/consideration 🙂

@krlmlr (Member) commented Mar 23, 2023

I agree with the general sentiment that each *.R source file should have a corresponding test-*.R test file; this makes development easier in the long run. I have a snippet that creates boilerplate test files that follow this structure: https://gist.github.com/krlmlr/262b7df52839de6f0dbad33a69824247 . Eventually, this could be upgraded to keep source files and tests in sync as new functions are added.
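
A rough sketch in the spirit of that snippet (this is not the gist itself, just an illustration with invented file contents):

    # Create a placeholder tests/testthat/test-<name>.R for every R/<name>.R that lacks one
    r_files <- list.files("R", pattern = "\\.R$")
    for (f in r_files) {
      test_file <- file.path("tests", "testthat", paste0("test-", f))
      if (!file.exists(test_file)) {
        writeLines(
          c(
            sprintf('test_that("%s has tests", {', sub("\\.R$", "", f)),
            '  skip("boilerplate only - no tests written yet")',
            '})'
          ),
          test_file
        )
      }
    }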

r-lib/pillar@19cf68b from r-lib/pillar#640 shows this snippet in action. The tests fail initially because most functions require arguments. Now I "only" need to go through all the new test files and either find existing good tests or write new ones to match this structure.

For projects where this isn't feasible right now, I have a small package that reruns only failing tests: https://github.com/krlmlr/lazytest.
