Rebuild cache if the underlying data changed #276
Comments
I like the idea of the cache being able to tell whether the file has changed. This should make the workflow easier. The only edge case I can see is researchers working with unstable data who use the cache to capture the particular state they are working with now. |
This could actually be improved by this change, because if you cache the files once and then set |
In order to tell if a file has changed, can we just compare the modified date of the file with the creation date of the cache file? I believe I have seen another "Reproducible Research" project which used a makefile in this way to only process specific files. |
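A minimal sketch of that timestamp comparison in base R, for illustration only. `cache_is_stale` is a hypothetical helper, not part of ProjectTemplate, and it uses the cache file's modification time as a stand-in for its creation date, since base R does not expose a portable creation time.

```r
# Hypothetical helper (not a ProjectTemplate function): TRUE when `data_file`
# was modified after `cache_file` was last written, or when no cache exists yet.
cache_is_stale <- function(data_file, cache_file) {
  if (!file.exists(cache_file)) {
    return(TRUE)
  }
  file.mtime(data_file) > file.mtime(cache_file)
}

# Demonstration with temporary files and explicitly set timestamps.
data_file  <- tempfile(fileext = ".csv")
cache_file <- tempfile(fileext = ".rds")
writeLines(c("x,y", "1,2"), data_file)
saveRDS("cached value", cache_file)

now <- Sys.time()
Sys.setFileTime(cache_file, now)

Sys.setFileTime(data_file, now - 60)  # data older than the cache
fresh <- cache_is_stale(data_file, cache_file)

Sys.setFileTime(data_file, now + 60)  # data newer than the cache
stale <- cache_is_stale(data_file, cache_file)
```

This mirrors the makefile approach: only the timestamps are compared, so a file whose timestamp is bumped without any content change would still be treated as changed.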
Rather than implementing this in the cache function, wouldn't it be better to implement it directly in the loading function to automate this process? Perhaps a yes/no question could be asked to allow the user to not load the new file... |
Comparing created and modified timestamps is risky. Sometimes modified timestamps are updated by the operating system even though nothing has changed in the file. Asking a user each time a cache file is being updated is also error prone. With many files, the question becomes a nuisance and the user mindlessly hits "y". Currently, you can pass a list of variable names to |
Excellent points, thanks for the clarity. From an automation standpoint, one would simply call clear.cache() prior to load.project() for a full reload? Perhaps someday another function could be added or parameter could be passed into load.project which compares files. It’s not critical but would allow a person to possibly automate E2E and produce results as quickly as possible without needing to reload very large unchanged datasets. |
Yes call |
Yes it would really only benefit those who are pulling in files. I'll also be trying to connect to DB's where possible but of course will have to rely on some files. At the end of the day, a few extra minutes to load data isn't going to matter unless I'm sitting there watching it load and getting impatient! :) |
`load.project` also has a `reset` argument which clears the cache when set
to `TRUE`.
I agree with @KentonWhite that simply using modification date is tricky.
Also, I think, using cache in the reader could make the cache and data
loading simpler in `load.project`.
On Thu, 27 Sep 2018 at 01:52, bugsysiegals <[email protected]> wrote:
… Yes it would really only benefit those who are pulling in files. I'll also
be trying to connect to DB's where possible but of course will have to rely
on some files. At the end of the day, a few extra minutes to load data
isn't going to matter unless I'm sitting there watching it load and getting
impatient! :)
|
Report an Issue / Request a Feature
I'm submitting a (Check one with "x") :
Issue Severity Classification -
(Check one with "x") :
Expected Behavior
When a file in `data/` is changed but the resulting variable exists in the cache the file is not reloaded.
Current Behavior
Currently caching of the data is only done after the variable is loaded into memory, and cached variables are not reloaded if the original file was changed.
Version Information
Possible Solution
Update the `cache` function to also include a `file` argument, similar to the `depends` argument. If the digest of the file has changed, reload the file and rebuild the cache. This could be done inside the reader as follows (using the 1.0 reader signature):
This way assigning the variable in global namespace is left to `cache`, the `CODE` argument is evaluated as it is normally inside the `cache` function, and is only updated if the dependency in the `file` argument changed.
How do you guys feel about this?