Break cache if Gemfile.lock or .rubocop.yml change. #300

Open
wants to merge 1 commit into
base: main

joshuapinter
Contributor

@joshuapinter joshuapinter commented Jan 8, 2023

Fixes #299.

Added the contents of Gemfile.lock and .rubocop.yml to the cache key so that the cache breaks whenever either file changes.

I noticed you were already doing this with .erb-lint.yml so that's great, no need to make any changes there. I just followed a similar pattern.

I placed the reading of both of these files in the Cache#initialize method so it doesn't get run every time checksum is called - maybe a little more performant.
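
A minimal sketch of that pattern, not the actual erb_lint code: the class name `SketchCache` and its `dir` argument are hypothetical, and only the idea (read both files once in `initialize`, reuse them in every `checksum` call) comes from the description above.

```ruby
require "digest"

# Hypothetical stand-in for the real cache class; all names are illustrative.
class SketchCache
  def initialize(dir = Dir.pwd)
    # Read each file once up front; the value is nil when the file is absent.
    @gemfile_lock = read_if_present(File.join(dir, "Gemfile.lock"))
    @rubocop_config = read_if_present(File.join(dir, ".rubocop.yml"))
  end

  # Any change to either file's contents changes every checksum.
  def checksum(file_content)
    Digest::SHA1.hexdigest("#{@rubocop_config}#{@gemfile_lock}#{file_content}")
  end

  private

  def read_if_present(path)
    File.read(path) if File.exist?(path)
  end
end
```

Because the files are read in the constructor, a change made to them while a single run is in progress would not be picked up until the next run.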

Besides that, I tested this out extensively in development by disabling the pruning and making various changes to see if the caches were hit or not. Everything seemed to work great.

I attempted to write specs but got a little lost on how to make them work correctly. I couldn't find an existing pattern for testing what happens when the .erb-lint.yml config changes, so I've left that for now. Open to comments and suggestions.

We're gonna use our fork in "production" to test it more thoroughly and ensure any config or Gemfile changes won't produce false negatives.

TODOs

  • Test locally.
  • Test in CI.
  • Write specs?


@zachfeldman zachfeldman left a comment

@joshuapinter I know you didn't ask me for a review but thought I'd leave a few comments - the changes look pretty good so far! I would add some unit tests to cli_spec.rb or cache_spec.rb. Right now we're not covering cache busting very well, but we could be, by testing the scenarios that should bust the cache, or at least the new ones. Nice work.

@@ -76,7 +78,7 @@ def checksum(filename, file_content)
   mode = File.stat(filename).mode

   digester.update(
-    "#{mode}#{config.to_hash}#{ERBLint::VERSION}#{file_content}"
+    "#{mode}#{config.to_hash}#{ERBLint::VERSION}#{@rubocop_config}#{@gemfile_lock}#{file_content}"

With the number of variables in the cache key now, and no actual string content, should we break it out line by line to make it a little clearer than the current string interpolation? Something like:

irb(main):023:0> str_element_1 = "hello"
=> "hello"
irb(main):024:0> str_element_2 = "hi"
=> "hi"
irb(main):025:0> rubocop_config = "some_config"
=> "some_config"
irb(main):026:0] %w[
irb(main):027:0] str_element_1
irb(main):028:0] str_element_2
irb(main):029:0] rubocop_config
irb(main):030:0> ].join
=> "str_element_1str_element_2rubocop_config"

Just a style thing but might make this more readable especially if additional keys are added later.
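
Note that `%w[]` builds an array of literal words, which is why the transcript above returns the variable names rather than their values. A sketch of the same line-by-line layout using an ordinary array literal follows; the variable values are placeholders, not the real cache inputs.

```ruby
mode = "100644"
rubocop_config = "some_config"
gemfile_lock = nil # nil (file absent) joins as an empty string
file_content = "<p>hi</p>"

# One cache key element per line, so later additions are one-line diffs.
cache_key = [
  mode,
  rubocop_config,
  gemfile_lock,
  file_content,
].join
```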

@@ -9,6 +9,8 @@ def initialize(config, cache_dir = nil)
   @cache_dir = cache_dir || CACHE_DIRECTORY
   @hits = []
   @new_results = []
+  @gemfile_lock = File.read("Gemfile.lock") if File.exist?("Gemfile.lock")

Nice, I think this should effectively cache the File.read for all calls of checksum.

@joshuapinter
Contributor Author

joshuapinter commented Jan 8, 2023 via email

@mculp

mculp commented Mar 1, 2023

Can I pick it up from here or are you going to get back around to it @joshuapinter? The new(ish?) cache feature brings my app’s linting w/ --lint-all --enable-all-linters from 2 minutes to 2 seconds, which is awesome, but it needs this invalidation for us to be able to use it.

@joshuapinter
Contributor Author

@mculp Please do! The only thing that should be needed is proper specs. We've been using this in development and in CI for the last couple of months and it's been working perfectly, always breaking the cache when gems or configs have been updated. That makes it very quick most of the time but also very reliable when things change.

@joshuapinter
Contributor Author

joshuapinter commented Apr 18, 2023

Hmmm, just upgraded our fork from 0.3.1 to 0.4.0 and the caching doesn't seem to be working. Every time I run erb_lint it says:

Cache being created for the first time, skipping prune

I also tried using the main repo to see if it was related to this fork but seeing it there too.

If I revert to 0.3.1 on either our fork or the main gem, the cache is properly used.

Anybody else seeing this?

@zachfeldman

@joshuapinter we've been running 0.4.0 for a while now and I haven't noticed the cache not working. I just tried locally and it appears to be working. I tried moving the .erb-lint-cache dir out and re-ran and it had to regenerate the first time, but then worked on subsequent runs. Sorry I don't have anything more helpful to debug your issue!

@joshuapinter
Contributor Author

@zachfeldman Thanks for the quick response!

So strange... I removed the .erb-lint-cache directory to see if there was a file system issue or something but it didn't help. Still taking a long time and not hitting any cache after multiple runs:

$ erblint

====
Running ERB Lint...

Cache mode is on
Linting 628 files with 9 linters...
Cache being created for the first time, skipping prune
No errors were found in ERB files

$ erblint

====
Running ERB Lint...

Cache mode is on
Linting 628 files with 9 linters...
Cache being created for the first time, skipping prune
No errors were found in ERB files

$ erblint

====
Running ERB Lint...

Cache mode is on
Linting 628 files with 9 linters...
Cache being created for the first time, skipping prune
No errors were found in ERB files

@@ -9,6 +9,8 @@ def initialize(config, cache_dir = nil)
   @cache_dir = cache_dir || CACHE_DIRECTORY
   @hits = []
   @new_results = []
+  @gemfile_lock = File.read("Gemfile.lock") if File.exist?("Gemfile.lock")

It looks like this will also bust the cache whenever any project dependency changes, even if the rubocop version did not. So, while I think this will achieve the desired effect of not reusing the cache between different rubocop versions, it will also make erblint slower than it could be by skipping the cache in cases where it would still be valid.

Contributor Author

Correct. I was aware and okay with that tradeoff, erring on the side of caution.

Alternatively, you would have to hash the versions of rubocop and all rubocop-related plugins in Gemfile.lock. Doable, but I didn't want to spend any more time on it.

I find this simple and effective, with the caveat that non-rubocop updates will also break the cache. If you want to write up the alternative solution, that would be great.
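
A rough sketch of that alternative, not code from this PR: the helper name `rubocop_versions_digest` is hypothetical, and it assumes rubocop-related gems can be recognised by a `rubocop` name prefix on the four-space-indented spec lines of Gemfile.lock. Plugins with other names would need to be listed explicitly.

```ruby
require "digest"

# Matches top-level spec lines in Gemfile.lock, e.g. "    rubocop-rails (2.19.0)".
# Dependency lines are indented six spaces and are skipped on purpose.
RUBOCOP_SPEC_LINE = /^ {4}(rubocop[\w-]*) \(([^)]+)\)$/

# Hypothetical helper: digest only the rubocop-related gem versions,
# so unrelated dependency bumps leave the digest unchanged.
def rubocop_versions_digest(lockfile_contents)
  versions = lockfile_contents.scan(RUBOCOP_SPEC_LINE).sort
  Digest::SHA1.hexdigest(versions.map { |name, version| "#{name}=#{version}" }.join(","))
end
```

The resulting digest could take the place of the full Gemfile.lock contents in the cache key.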

Thanks @joshuapinter. Makes sense; just wanted to make sure it was known.

I agree with the tradeoffs here and that the priority should be addressing the issue over complete optimization. (I came here after hitting this same issue in our own CI workflow and being surprised by failures.)

Just for comparison / completeness, it looks like rubocop itself uses the whole Gemfile.lock in one case as a checksum, and does a plugin-aware checksum of the rubocop source in another:

@joshuapinter joshuapinter marked this pull request as ready for review September 6, 2024 15:48

@@ -9,6 +9,8 @@ def initialize(config, cache_dir = nil)
   @cache_dir = cache_dir || CACHE_DIRECTORY
   @hits = []
   @new_results = []
+  @gemfile_lock = File.read("Gemfile.lock") if File.exist?("Gemfile.lock")

I don't know what information is available here, but is it worth doing this only if the Rubocop linter is enabled? (If that's possible, it seems like a meaningful improvement.)

@@ -9,6 +9,8 @@ def initialize(config, cache_dir = nil)
   @cache_dir = cache_dir || CACHE_DIRECTORY
   @hits = []
   @new_results = []
+  @gemfile_lock = File.read("Gemfile.lock") if File.exist?("Gemfile.lock")
+  @rubocop_config = File.read(".rubocop.yml") if File.exist?(".rubocop.yml")

Additionally, if there's a way to fetch this from linters.Rubocop.rubocop_config.inherit_from, that seems worthwhile, since it is parameterized in the first place: https://github.com/goatapp/rubocop/blob/97e4ffc8a71e9e5239a927c6a534dfc1e0da917f/manual/configuration.md?plain=1#L81

Also, I don't know how common it is, but it looks like this library does allow for URIs there also: https://github.com/goatapp/rubocop/blob/97e4ffc8a71e9e5239a927c6a534dfc1e0da917f/manual/configuration.md?plain=1#L86

Actually, given the complexity / variability there, if there's a way to access an instance of the Rubocop linter here, it may be worth removing dependence on the source of that configuration.


@oehlschl oehlschl Sep 7, 2024

Also, pardon the commentary. I think this is a good PR, and I don't want to let solving for all the cases get in the way of anything; just reading through the code and thinking aloud.

It looks like runner context is available in run_using_cache, so maybe some "runner checksum" could be passed as an optional arg to cache.get() and cache.set()? That solves for:

  • through the runner, putting linters in charge of their own checksum logic
  • not being coupled to / duplicating awareness of how linters are configured

I'll give this some thought over the weekend and see if I come up with anything useful.
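
To make the idea concrete, here is a rough sketch under stated assumptions. Every class and method name below (`SketchRubocopLinter`, `SketchRunner`, `SketchResultCache`, `cache_checksum`) is hypothetical and does not reflect the real erb_lint API; only the shape (linters report their own checksum, the runner combines them, the cache accepts the result as an optional argument) follows the comment above.

```ruby
require "digest"

# Hypothetical linter: reports what should invalidate its own cached results.
class SketchRubocopLinter
  def initialize(rubocop_version:, config_yaml:)
    @rubocop_version = rubocop_version
    @config_yaml = config_yaml
  end

  def cache_checksum
    Digest::SHA1.hexdigest("#{@rubocop_version}#{@config_yaml}")
  end
end

# Hypothetical runner: combines the checksums of whichever linters are enabled.
class SketchRunner
  def initialize(linters)
    @linters = linters
  end

  def checksum
    Digest::SHA1.hexdigest(@linters.map(&:cache_checksum).join)
  end
end

# Hypothetical cache: get/set take the runner checksum as an optional argument.
class SketchResultCache
  def initialize
    @store = {}
  end

  def get(filename, file_content, runner_checksum = nil)
    @store[key(filename, file_content, runner_checksum)]
  end

  def set(filename, file_content, result, runner_checksum = nil)
    @store[key(filename, file_content, runner_checksum)] = result
  end

  private

  def key(filename, file_content, runner_checksum)
    Digest::SHA1.hexdigest([filename, file_content, runner_checksum].join("\0"))
  end
end
```

In this shape a linter that is not enabled contributes nothing to the key, so the cache would only depend on rubocop when the Rubocop linter is actually running.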

Put up something small for feedback here: #373

Successfully merging this pull request may close these issues.

Cache ignores Rubocop version and rubocop.yml configuration.
4 participants