Break cache if `Gemfile.lock` or `.rubocop.yml` change. #300

joshuapinter · 2023-01-08T20:06:13Z

Fixes #299.

Added the contents of Gemfile.lock and .rubocop.yml to the cache key so that if either of those files changes, the cache will break.

I noticed you were already doing this with .erb-lint.yml so that's great, no need to make any changes there. I just followed a similar pattern.

I placed the reading of both of these files in the Cache#initialize method so it doesn't get run every time checksum is called - maybe a little more performant.

Besides that, I tested this out extensively in development by disabling the pruning and making various changes to see if the caches were hit or not. Everything seemed to work great.

I attempted writing specs but got a little lost with how to make it work correctly. I couldn't see any pattern laid down with how you handle when .erb-lint.yml config is changed so I left that for now. Open to comment and suggestion.

We're gonna use our fork in "production" to test it more thoroughly and ensure any config or Gemfile changes won't produce false negatives.

TODOs

Test locally.
Test in CI.
Write specs?

zachfeldman

@joshuapinter I know you didn't ask me for a review but thought I'd leave a few comments - the changes look pretty good so far! I would add some unit tests to cli_spec.rb or cache_spec.rb. Right now we're not covering cache busting very well, but we could be: the specs could cover the scenarios that should bust the cache, or at least the new ones. Nice work.

zachfeldman · 2023-01-08T22:18:08Z

lib/erb_lint/cache.rb

@@ -76,7 +78,7 @@ def checksum(filename, file_content)
      mode = File.stat(filename).mode

      digester.update(
-        "#{mode}#{config.to_hash}#{ERBLint::VERSION}#{file_content}"
+        "#{mode}#{config.to_hash}#{ERBLint::VERSION}#{@rubocop_config}#{@gemfile_lock}#{file_content}"


With the number of variables in the cache key now, and no literal string content, should we break it out line by line to make it a little clearer than the current string interpolation? Something like:

irb(main):023:0> str_element_1 = "hello"
=> "hello"
irb(main):024:0> str_element_2 = "hi"
=> "hi"
irb(main):025:0> rubocop_config = "some_config"
=> "some_config"
irb(main):026:0] %w[
irb(main):027:0] str_element_1
irb(main):028:0] str_element_2
irb(main):029:0] rubocop_config
irb(main):030:0> ].join
=> "str_element_1str_element_2rubocop_config"

Just a style thing but might make this more readable especially if additional keys are added later.
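One detail worth noting about the irb session above: `%w[...]` builds an array of literal words, so the joined result contains the variable names rather than their values. A plain array literal joins the values instead. The following is only a sketch of the line-by-line layout being suggested; `build_cache_key` and its keyword arguments are hypothetical names, and `Digest::SHA1` is an assumption made for illustration.

```ruby
require "digest"

# Hypothetical helper (not part of erb_lint): builds the checksum from a list
# of parts, one per line, instead of a single long string interpolation.
def build_cache_key(mode:, config_hash:, version:, rubocop_config:, gemfile_lock:, file_content:)
  parts = [
    mode,
    config_hash,
    version,
    rubocop_config,
    gemfile_lock,
    file_content,
  ]

  # to_s turns nil (e.g. a missing Gemfile.lock) into "", which matches what
  # "#{...}" interpolation does, so the resulting digest is unchanged.
  Digest::SHA1.hexdigest(parts.map(&:to_s).join)
end
```

Because each part is converted with `to_s` before joining, this should produce the same digest as the interpolated string, so switching layouts would not by itself invalidate existing cache entries.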

zachfeldman · 2023-01-08T22:31:20Z

lib/erb_lint/cache.rb

@@ -9,6 +9,8 @@ def initialize(config, cache_dir = nil)
      @cache_dir = cache_dir || CACHE_DIRECTORY
      @hits = []
      @new_results = []
+      @gemfile_lock = File.read("Gemfile.lock") if File.exist?("Gemfile.lock")


Nice, I think this should effectively cache the File.read for all calls of checksum.
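As a rough illustration of the point above, here is a minimal, self-contained sketch of reading both files once in the constructor and reusing them in every `checksum` call. `SketchCache`, `read_if_present`, and the `project_dir` parameter are hypothetical and exist only for this example; the real `Cache` class takes different arguments and mixes more inputs into the digest.

```ruby
require "digest"
require "tmpdir"

# Hypothetical stand-in for the cache class, reduced to the part under discussion.
class SketchCache
  def initialize(project_dir = Dir.pwd)
    # Each file is read once here, not on every checksum call.
    @gemfile_lock = read_if_present(File.join(project_dir, "Gemfile.lock"))
    @rubocop_config = read_if_present(File.join(project_dir, ".rubocop.yml"))
  end

  def checksum(file_content)
    Digest::SHA1.hexdigest("#{@rubocop_config}#{@gemfile_lock}#{file_content}")
  end

  private

  # Returns the file contents, or nil when the file does not exist.
  def read_if_present(path)
    File.read(path) if File.exist?(path)
  end
end

# Usage: a new run (a new SketchCache) sees the changed Gemfile.lock.
Dir.mktmpdir do |dir|
  lockfile = File.join(dir, "Gemfile.lock")
  File.write(lockfile, "rubocop (1.0.0)\n")
  before = SketchCache.new(dir).checksum("<p>hi</p>")

  File.write(lockfile, "rubocop (1.1.0)\n")
  after = SketchCache.new(dir).checksum("<p>hi</p>")

  puts before == after # prints "false"
end
```

The tradeoff is that a single instance will not notice a file that changes mid-run, which seems acceptable for a linter invoked once per run.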

joshuapinter · 2023-01-08T22:39:41Z

I didn’t ask but I was secretly hoping you would. Lol. I’ll respond to your comments after the kids go to bed. Ha.


mculp · 2023-03-01T06:40:43Z

Can I pick it up from here or are you going to get back around to it @joshuapinter? The new(ish?) cache feature brings my app’s linting w/ --lint-all --enable-all-linters from 2 minutes to 2 seconds, which is awesome, but it needs this invalidation for us to be able to use it.

joshuapinter · 2023-03-02T21:43:55Z

@mculp Please do! The only thing that should be needed are proper specs. We've been using this in development and in CI for the last couple months and it's been working perfectly, always breaking cache when gems or configs have been updated. Makes using it very quick most of the time but also very reliable when things change.

joshuapinter · 2023-04-18T15:58:03Z

Hmmm, just upgraded our fork from 0.3.1 to 0.4.0 and the caching doesn't seem to be working. Every time I run erb_lint it says:

Cache being created for the first time, skipping prune

I also tried using the main repo to see if it was related to this fork but seeing it there too.

If I revert to 0.3.1 on either our fork or the main gem, the cache is properly used.

Anybody else seeing this?

zachfeldman · 2023-04-18T16:24:06Z

@joshuapinter we've been running 0.4.0 for a while now and I haven't noticed the cache not working. I just tried locally and it appears to be working. I tried moving the .erb-lint-cache dir out and re-ran and it had to regenerate the first time, but then worked on subsequent runs. Sorry I don't have anything more helpful to debug your issue!

joshuapinter · 2023-04-18T17:21:42Z

@zachfeldman Thanks for the quick response!

So strange... I removed the .erb-lint-cache directory to see if there was a file system issue or something but it didn't help. Still taking a long time and not hitting any cache after multiple runs:

$ erblint

====
Running ERB Lint...

Cache mode is on
Linting 628 files with 9 linters...
Cache being created for the first time, skipping prune
No errors were found in ERB files

$ erblint

====
Running ERB Lint...

Cache mode is on
Linting 628 files with 9 linters...
Cache being created for the first time, skipping prune
No errors were found in ERB files

$ erblint

====
Running ERB Lint...

Cache mode is on
Linting 628 files with 9 linters...
Cache being created for the first time, skipping prune
No errors were found in ERB files

oehlschl · 2024-09-06T01:00:00Z

lib/erb_lint/cache.rb

@@ -9,6 +9,8 @@ def initialize(config, cache_dir = nil)
      @cache_dir = cache_dir || CACHE_DIRECTORY
      @hits = []
      @new_results = []
+      @gemfile_lock = File.read("Gemfile.lock") if File.exist?("Gemfile.lock")


It looks like this will also bust the cache whenever any project dependency changes, even if the rubocop version did not. So, while I think this will achieve the desired effect of not reusing the cache between different rubocop versions, it will also make erblint slower than it could/should be by skipping the cache when it could have been used.

joshuapinter · 2024-09-06

Correct. I was aware of and okay with that tradeoff, erring on the side of caution.

Alternatively, you would have to hash the versions of rubocop and all rubocop-related plugins in Gemfile.lock. Doable but I didn't want to spend any more time on it.

I find this simple and effective, with the caveat that non-rubocop updates will break the cache, but if you want to write up the alternative solution that would be great.
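For reference, the alternative mentioned above (hashing only the rubocop-related versions) could look roughly like the sketch below. It is not what this PR does. `SPEC_LINE` and `rubocop_versions_checksum` are hypothetical names, and the sketch assumes that rubocop plugins can be recognized by a `rubocop` name prefix, which will miss plugins that are named differently.

```ruby
require "digest"

# Matches resolved spec lines in a Gemfile.lock, which are indented by four
# spaces, e.g. "    rubocop-rails (2.19.1)". Dependency constraints (six
# spaces) and the DEPENDENCIES section (two spaces) are not matched.
SPEC_LINE = /^ {4}(rubocop[\w-]*) \(([^)]+)\)$/

# Hypothetical helper: digest of the resolved versions of rubocop and of any
# gem whose name starts with "rubocop".
def rubocop_versions_checksum(lockfile_content)
  versions = lockfile_content.scan(SPEC_LINE).map { |name, version| "#{name}=#{version}" }
  Digest::SHA1.hexdigest(versions.sort.join(","))
end

lockfile = <<~LOCK
  GEM
    remote: https://rubygems.org/
    specs:
      rails (7.0.4)
      rubocop (1.50.2)
        parallel (~> 1.10)
      rubocop-rails (2.19.1)
        rubocop (>= 1.33.0, < 2.0)

  DEPENDENCIES
    rails
    rubocop
LOCK

unrelated_bump = lockfile.sub("rails (7.0.4)", "rails (7.1.0)")
rubocop_bump = lockfile.sub("rubocop (1.50.2)", "rubocop (1.51.0)")

puts rubocop_versions_checksum(lockfile) == rubocop_versions_checksum(unrelated_bump) # prints "true"
puts rubocop_versions_checksum(lockfile) == rubocop_versions_checksum(rubocop_bump)   # prints "false"
```

This keeps the cache warm across unrelated gem updates, at the cost of more parsing logic and the risk of missing a dependency that affects linting.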

oehlschl · 2024-09-07T01:52:39Z

lib/erb_lint/cache.rb

@@ -9,6 +9,8 @@ def initialize(config, cache_dir = nil)
      @cache_dir = cache_dir || CACHE_DIRECTORY
      @hits = []
      @new_results = []
+      @gemfile_lock = File.read("Gemfile.lock") if File.exist?("Gemfile.lock")


Thanks @joshuapinter. Makes sense; just wanted to make sure it was known.

I agree with the tradeoffs here and that the priority should be addressing the issue over complete optimization. (I came here after hitting this same issue in our own CI workflow and being surprised by failures.)

Just for comparison / completeness, it looks like rubocop itself uses the whole Gemfile.lock in one case as a checksum, and does a plugin-aware checksum of the rubocop source in another:

RuboCop::Server::Cache.restart_key: https://github.com/rubocop/rubocop/blob/7fa4e5ad62c2d6e081bcf7352f172282f3285de9/lib/rubocop/server/cache.rb#L51

RuboCop::ResultCache#rubocop_checksum: https://github.com/rubocop/rubocop/blob/7fa4e5ad62c2d6e081bcf7352f172282f3285de9/lib/rubocop/result_cache.rb#L174

oehlschl · 2024-09-07T01:59:28Z

lib/erb_lint/cache.rb

@@ -9,6 +9,8 @@ def initialize(config, cache_dir = nil)
      @cache_dir = cache_dir || CACHE_DIRECTORY
      @hits = []
      @new_results = []
+      @gemfile_lock = File.read("Gemfile.lock") if File.exist?("Gemfile.lock")


I don't know what information is available here, but is it worth doing this only if the Rubocop linter is enabled? (If that's possible, it seems like a meaningful improvement.)
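One possible shape for that idea, as a sketch only: decide from the parsed `.erb-lint.yml` whether the Rubocop linter is enabled, and only then read the extra files. Both helper names are hypothetical, the config is modelled as a plain hash (the real `Cache` receives a config object, not a hash), and the sketch ignores command-line overrides such as `--enable-all-linters`.

```ruby
# Hypothetical helper: true when the parsed .erb-lint.yml enables the Rubocop linter.
def rubocop_enabled?(erb_lint_config)
  erb_lint_config.dig("linters", "Rubocop", "enabled") == true
end

# Hypothetical helper: contents of the files that should bust the cache,
# or an empty list when the Rubocop linter is not enabled.
def extra_cache_inputs(erb_lint_config, dir: Dir.pwd)
  return [] unless rubocop_enabled?(erb_lint_config)

  ["Gemfile.lock", ".rubocop.yml"].filter_map do |name|
    path = File.join(dir, name)
    File.read(path) if File.exist?(path)
  end
end
```

With this shape, projects that do not use the Rubocop linter would keep their cache across Gemfile.lock changes.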

oehlschl · 2024-09-07T02:05:14Z

lib/erb_lint/cache.rb

@@ -9,6 +9,8 @@ def initialize(config, cache_dir = nil)
      @cache_dir = cache_dir || CACHE_DIRECTORY
      @hits = []
      @new_results = []
+      @gemfile_lock = File.read("Gemfile.lock") if File.exist?("Gemfile.lock")
+      @rubocop_config = File.read(".rubocop.yml") if File.exist?(".rubocop.yml")


Additionally, if there's a way to fetch this from linters.Rubocop.rubocop_config.inherit_from, that seems worthwhile, since it is parameterized in the first place: https://github.com/goatapp/rubocop/blob/97e4ffc8a71e9e5239a927c6a534dfc1e0da917f/manual/configuration.md?plain=1#L81

Also, I don't know how common it is, but it looks like this library does allow for URIs there also: https://github.com/goatapp/rubocop/blob/97e4ffc8a71e9e5239a927c6a534dfc1e0da917f/manual/configuration.md?plain=1#L86

Actually, given the complexity / variability there, if there's a way to access an instance of the Rubocop linter here, it may be worth removing dependence on the source of that configuration.

Also, pardon the commentary. I think this is a good PR, and I don't want to let solving for all the cases get in the way of anything; just reading through the code and thinking aloud.

It looks like runner context is available in run_using_cache, so maybe some "runner checksum" could be passed as an optional arg to cache.get() and cache.set()? That solves for:

through the runner, putting linters in charge of their own checksum logic

not being coupled to / duplicating awareness of how linters are configured

I'll give this some thought over the weekend and see if I come up with anything useful.

Put up something small for feedback here: #373
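To make the "runner checksum" idea from this thread concrete, here is a small sketch under stated assumptions. None of these classes are erb_lint's; `SketchLinter`, `SketchRunner`, `SketchResultCache`, and the optional `runner_checksum` argument are all hypothetical, and the approach actually proposed in #373 may differ.

```ruby
require "digest"

# Hypothetical linter that knows which of its own inputs affect its results.
class SketchLinter
  def initialize(name, config_text)
    @name = name
    @config_text = config_text
  end

  def checksum_input
    "#{@name}:#{@config_text}"
  end
end

# Hypothetical runner that combines the checksum inputs of its linters.
class SketchRunner
  def initialize(linters)
    @linters = linters
  end

  def checksum
    Digest::SHA1.hexdigest(@linters.map(&:checksum_input).sort.join("|"))
  end
end

# Hypothetical in-memory cache whose get/set accept an optional runner checksum.
class SketchResultCache
  def initialize
    @store = {}
  end

  def get(filename, file_content, runner_checksum = nil)
    @store[key(filename, file_content, runner_checksum)]
  end

  def set(filename, file_content, result, runner_checksum = nil)
    @store[key(filename, file_content, runner_checksum)] = result
  end

  private

  def key(filename, file_content, runner_checksum)
    Digest::SHA1.hexdigest([filename, file_content, runner_checksum].map(&:to_s).join("\0"))
  end
end

old_runner = SketchRunner.new([SketchLinter.new("Rubocop", "Style/StringLiterals: single")])
new_runner = SketchRunner.new([SketchLinter.new("Rubocop", "Style/StringLiterals: double")])

cache = SketchResultCache.new
cache.set("a.erb", "<p></p>", ["no offenses"], old_runner.checksum)

p cache.get("a.erb", "<p></p>", old_runner.checksum) # prints ["no offenses"]
p cache.get("a.erb", "<p></p>", new_runner.checksum) # prints nil
```

The cache never needs to know where a linter's configuration comes from, which is the decoupling described above.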

zachfeldman reviewed Jan 8, 2023

github-actions bot added the cla-needed label Apr 18, 2023

joshuapinter force-pushed the main branch 2 times, most recently from ae61b11 to 2ca4dee Compare April 18, 2023 15:53

joshuapinter force-pushed the main branch from 2ca4dee to 9403ca0 Compare August 25, 2023 00:28

github-actions bot removed the cla-needed label Aug 25, 2023

joshuapinter force-pushed the main branch from 9403ca0 to 838f126 Compare August 25, 2023 12:21

joshuapinter force-pushed the main branch from 838f126 to 91010d8 Compare February 5, 2024 17:43

joshuapinter force-pushed the main branch from 91010d8 to e1b0cee Compare May 17, 2024 20:19

oehlschl reviewed Sep 6, 2024

joshuapinter marked this pull request as ready for review September 6, 2024 15:48

oehlschl reviewed Sep 7, 2024

oehlschl mentioned this pull request Sep 8, 2024

Bust erblilnt cache on rubocop config changes via linter checksum #373

Open

joshuapinter force-pushed the main branch from e1b0cee to 8e0da52 Compare September 23, 2024 22:22

joshuapinter mentioned this pull request Sep 24, 2024

Rename CACHE_DIRECTORY to .erb_lint_cache. #380

Merged

Break cache if Gemfile.lock or .rubocop.yml change.

bcca3cd

joshuapinter force-pushed the main branch from 7dd6dbb to bcca3cd Compare September 24, 2024 14:25
