Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Find load_paths dynamically by asking Rails and eliminate cache in Packwerk::Configuration #150

Merged

Conversation

alexevanczuk
Copy link
Contributor

@alexevanczuk alexevanczuk commented Oct 18, 2021

Summary

Caching load paths was done to optimize start-up time so that packwerk check on a single file is fast. After some discussion (see slack thread at bottom), we think it may be a better tradeoff to no longer cache load paths and instead load the rails application. The reason for this was that it was a pain to add to load_paths. In the happy path, a user remembers that they need to do this, and they run bin/packwerk validate prior to updating deprecations and they use the error message to copy and paste into packwerk.yml.

In the unhappy path, a user:

  1. Makes their code change and runs update-deprecations. No new violations are found since load_paths wasn't updated.
  2. The user pushes to CI which fails with bin/packwerk validate
  3. The user runs that, updates packwerk.yml
  4. The user needs to run bin/packwerk update-deprecations again.

Spring

We are still investigating what it would mean for spring to continue to help optimize boot time given the change. One option here would be to tell spring to watch certain files and folders. We can have packwerk inject to that list and write a digest to tmp, so when packwerk paths are updated, it instructs spring to reload. If necessary, we can delve into this deeper prior to landing.

We reenabled spring here: #153

What are you trying to accomplish?

Remove need to cache load_paths in packwerk.yml.

What approach did you choose and why?

I chose to use the current implementation of generating the load_paths when the configuration file is generated so that the behavior is identical. I kept the API to configuration the same for clients, and explicitly raised an error in places that today expect errors to be raised when a non-rails application attempts to use packwerk. I found these places by the failures in in the test suite.

What should reviewers focus on?

Reviewers may want to focus on:

  • If there are other performance issues that I am not anticipating as a result of dynamically computing load paths
  • How this may affect behavior of the application, which today caches via packwerk.yml, and now may cache from spring
  • If the documentation change is adequate (should we give a hint on how users may avoid challenges if they use spring?)

Type of Change

  • Bugfix
  • New feature
  • Non-breaking change (a change that doesn't alter functionality - i.e., code refactor, configs, etc.)

Additional Release Notes

load_paths in packwerk.yml is now deprecated. The user will receive a deprecation warning if they have this configuration to help prompt them to remove it. A subsequent version can raise instead.

Checklist

  • I have updated the documentation accordingly.
  • [/] I have added tests to cover my changes. => This was a refactor. My take on this was that a refactor should not change behavior and should not require tests to change, unless those tests are coupled to structure. That being said, I needed to remove some tests that were coupled to behavior/constraints that were no longer required, and I needed to update some tests that were coupled to structure.
  • It is safe to rollback this change

Slack thread

Alex Evanczuk Oct 15th at 3:17 PM
:wave::skin-tone-3: Hi all!
One thing that has tripped up our eng ICs is the need to add new load_paths to packwerk.yml
We’ve created some processes and tooling to make moving files into packs and between packs easy and seamless, but this is a bit of friction I’d love to remove if possible. Originally I was planning on creating tooling that adds to packwerk.yml automatically, but this felt like something that belonged more in packwerk itself. My question for you is what is load_paths needed for, and would it be possible to remove the need for load_paths entirely? Based on reading the source a bit, packwerk already knows what the load_paths should be (that’s how bin/packwerk validate is able to let us know certain load_paths are missing) — so could we just compute it on the fly? Is there a caching/performance consideration here? Would love to learn more :pray::skin-tone-3:




27 replies

Philip Müller (ET)  3 days ago
Packwerk is optimized for fast startup time. To get the load paths, we’d need to boot the application, which we didn’t want.

Philip Müller (ET)  3 days ago
We need the load paths for constant resolution; you can take a look at the constant_resolver gem for details. (edited) 

Alex Evanczuk  3 days ago
Do you want fast startup time because you expect users to be running certain commands very often? If so what commands are those?
The reason I ask is because I find for our users bin/packwerk update-deprecations time tends to be the main bottleneck (over application load time), but as we’re encouraging people to break up their packs, they end up wasting more time due to CI failing on bin/packwerk validate, then having to load the application anyways before adding to packwerk.yml
I’m wondering if we can have our cake and eat it too here somehow…

Philip Müller (ET)  3 days ago
If we want to eliminate the dependency on load paths completely, we’d need to do something along the lines of https://github.com/Shopify/constant_resolver/issues/26

Philip Müller (ET)  3 days ago
This is because rubocop integration and execution of check on a single file was the most important feature when we wrote this

Philip Müller (ET)  3 days ago
At this point it would probably be a better tradeoff to boot the application

Alex Evanczuk  3 days ago
Gotcha, that was going to be my next question — if those two features are still as important to your org and packwerk community.

Philip Müller (ET)  3 days ago
It’s a good point, and maybe that should be an issue on the repo for visibility

Philip Müller (ET)  3 days ago
maintaining the lists of load paths and inflections is certainly a hassle for us too

Alex Evanczuk  3 days ago
If not, and we feel like we could stop requiring load paths and make a different tradeoff, I’d be interested in opening that issue, getting feedback, and depending on feedback, implementation requirements, and my capacity, I may be able to implement this change. (edited) 

Philip Müller (ET)  3 days ago
Can we start with a simple PR that boots the application on check without doing any other modifications so we can judge the perf impact?

Philip Müller (ET)  3 days ago
My gut feeling is that it’s an obvious decision but knowing is better

Alex Evanczuk  3 days ago
Sure, I can do that. Is the idea to benchmark bin/packwerk check xyz where XYZ is a file, pack, or whole application before and after the change? Is there a threshold where a certain performance regression would lead us to a go/no go decision?

Philip Müller (ET)  3 days ago
OK hold on I’m discussing with Rafael
:ack:
1


Philip Müller (ET)  3 days ago
So, booting our app is 10-20 seconds. That’s a pretty hefty penalty on checking not just a single file, but even a package.

Philip Müller (ET)  3 days ago
hold on that’s without spring

Alex Evanczuk  3 days ago
Yes — I understand why you wouldn’t want to do that, especially if you’re encouraging people to run packwerk check on single files frequently.
I’d love to eliminate some of the root issues requiring the load_paths, namely the dependence on zeitwerk, but it sounds like a pretty hefty refactor investment. For the time being, do you think it would add too much complexity to allow the user to configure packwerk to tell it to use the “cache” or not? That is — Packwerk.configure { |config| config.use_load_path_cache = true } (true by default). If this config is true, then validate checks load paths, and other systems read from the “cache” (i.e. packwerk.yml ). If it’s false , then validate does not not validate that cache, and other commands use ApplicationLoadPaths to load the application and pull the load_paths dynamically. What do you think?

Philip Müller (ET)  3 days ago
My favourite solution here would be if we could completely remove the special handling for load paths and inflections and just boot the app and ask it.
Adding a switch adds complexity. Eliminating the old way reduces it.
But we need to make sure it makes sense first
:green_heart:
1
:heavy_plus_sign:
1


Philip Müller (ET)  3 days ago
OK - we’d need to use spring to speed up boot times, and make sure that executing with and without spring (if possible at all) doesn’t deliver different results

Philip Müller (ET)  3 days ago
but in that case, I believe it would make sense (and Rafael just agreed in DM) (edited) 

Alex Evanczuk  3 days ago
That’s great — I can look into making this behavioral change. Let me know if there’s any investigation you’d like me to do regarding spring. I don’t exactly recall the mechanisms of spring, but I believe it caches environment and constants loaded at initialization time, but it doesn’t cache other types of constants. With that in mind I think it could deliver different results, but I’m not positive. As long as spring is disabled in CI though it should be caught there.

Philip Müller (ET)  3 days ago
spring could be disabled in CI, but it would be very confusing to have different behavior in CI vs locally.
:heavy_plus_sign:
1


Philip Müller (ET)  3 days ago
Packwerk doesn’t use the application context at all except to extract load paths and inflections, so caching constants that arent’s set at initialization time shouldn’t matter. (edited) 

Alex Evanczuk  3 days ago
I’m not sure if spring caches rails load paths and inflections :thinking_face:

Philip Müller (ET)  3 days ago
I’m pretty sure it caches most of what is needed to compute those values, at the very least - I think it actually caches all of it - but that’s an easy experiment to run

Philip Müller (ET)  3 days ago
Time for weekend! Let’s continue chatting next week
:wave_animated:
1


Alex Evanczuk  3 days ago
Sounds good! Have a good weekend :slightly_smiling_face:
:same:
1

@alexevanczuk alexevanczuk requested a review from a team as a code owner October 18, 2021 16:31
@ghost ghost added the cla-needed label Oct 18, 2021
"Extraneous load paths in file:\n#{format_yaml_strings(actual - expected)}"
)
end
def validate_rails_app!
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to have the API to this here even though the implementation is in configuration (and the error is raised in the application load paths class), since it is a validator after all. Happy to incorporate any feedback here.

Copy link
Contributor

@exterm exterm Oct 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're talking about three responsibilities here:

  1. making sure we have a Rails app
  2. booting the Rails app
  3. extracting load paths

ApplicationLoadPaths currently does all of that. It might make sense to move 1 and 2 into a separate class HostApplication, to be reused here? Or have a differently named class that does it all - but I think the load path extraction logic is actually interesting enough to deserve its own class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may not need to boot the application for validate anymore once we have removed the two application state caches, so it might actually make sense to not extract the load paths here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this again, I think this comment still applies.

Do we actually need to validate it's a rails app here? If we don't packwerk check might fail, but it would fail with a pretty obvious error message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I suppose we don't -- I removed it on the second commit

@alexevanczuk alexevanczuk force-pushed the ae-boot-application-with-packwerk-check branch from 4567bca to 088aff0 Compare October 18, 2021 17:12
@inflections_file = File.expand_path(configs["inflections_file"] || "config/inflections.yml", @root_path)
@parallel = configs.key?("parallel") ? configs["parallel"] : true

@config_path = config_path
end

def load_paths
@load_paths ||= ApplicationLoadPaths.extract_relevant_paths(@root_path, 'test')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hard coded environment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this because the CLI by default hard-codes the environment with this value:
https://github.com/Shopify/packwerk/blob/main/lib/packwerk/cli.rb#L22

As far as I know it's not a feature to allow this val to be changed, so I kept it the same. Let me know what you think the best path forward is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmh, yeah it's a good question which class should know about this. Intuitively, if we hard code this, it makes more sense for ApplicationLoadPaths to know the value then for Configuration, IMO.

@ghost ghost removed the cla-needed label Oct 18, 2021
@exterm
Copy link
Contributor

exterm commented Oct 18, 2021

OK, nitpicks and details aside - as I said on slack, I think this generally makes sense.

However, I think we're not on the same page regarding spring, and we should also consider inflections.

Let's start with the latter. Packwerk is currently effectively caching load paths and inflections to avoid booting the application for check. If we want to make booting the application a prerequisite for check, we should also eliminate inflections.yml and all the inflector stuff and instead just use the application's inflector.

On the topic of spring, we brought this up because now, with you proposed change, the application needs to be booted in order to run check. My first concern with this changeset is that I can't see the application being booted anywhere. IIRC we previously did that through a binstub (bin/packwerk). Indeed if we go ahead with this, packwerk check fail with a nice error message if the application isn't booted.

When you're writing the code to boot the application, you should be able to do so using spring. The speed up we are looking for should be implicitly provided because spring won't start the application twice.

@exterm
Copy link
Contributor

exterm commented Oct 18, 2021

I hope I remember correctly how spring works, will try to find the PR where Rafael removed it (and the binstubs)

@exterm
Copy link
Contributor

exterm commented Oct 18, 2021

this is the PR I'm referring to #128, still reading it

@exterm
Copy link
Contributor

exterm commented Oct 18, 2021

OK I need to dive a bit deeper into this, I lost track of where the application is actually booted in the latest release

@alexevanczuk
Copy link
Contributor Author

Let's start with the latter. Packwerk is currently effectively caching load paths and inflections to avoid booting the application for check. If we want to make booting the application a prerequisite for check, we should also eliminate inflections.yml and all the inflector stuff and instead just use the application's inflector.

Happy to do that. Do you think that should be in this same PR or is having it be in a follow-up one okay? My understanding is that the inflections.yml is sort of like another cache of Rails knowledge. From a feature perspective I can be persuaded either way (do this in the same change set of a follow-up).

On the topic of spring, we brought this up because now, with you proposed change, the application needs to be booted in order to run check. My first concern with this changeset is that I can't see the application being booted anywhere. IIRC we previously did that through a binstub (bin/packwerk). Indeed if we go ahead with this, packwerk check fail with a nice error message if the application isn't booted.

When you're writing the code to boot the application, you should be able to do so using spring. The speed up we are looking for should be implicitly provided because spring won't start the application twice.

I may be misunderstanding here! My understanding is that ApplicationLoadPaths requires the whole application here:
https://github.com/Shopify/packwerk/blob/main/lib/packwerk/application_load_paths.rb#L61

Simply by accessing Configuration#load_paths, we hit that line and load the application. Let me know if you mean something else by booting the application.

Regarding spring -- I'm not totally clear how spring works here, but I thought that spring basically wraps other commands and intercepts invocations to load the rails application and loads its own cache. Do I need to explicitly load something with Spring instead? If so, is there a way to make that work for users who don't use spring (perhaps the user can inject a dependency here for faster loading?)?

@exterm
Copy link
Contributor

exterm commented Oct 18, 2021

Thank you for helping me along, yes, ApplicationLoadPaths does indeed boot the application through its require 🤦🏼

This part of the PR I mentioned shows how we were previously using spring to boot the application (need to expand lib/packwerk/generators/templates/packwerk).

You can see there's a fallback for if spring isn't available.

@exterm
Copy link
Contributor

exterm commented Oct 18, 2021

That PR also has a little gem of info on the environment used to extract the autoload paths:

# Packwerk needs to run in a test environment, which has a set of autoload paths that are
# often a superset of the dev/prod paths (for example, test/support/helpers)

@exterm
Copy link
Contributor

exterm commented Oct 18, 2021

Inflections could go into a separate PR, and that would make things easier to digest. However, I do think of both modifications as one change to the product. The idea should be to eliminate all forms of caches for application configuration.

@alexevanczuk
Copy link
Contributor Author

Thanks for all the info. So if I'm understanding this, we more or less need to revert that commit here (which I can do by hand) so users can use spring to speed up packwerk. It seems like that too can go in a separate PR that comes before this.

I am not super familiar with how Spring works -- is it that we are calling Spring.register_command with a way for Spring to know how to generate binstubs for calling packwerk? Does the user need to manually require spring_command.rb or does spring know that that file is there by convention? I didn't see anything in the readme about generating binstubs using spring, because it looks like bundle exec packwerk init will generate those from the template which attempts to load spring if it exists.

Given this, it seems like the path forward is:

  1. Put up a PR that selectively reverts Remove the need for a custom binstub or test #128 and adds back the generation of the binstub using spring
  2. Rebase this PR off of that
  3. Put up another PR get inflections from Rails application rather than from the inflections YML
  4. Finally, bump the version

Does this seem right to you?

@exterm
Copy link
Contributor

exterm commented Oct 18, 2021

That plan seems good to me, but I'm not a spring expert myself - it'd be good to have @rafaelfranca 's opinion. I'm pretty sure he has opinions on exactly how spring should be used here.

@alexevanczuk alexevanczuk mentioned this pull request Oct 18, 2021
6 tasks
@alexevanczuk
Copy link
Contributor Author

That plan seems good to me, but I'm not a spring expert myself - it'd be good to have @rafaelfranca 's opinion. I'm pretty sure he has opinions on exactly how spring should be used here.

Sounds good. @rafaelfranca I'd love some guidance from you here. I'm a little confused about the mental model for how spring comes into play here. Reading more at https://github.com/rails/spring, it seems like the user just needs to generate the binstub for packwerk with bundle binstubs packwerk and then springify the bin with bundle exec spring binstub --all.

Can you help me understand what (if anything) packwerk needs to do to support using spring with packwerk?

@exterm
Copy link
Contributor

exterm commented Oct 21, 2021

So, I think this is the relevant part of the previous implementation, when packwerk did use spring:

begin
  load(File.expand_path("spring", __dir__))
rescue LoadError => e
  raise unless e.message.include?("spring")
end

Afterwards, requires should go through spring's cache.

The way it was set up in packwerk is that this was part of a bin/packwerk binstub generated by packwerk init.

I think that makes more sense than requiring the user to do additional work, as they very likely would execute packwerk init anyway when integrating packwerk into the project.

I'm also pretty sure binstub --all only springifies binstubs for commands that spring knows; there is functionality to add commands to spring but I don't think we need to do that.

@alexevanczuk alexevanczuk force-pushed the ae-boot-application-with-packwerk-check branch 3 times, most recently from 1192aac to 5a9cb45 Compare November 22, 2021 13:26
lib/packwerk/cli.rb Outdated Show resolved Hide resolved
@alexevanczuk alexevanczuk force-pushed the ae-boot-application-with-packwerk-check branch from 5a9cb45 to 6da1bba Compare November 23, 2021 12:11
@alexevanczuk alexevanczuk force-pushed the ae-boot-application-with-packwerk-check branch from 6da1bba to ca496dc Compare November 23, 2021 12:14
Copy link
Contributor

@exterm exterm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exciting stuff!

Just clean up the commit history and this is ready to go.

Love all that removed complexity.

@alexevanczuk alexevanczuk force-pushed the ae-boot-application-with-packwerk-check branch from ca496dc to 702d2dc Compare November 23, 2021 17:20
@alexevanczuk
Copy link
Contributor Author

Exciting stuff!

Just clean up the commit history and this is ready to go.

Love all that removed complexity.

@exterm Squashed it into a single commit!

@exterm
Copy link
Contributor

exterm commented Nov 23, 2021

Wanted to do a last test run against the monolith, but am getting a strange sorbet error that doesn't look related at all. Will have to take some time to look into this.

@alexevanczuk
Copy link
Contributor Author

Wanted to do a last test run against the monolith, but am getting a strange sorbet error that doesn't look related at all. Will have to take some time to look into this.

Interesting -- if you're able to post the output I may be able to help!

@exterm
Copy link
Contributor

exterm commented Nov 25, 2021

Turned out that somehow an incompatible version of sorbet-runtime snuck into my testing branch. After resetting to the version we're using in production everything looks good.

I cut a release 1.4.0 yesterday so that we can get some of the changes out before we release this stuff with 2.0.

@exterm exterm merged commit 11f6c60 into Shopify:main Nov 25, 2021
@alexevanczuk alexevanczuk deleted the ae-boot-application-with-packwerk-check branch November 29, 2021 13:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants