-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug Report] Max RSS is Significantly Higher in Version 2.0 #164
Comments
I'm guessing this make sense and it sort of expected. packwerk/lib/packwerk/parse_run.rb Line 74 in 4da4215
Off the top of my head, I can see 2 ways of resolving this issue: (1) switch to threads, or (2) load the rails environment only to extract the information we need. I'll outline the choices below: Switch to threadsSwitching to threads is a little tricky because we'll have to deal with thread issues. It's not too bad since we're not actually serving requests or anything complicated (other than loading the app), the only threading issue we'll need to deal with is auto loading. We're at the mercy of the application on what it does in the Load rails and extract what we needWe could boot the rails app and dump (or however we want to communicate between the rails app and the check process) what we need. Eg load paths, inflections, etc. This allows us to stay with forks, still utilize spring to boot the application, keep the packwerk scanner process memory low. |
Thank you for reporting this, Jordan, and for your analysis, Ngan. When @alexevanczuk and I discussed this change, we didn't measure memory usage, only run time. As an added data point, I just checked memory usage on our monolith and it doesn't seem to have increased substantially with packwerk 2.0.0. However, we were already using a springified binstub with packwerk 1.4, and if I disable spring on 1.4 I get significantly lower memory usage, comparable to the difference that Jordan is reporting. It's an interesting problem, since we should in theory only need to load a small part of the application to get the load paths, but in practice, we can't really know which part that is. @ngan , why do you think we need to eager load the app? I would assume that loading the bundle and executing the initializers should be enough to initialize load paths & inflections. |
@jordanstephens two workarounds come to mind for a short-term solution:
I haven't made up my mind on what the longer term course of action should be. Personally, I think the reduced complexity from removing the application caches may be worth the added memory usage - we're trading human effort for machine effort. |
Hmm this is an interesting problem. Great analysis @jordanstephens and @ngan! I like the automated determination of load paths for Packwerk, but I don't like that we need to load the entire Rails app just to get them. It's like having a dependency on an abstract library, very unpredictable. I don't know much about Rails internals but I wonder if there is anything we can do to pull the load path logic into a (Ruby) file with minimal dependencies that Packwerk could run. This is obviously a longer term solution but feels like it could be a good architectural change for Rail as well. The same might work for inflections as well. In the near term, I wonder if it is worth introducing an interface inside Packwerk to provide the load paths and inflections, and have the default implementation just read from the Rails app as we do now. An alternative implementation might read from the config file as we did before. A future implementation might take advantage of future interfaces in Rails that allow us to get this information without load the entire app. Sorry for the "Why not both?" proposal 😆 |
Hm, that's a good question... when I ran it with threads without eager loading, I got some errors. One of them, iirc, was "SyntaxError: dynamic constant assignment error" in one of the files. There were many other errors as well. But looking through packwerk, I can't see anything that actually references the app's constants causing it to autoload. @mclark autoload paths are derived during framework boot. A lot of other things can mutate the paths on boot as well, including gems, and the app itself. It may not seem like it, but it's actually very predictable for a Rails app to have its paths ready to go. It's actually a public API. I really like booting up the app to find these things out as it allow apps to do more custom things to autoload paths. And as long as Packwerk can discern from the public api, we should be good. I want to solve this problem as well because, right now, it's forking 10x with 10x the memory usage: |
Hey all -- I'm pairing with @ngan tomorrow to chat through this and get his feedback, but I'd love your feedback as well. I expect that if you pull this version and use it your CI environment it should resolve the memory issue. I also expect that Here are the results before and after this change on our codebase. While it does resolve the memory issue, it does slow down Memory Usage ResultsBefore
After
Speed ResultsBefore
After
|
Hey team -- We had a couple of questions to help build context and make sure we come to a good outcome. What's the urgency of this problem for you all? If we absolutely want to resolve this memory bloat issue (now), we could merge #166, which we believe is a bit less of an ideal solution. We thought may be a bit more ideal instead to reorganize things so the main process utilizes spring and calls out to another plain ruby process that requires the necessary gems (and only the necessary gems) and then parses files. This would be more performant because we would pay fewer total loading costs (see below for more details of this analysis). Here are the options as I see them:
Performance comparison of approachesTodayToday the main process is springified, which means that loading the entire bundle and rails is done via spring, and thus very fast. The only penalty we pay is packwerk parsing logic (which every approach pays). Loading rails in separate processIn this, the main process is NOT springified, so it pays the cost of manually loading the entire bundle (which is the main source of the performance regression). It then shells out to To my knowledge there is not a way to only load the bundle with spring and not the main application. Loading rails in the main process, packwerk in a separate processIn this more ideal solution (from a performance standpoint), we do everything we do today, except we spawn a new process (not forked) and manually require exactly what we need. In this case, we only overpay the cost of loading packwerk/ast/etc. again, but do not pay the cost of loading up the rest of the bundle again. We believe this approach should more or less get us to the same performance characteristics today. |
At Shopify, I have not heard any complaints about the increased memory usage. In terms of solutions, I've just started looking into this, but the increased start up time from #166 has me concerned. Having a separate process that doesn't load the whole bundle seems interesting. I'm also wondering if we can get a speedup by not loading anything in the |
I haven't had a chance to look at the proposals yet, but I can say that we were able to work around the issue by reducing parallelism for now, though I would still like to see this improved. I think booting rails in a side process and sending the load paths back to the main process before the fork sounds like a good implementation to me, but I need to look into it more closely. From a high level, though, I would support pursuing a more optimal solution over thrashing through a short-term fix. |
Thanks for the comments @exterm and @jordanstephens. For now, based on the feedback I've received which all makes sense, I think I'll close the proposal PR in favor of a better longer term solution. Once everyone is back from the holidays, I'd love to (A) align on what tradeoffs feel like the right ones (complexity/maintainability, memory, performance, etc and (B) align on an implementation for delivering on those tradeoffs. |
Hey all -- hope your holiday season was relaxing and safe. Can you please share with me some of your thoughts on the tradeoffs that are important to you? My understanding right now is that we're trading off machine time/resources for human time, and that we may be okay with that tradeoff. Let me know how you feel. If we wanted to not have this tradeoff, a medium term solution might involve packwerk commands loading rails and getting what we need, then running the rest of packwerk in an independent process that is given the rails dependencies. I see that as very similar to the previous proposal, except reversed (instead of rails dependencies being a separate process, packwerk is). Both of these solutions add a lot of complexity to the implementation, so our preference may be to wait until a longer-term solution. My ideal for a longer-term solution would be to see if we can use a non-zeitwerk based parser for Packwerk. @exterm did some investigation on that here: Shopify/constant_resolver#26. My ideal for this would be sorbet of course, but there may be some separate challenges to make that happen. Let me know what you're all thinking about this! |
Let's not confuse two things here:
Sorbet implements both of those things so there is a theoretical potential to use it for both, but they are separate problems and solving even just one of them could solve the memory usage problem. |
Thanks for making that distinction more clear @exterm I think that makes sense. To reiterate what we chatted about over video call, I think that moving to a faster parser should give us a lot of wiggle room to move to a thread-based approach to address this memory issue more simply. |
Rails is very good with copy-and-write, that is how puma and unicorn work in production without each process having the entire Rails application in resident memory, so if we boot the application in the master process, before we fork the other processes, the resident memory of each one of those process would be only for the new allocated or modified memory. |
Since it's been three months: Is there a status update regarding this issue that you guys can share? Thanks for all your work on this! |
Hi @fhwang I believe we've discovered (per Rafael's comment) that this is not an issue because Rails is good at being able to reuse memory space for forked processes. I'm going to close this for now! Please leave a comment or reopen if you or anyone sees more issues. |
Description
Memory usage in packwerk v2.0 is significantly higher than it was in v1.4—so much higher that we will need to increase the resource class of our CI nodes in order for us to run packwerk v2.0. It looks like the max RSS for the whole process tree has increased from ~850 MB to about ~2.5GB for our use-case.
I ran gnu
time -v
onbin/packwerk check
in our application working directory using both versions of packwerk and saw the following results:To Reproduce
Run packwerk while watching memory usage, compare between v1.4 and v2.0
Expected Behaviour
I expected memory usage to not increase so dramatically between v1.4 and v2.0.
Screenshots
The GNU time output only captures the max RSS of the root process, so I grabbed a couple screenshots from activity monitor to demonstrate that the memory usage is much higher in each of the child processes as well.
Packwerk v1.4 Memory Usage
Packwerk v2.0 Memory Usage
Version Information
Additional Context
I haven't looked into the changes yet, but I assume this is because packwerk now loads our rails application in each of its processes? I still wouldn't expect each child process's max RSS to balloon up so much.
Do you all have any idea why this might be? And secondly, any idea if it would be possible and reasonable to bring this number back down?
Thanks!
The text was updated successfully, but these errors were encountered: