High memory usage / low loading performance #884
Comments
Maybe @franzlst @sebastian-brunner @Rbelder have some useful input on this?
I don't have a lot of time right now, but we already dug into this topic about 6 years ago. There should be open and closed issues surrounding it, so please search and look through them. We made similar observations, and also some quite different ones from yours, but the code has not changed a lot since then. E.g. using your local SSD versus the network (for core-only usage), or loading only the core versus loading with the GUI, made a big difference even after we had improved the situation. That is why we introduced the depth limit for MVC model generation. The MVC pattern library we use generates many linkage and list entries, which eats time. That is also why, in general, you cannot look into a library state (see its inner child states) unless you enable a feature. So maybe you ran your test with a state machine that has library states all over its root state.
The memory debris/garbage is something we tackled with some unit tests using gc. Yes, some parts stay, but of your 2GB everything beyond 1MB should be gone. Try to fit your example into the respective unit test and check which objects remain. In any case, it would be interesting to know whether you can still run all of the tests. Maybe a refactoring introduced a bug.
In general, we observed in the past that using 100k or more states in a state machine will not work without some kind of dynamic loading of parts of the state machine. We decided that this was too hot a feature (we made some trials) and, at some point, also bad practice in the use of RAFCON.
About the feature of loading a flat state machine from one file, I refer to Sebastian (@sebastian-brunner). I think they internally used a kind of plugin, or at least were heading for this ability at Agile. Or it was even used in the robotics institute 4-5 years ago. There is more to say about findings from the past ....
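The "check which objects remain" idea from the comment above can be sketched with the standard gc module. This is only a minimal illustration, not RAFCON's actual unit test: the State class, build_tree and leftover_after_destroy are hypothetical stand-ins, and a real check would build and destroy an actual state machine instead.

```python
import gc
import weakref


class State:
    """Hypothetical stand-in for a state; not RAFCON's actual class."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # child -> parent reference creates a cycle
        self.children = []
        if parent is not None:
            parent.children.append(self)


def build_tree(depth, width):
    """Build a tree with `width` children per state, `depth` levels deep."""
    root = State("root")
    level = [root]
    for d in range(depth):
        level = [State(f"s{d}_{i}", p) for p in level for i in range(width)]
    return root


def count_live(cls):
    """Count instances of cls that the garbage collector still tracks."""
    return sum(1 for obj in gc.get_objects() if isinstance(obj, cls))


def leftover_after_destroy(depth=3, width=4):
    """Create a tree, drop it, and report how many states survive."""
    gc.collect()
    before = count_live(State)
    root = build_tree(depth, width)
    created = count_live(State) - before
    probe = weakref.ref(root)  # lets us check that the root was freed
    del root
    gc.collect()  # needed because parent/child references form cycles
    leftover = count_live(State) - before
    return created, leftover, probe() is None
```

If leftover is non-zero in a real test, filtering gc.get_objects() by type and inspecting gc.get_referrers() on one survivor is a common way to find out what still holds it.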
Thank you for the quick response! I have already searched through issues in this repo concerning the problem and inspected the ones listed above. They will definitely give some useful hints where to look.
For me the changes in these configurations were not that significant, although they definitely have some impact. However, as mentioned above, we are not looking for a factor of 10-50% slower/faster but rather try to find the significant discrepancy between the actual data that is stored in
I definitely should check this more in detail, after all the data should be released when the state is closed. Regarding the tests, I actually ran all the unit tests (and updated a lot of deprecations and bugs that happened during the refactoring) before publishing this new release (2.1.1). I also stumbled across this
This is the real scope of the problem we are discussing at the institute right now. We will probably soon hit the first state machines with 100k states which emphasizes this problem again. But it's some very valuable insight for me that you came to this conclusion in the past, so thanks again!
I couldn't find any traces of this in the current version or any pull request from Agile (as commented above in the respective pull request). Unfortunately, I don't have any more information on this right now. While this would most likely make some improvement, I don't think it would significantly change anything regarding the memory problem.
Somehow I mixed up my memories from the past. We already tackled the loading from file (so SSD versus network) in the following issue: Preload libraries and use copy method to provide LibraryStates and respective models. You could check this by monitoring your memory consumption over multiple re-openings and destructions of a big state machine in a single RAFCON instance. Anyway, I would recommend writing a performance test that does these checks automatically. Otherwise, you will not have clear measurements and will waste a lot of time doing it manually.
I know it takes time to write a test, but without it you will not get permanent improvements and will put a lot of time into manual measurements. Most likely you already started on this with your comments from above.
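A rough sketch of such an automated check, using only the standard tracemalloc module. The open_fn and close_fn callables are assumptions standing in for whatever opens and destroys a state machine in a single RAFCON instance; fake_open, fake_close and leaky_close below are made up purely to show the pattern.

```python
import gc
import tracemalloc


def measure_reopen_cycles(open_fn, close_fn, cycles=5):
    """Return the traced memory (bytes) still held after each open/close cycle."""
    gc.collect()
    tracemalloc.start()
    baseline, _ = tracemalloc.get_traced_memory()
    remaining = []
    for _ in range(cycles):
        handle = open_fn()
        close_fn(handle)
        del handle
        gc.collect()
        current, _ = tracemalloc.get_traced_memory()
        remaining.append(current - baseline)
    tracemalloc.stop()
    return remaining


# Hypothetical stand-ins for opening and destroying a state machine.
def fake_open():
    return [{"state_id": i, "name": f"state_{i}"} for i in range(10_000)]


def fake_close(sm):
    sm.clear()


_leaked = []


def leaky_close(sm):
    # Simulates a leak: a global list keeps part of every closed machine.
    _leaked.append(sm[:100])
    sm.clear()
```

With a clean close function the per-cycle values stay roughly flat, while a leak shows up as a value that grows with every cycle. A test could then assert that the last value stays below a chosen threshold.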
If you open only one state machine, the state machine manager as well as the execution engine (ExEngine) hold a handle on this one state machine. If you open 2 or 3 state machines, the ExEngine still holds one handle on one of those 3 state machines, but the state machine manager holds handles on all state machines and will be bigger than the ExEngine. Anyway, if you use this kind of measurement, you have to be aware that every state holds a handle to its parent, so the link following in pympler may report an object as bigger than it is.
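The parent-handle effect can be reproduced without pympler. The sketch below uses a hand-written recursive size function based on sys.getsizeof (an assumption for illustration, not pympler's algorithm) and a hypothetical Node class. Measuring one leaf while following its parent reference ends up counting the root and all siblings as well.

```python
import sys


class Node:
    """Hypothetical state-like object with a back-reference to its parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def deep_size(obj, skip_attrs=(), _seen=None):
    """Sum sys.getsizeof over everything reachable from obj.

    Instance attributes whose names are in skip_attrs are not followed.
    """
    seen = set() if _seen is None else _seen
    if id(obj) in seen:
        return 0  # already counted, also stops on reference cycles
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_size(key, skip_attrs, seen)
            size += deep_size(value, skip_attrs, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_size(item, skip_attrs, seen)
    elif hasattr(obj, "__dict__"):
        attrs = vars(obj)
        size += sys.getsizeof(attrs)
        for attr, value in attrs.items():
            if attr not in skip_attrs:
                size += deep_size(value, skip_attrs, seen)
    return size


root = Node("root")
kids = [Node(f"child_{i}", root) for i in range(50)]
leaf = kids[0]

size_following_parent = deep_size(leaf)
size_without_parent = deep_size(leaf, skip_attrs=("parent",))
```

Here size_following_parent is many times larger than size_without_parent, although both describe the same leaf. When comparing per-object numbers from a size profiler, it is worth checking which references were followed.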
Thanks, I will definitely look into that.
Yes, I agree with you that a proper test is needed when working on this problem. For now, I was still trying to figure out whether the problem is some kind of bug, so I was mostly looking through the code for something that might hint at the cause. It's still strange to me that, although the state machine info from the
This experiment was more to figure out if I can find some variable that holds significant memory. But to my current understanding, it's rather that a single state (i.e. just an execution state) is already way too big when loaded as a Python object. This then leads to the high memory usage when 100k+ states are used (as it scales somewhat linearly with the number of states). Personally, I would first try to tackle this memory issue and work on the loading times afterwards (as they might be connected anyway). In any case, high loading times are not as big a problem for us right now as the general memory consumption.
Thx for the investigation @JohannesErnst Concerning loading times:
Concerning memory consumption during execution: if you disable the execution history the memory won't build up anymore in the latest versions. You can opt to also only write the history to a file (FILE_SYSTEM_EXECUTION_HISTORY_ENABLE: false).
@sebastian-brunner thanks for the reply and info! I think regarding the configurations we already use almost optimal settings (except for Maybe we will decide to pursue this idea of dynamic loading as it is the most promising approach to make some fundamental improvements right now. But it will also consume quite some time so we will have to decide how urgent it is. Anyway, I will consider all three of your suggestions, thanks again!
This might also be useful information: #954
With larger and larger state machines (10k to 1M states) we are running into notable problems regarding loading time and, more importantly, memory usage. As the number of states can be expected to increase even further for future autonomous tasks, this is important to tackle.
As the RAM of the robots is limited on most systems, consuming multiple GBs of memory for only loading a large state machine is problematic.
The real issue is that the memory consumption is 100-1000 times higher than the inherent information, regarding just the .json files loaded. To my current understanding, this is due to the object structure that is set up in Python variables/references during runtime.
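The gap between file size and in-memory size can be illustrated with plain Python data. This sketch only assumes a made-up record layout (the field names are hypothetical, not RAFCON's storage format) and compares the size of a JSON text with the memory that tracemalloc attributes to the parsed objects. It does not reproduce the 100-1000x factor described above; it only shows that even plain dicts and lists cost several times the serialized size, before any class instances, models or back-references are added on top.

```python
import json
import tracemalloc


def json_vs_memory(n_states=2000):
    """Return (bytes of JSON text, bytes allocated by parsing it, record count)."""
    # Hypothetical state records; the field names are made up for illustration.
    states = [
        {"state_id": f"S{i:06d}", "name": f"state_{i}", "outcomes": [0, -1, -2]}
        for i in range(n_states)
    ]
    text = json.dumps(states)
    del states

    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    loaded = json.loads(text)  # kept alive so it is counted in `after`
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return len(text.encode("utf-8")), after - before, len(loaded)


disk_bytes, memory_bytes, count = json_vs_memory()
ratio = memory_bytes / disk_bytes
```

Running the same comparison on a real state machine folder and the objects created from it would give the actual factor for a given setup.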
General Questions:
Connected to this issue might be the following topics:
We already had the following findings:
.json) are rather tiny, whole state machines has 65MB
rafcon_singletons and threading leading to the state objects