High CPU utilization with large collection of past engagements #3008

Closed
timbrigham-oc opened this issue Jun 26, 2024 · 3 comments

@timbrigham-oc
Contributor

Description
My Ubuntu instance is seeing high CPU utilization from the Python instance running Caldera.
This gets much more noticeable when a substantial number of previously run operations are present (~50 in my testing), to the point where I get agent communication timeouts.

To Reproduce

  1. Run multiple operations, at least some of which have 100+ steps.
  2. Continue running / rerunning operations.
  3. Over time the CPU utilization creeps up, and will eventually peg a single CPU core at 100%.
  4. This drastically slows down responses, and can result in communication timeouts.

Testing
Restarting the Caldera server process does not help; the server (fairly quickly) returns to the same CPU utilization pattern.
I have a small API script which lets me bulk remove previous operations. Removing old runs decreases CPU utilization.
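
For anyone wanting to try the same workaround, a bulk cleanup along these lines might look like the sketch below. This is not the script mentioned above. It assumes the Caldera v2 REST endpoints `GET /api/v2/operations` and `DELETE /api/v2/operations/{id}`, a `KEY` header for authentication, and operation objects that carry `id`, `state`, and an ISO-formatted `start` field. The base URL, the API key, and the `keep_latest` parameter are placeholders chosen for illustration.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8888"  # assumption: address of the Caldera server
API_KEY = "ADMIN123"                # assumption: placeholder, use your own red API key


def select_finished(operations, keep_latest=5):
    """Return IDs of finished operations, oldest first, sparing the newest few."""
    finished = [op for op in operations if op.get("state") == "finished"]
    # Assumes "start" is an ISO-style timestamp string, which sorts lexically.
    finished.sort(key=lambda op: op.get("start", ""))
    cutoff = max(len(finished) - keep_latest, 0)
    return [op["id"] for op in finished[:cutoff]]


def call(method, path):
    """Send one authenticated request and decode the JSON body, if any."""
    req = urllib.request.Request(
        BASE_URL + path, method=method, headers={"KEY": API_KEY}
    )
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    return json.loads(body) if body else None


def purge(keep_latest=5):
    """Delete all but the newest finished operations."""
    operations = call("GET", "/api/v2/operations")
    for op_id in select_finished(operations, keep_latest):
        call("DELETE", f"/api/v2/operations/{op_id}")
```

Calling `purge(keep_latest=5)` would leave running operations and the five most recent finished ones untouched. Keeping the selection logic in its own function makes it easy to check which operations would be removed before anything is deleted.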

Expected behavior
Formerly executed engagements should not have an impact on CPU utilization for ongoing processes.
I am guessing that, while an operation is being executed, links from former operations are still being evaluated and consuming CPU cycles, or something similar.

Environment
My test instance is based on the 5.0.0 tagged release, and includes a few customizations:
mitre/magma#55
mitre/magma#53
mitre/magma#60

@elegantmoose
Contributor

elegantmoose commented Jul 1, 2024

Hmm, I wonder if this is hitting the limits of the simple in-memory "database" Caldera uses. Do you have any profiling stats on the memory usage as well? I'm wondering if it's constantly swapping pages out of RAM.

*I'll admit, I don't think we have ever run 50+ operations at 100+ steps.
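
One way to gather the memory numbers asked about here, without installing extra tooling, is to read the `VmRSS` and `VmSwap` fields from `/proc/<pid>/status` on the Linux host. The sketch below is only an illustration: the function names are made up, and the PID of the Caldera server process has to be looked up separately.

```python
import re
from pathlib import Path


def parse_status(text):
    """Extract the kB-valued Vm* fields from /proc/<pid>/status content."""
    fields = {}
    for match in re.finditer(r"^(Vm\w+):\s+(\d+)\s+kB$", text, re.MULTILINE):
        fields[match.group(1)] = int(match.group(2))
    return fields


def memory_snapshot(pid):
    """Return resident and swapped memory in kB for a live process (Linux only)."""
    fields = parse_status(Path(f"/proc/{pid}/status").read_text())
    return {"rss_kb": fields.get("VmRSS", 0), "swap_kb": fields.get("VmSwap", 0)}
```

A `swap_kb` value that stays at or near zero while the CPU is pegged would argue against swapping being the cause.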

@timbrigham-oc
Contributor Author

Yeah, I could see that being a limiting factor. It's only (unusably) sluggish when there is an active operation and a bunch of historical data. I'm pretty sure my memory utilization was under 20% when I viewed it in top but no screenshot for proof. :)

It's also definitely something single-threaded in Python that's getting caught up. Only one of the multiple cores in my test instance will get pegged to 100%. That didn't make sense at first, since the two-core machine was only reporting ~55% total in the Azure console.
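
The mismatch between the two readings is just averaging: an aggregate figure divides the load across every core, so one saturated core on a two-core machine shows up as roughly half. A small sketch, where the 10% load on the second core is an assumed value chosen to match the ~55% reading and the function names are illustrative:

```python
def aggregate_cpu(per_core_percent):
    """Average per-core utilization into the single figure a console reports."""
    return sum(per_core_percent) / len(per_core_percent)


def pegged_cores(per_core_percent, threshold=95.0):
    """Return the indexes of cores at or above the saturation threshold."""
    return [i for i, pct in enumerate(per_core_percent) if pct >= threshold]


readings = [100.0, 10.0]        # one pegged core, one mostly idle (assumed values)
print(aggregate_cpu(readings))  # 55.0
print(pegged_cores(readings))   # [0]
```

This is why a per-core view is more useful than the aggregate number when looking for a single saturated thread.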

I'll include more details when I end up back in the same situation. Gotta love iterative development on a process that uses lateral movement. It blows up these counts in a hurry.

This issue is stale because it has been open 30 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.
