High CPU utilization with large collection of past engagements #3008

timbrigham-oc · 2024-06-26T15:10:50Z

Description
My Ubuntu instance is seeing high CPU utilization from the Python process running Caldera.
This becomes much more noticeable when a substantial number of previously run operations (~50 in my testing) are present, to the point where agent communications time out.

To Reproduce

Run multiple operations, at least some of which have 100+ steps.
Continue running / rerunning operations.
Over time the CPU utilization creeps up, and will eventually peg a single CPU core at 100%.
This drastically slows down responses, and can result in communication timeouts.

Testing
Restarting the Caldera server process does not help, and will (fairly quickly) return to the same CPU utilization patterns.
I have a small API script that lets me bulk remove previous operations. Removing old runs decreases CPU utilization.
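The bulk-removal script itself isn't included in the issue, so what follows is only a minimal sketch of the same idea, not the reporter's script. It assumes Caldera's v2 REST API serves `GET /api/v2/operations` and `DELETE /api/v2/operations/{id}`, authenticates with a `KEY` header, and returns operations with `id`, `state` and ISO-8601 `start` fields. All of these, plus the state names treated as "active", are assumptions to verify against your own server. The helper names (`select_finished_operations`, `prune_operations`) are hypothetical.

```python
import json
import urllib.request

# ASSUMPTION: Caldera's v2 REST API exposes GET/DELETE on these paths and
# authenticates via a "KEY" header. Verify against your server before use.
OPERATIONS_PATH = "/api/v2/operations"
# ASSUMPTION: state names treated as still in progress; adjust for your version.
ACTIVE_STATES = {"running", "run_one_link", "paused", "cleanup"}


def select_finished_operations(operations, keep_latest=5):
    """Return ids of non-active operations, oldest first, sparing the newest `keep_latest`."""
    finished = [op for op in operations if op.get("state") not in ACTIVE_STATES]
    # ISO-8601 timestamps in one consistent format sort correctly as strings.
    finished.sort(key=lambda op: op.get("start") or "")
    if keep_latest > 0:
        finished = finished[:-keep_latest]
    return [op["id"] for op in finished]


def _request(base_url, api_key, method, path):
    """Send one authenticated request and decode the JSON body, if any."""
    req = urllib.request.Request(
        base_url.rstrip("/") + path, method=method, headers={"KEY": api_key}
    )
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    return json.loads(body) if body else None


def prune_operations(base_url, api_key, keep_latest=5, dry_run=True):
    """Delete finished operations except the newest few; dry-run by default."""
    operations = _request(base_url, api_key, "GET", OPERATIONS_PATH)
    ids = select_finished_operations(operations, keep_latest)
    for op_id in ids:
        if dry_run:
            print(f"would delete {op_id}")
        else:
            _request(base_url, api_key, "DELETE", f"{OPERATIONS_PATH}/{op_id}")
    return ids
```

Calling `prune_operations("http://localhost:8888", api_key)` only prints what it would remove. Pass `dry_run=False` once the list looks right. Keeping a few recent runs and skipping anything still active avoids deleting an operation mid-execution.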

Expected behavior
Formerly executed engagements should not have an impact on CPU utilization for ongoing processes.
I am guessing that, while an operation is being executed, links from former operations are still being evaluated and consuming CPU cycles, or something similar.
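To make that guess concrete, here is a toy model. It is hypothetical and not taken from Caldera's source. It shows how a lookup that walks every operation's links on each agent check-in would scale with history, compared with one restricted to running operations. The `Operation` class, link dicts, and the `abc` agent id are invented for illustration. The numbers mirror the issue's scenario of ~50 finished operations at ~100 steps each.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    """Hypothetical stand-in for an operation; not Caldera's actual data model."""
    name: str
    running: bool
    links: list = field(default_factory=list)  # each link: {"paw": str, "status": str}


def pending_links(operations, paw, active_only=False):
    """Return (pending links for agent `paw`, number of links examined)."""
    examined = 0
    matches = []
    for op in operations:
        if active_only and not op.running:
            continue  # skip finished operations entirely
        for link in op.links:
            examined += 1
            if link["paw"] == paw and link["status"] == "pending":
                matches.append(link)
    return matches, examined


def build_history(finished_ops=50, steps=100):
    """Finished operations of `steps` links each, plus one running operation."""
    ops = [
        Operation(f"old-{i}", False, [{"paw": "abc", "status": "done"} for _ in range(steps)])
        for i in range(finished_ops)
    ]
    current = Operation("current", True, [{"paw": "abc", "status": "done"} for _ in range(steps - 1)])
    current.links.append({"paw": "abc", "status": "pending"})
    ops.append(current)
    return ops
```

With `build_history()`, both lookups find the same single pending link. The full scan examines 5,100 links, while the active-only scan examines 100. If the real server does per-beacon work proportional to total history, cost would grow with every retained operation, which would be consistent with pruning old runs bringing CPU back down.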

Environment
My test instance is based on the 5.0.0 tagged release, and includes a few customizations -
mitre/magma#55
mitre/magma#53
mitre/magma#60

elegantmoose · 2024-07-01T20:37:39Z

Hmm, I wonder if this is hitting the limits of the simple in-memory "database" Caldera uses. Do you have any profiling stats on memory usage as well? I'm wondering if it's constantly swapping pages out of RAM.

*I'll admit, I don't think we have ever run 50+ operations with 100+ steps.
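One way to gather the requested memory stats without extra tooling is to read `/proc/meminfo` on the Linux host. The sketch below uses only the standard library and assumes the usual `MemTotal`, `MemAvailable`, `SwapTotal` and `SwapFree` fields, reported in kB. The function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def memory_summary(text):
    """Percent of RAM in use (by MemAvailable) and swap currently occupied."""
    info = parse_meminfo(text)
    total = info["MemTotal"]
    available = info["MemAvailable"]
    swap_used = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    return {
        "used_pct": round(100 * (total - available) / total, 1),
        "swap_used_kb": swap_used,
    }


if __name__ == "__main__":
    with open("/proc/meminfo") as fh:
        print(memory_summary(fh.read()))
```

A snapshot of occupied swap doesn't prove the system is actively paging. The `si`/`so` columns of `vmstat 1` show ongoing swap-in and swap-out, which is what constant page swapping would look like.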

timbrigham-oc · 2024-07-01T20:56:40Z

Yeah, I could see that being a limiting factor. It's only (unusably) sluggish when there is an active operation and a bunch of historical data. I'm pretty sure my memory utilization was under 20% when I viewed it in top but no screenshot for proof. :)

It's also definitely something single-threaded in Python that's getting caught up. Only one of the multiple cores in my test instance gets pegged at 100%. That didn't make sense at first, since the two-core machine was only reporting ~55% total in the Azure console.

I'll include more details when I end up in the same situation again. Gotta love iterative development on a process that uses lateral movement; it blows up these counts in a hurry.
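The ~55% total is what an aggregate view shows when one core is saturated. On two cores, one core at 100% and the other at ~10% average to 55%. This pattern fits CPython's GIL, which lets pure-Python code in a single process run on only one core at a time. The minimal sketch below computes per-core utilization from two samples of Linux `/proc/stat`, so a pegged core is visible even when the VM-level graph looks moderate. The function names are hypothetical. Pressing `1` in `top` gives a similar per-core view interactively.

```python
import time


def read_cpu_times(text):
    """Map 'cpu0', 'cpu1', ... to (busy, total) jiffies from /proc/stat text."""
    times = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            values = [int(v) for v in fields[1:]]
            idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
            total = sum(values[:8])  # user..steal; guest time is already in user
            times[fields[0]] = (total - idle, total)
    return times


def per_core_utilization(before, after):
    """Percent busy per core between two read_cpu_times() snapshots."""
    usage = {}
    for cpu, (busy0, total0) in before.items():
        busy1, total1 = after[cpu]
        elapsed = total1 - total0
        usage[cpu] = round(100 * (busy1 - busy0) / elapsed, 1) if elapsed else 0.0
    return usage


def sample(interval=1.0):
    """Take two /proc/stat snapshots `interval` seconds apart."""
    with open("/proc/stat") as fh:
        before = read_cpu_times(fh.read())
    time.sleep(interval)
    with open("/proc/stat") as fh:
        after = read_cpu_times(fh.read())
    return per_core_utilization(before, after)
```

Running `sample()` periodically while an operation is active would show whether a single core stays near 100%. That reading, alongside the operation count, is the kind of detail the maintainers asked for.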

github-actions · 2024-08-31T00:20:45Z

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days

timbrigham-oc added the bug label Jun 26, 2024

timbrigham-oc assigned elegantmoose Jun 26, 2024

github-actions bot added the no-issue-activity label Aug 31, 2024

github-actions bot closed this as completed Oct 30, 2024
