feat(runtime): use prlimit to limit resource usage of command to avoid OOM Runtime Kill #6338

Merged
xingyaoww merged 43 commits into main from xw/bash-perf
Feb 11, 2025

xingyaoww
Collaborator

@xingyaoww xingyaoww commented Jan 18, 2025

End-user friendly description of the problem this fixes or functionality that this introduces

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions

This PR applies a resource limit (via prlimit) to all commands started within tmux, so that agents cannot run commands that consume too much memory and eventually get the whole runtime killed.
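
For readers unfamiliar with the mechanism, here is a minimal sketch of the idea. It is not the code in this PR: it uses Python's `resource.setrlimit` in the child process instead of the `prlimit` utility, it assumes Linux, and it assumes the limit is applied to the virtual address space (`RLIMIT_AS`). The helper name `run_with_memory_limit` is hypothetical.

```python
import resource
import subprocess

GIB = 1024 ** 3


def run_with_memory_limit(cmd, max_memory_gb):
    """Run cmd with its address space capped at max_memory_gb GiB.

    Hypothetical helper for illustration only; the PR's real
    implementation may differ.
    """
    limit_bytes = int(max_memory_gb * GIB)

    def apply_limit():
        # Runs in the child between fork() and exec(), so only the
        # command (and anything it spawns) is limited, not the caller.
        _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        if hard == resource.RLIM_INFINITY:
            new_soft = limit_bytes
        else:
            # An unprivileged process cannot exceed its hard limit.
            new_soft = min(limit_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))

    return subprocess.run(
        cmd,
        preexec_fn=apply_limit,
        capture_output=True,
        text=True,
    )
```

Calling `run_with_memory_limit([sys.executable, "-c", "x = bytearray(2 * 1024**3)"], max_memory_gb=1)` should make the child fail with a `MemoryError` instead of taking down the parent. Note that `RLIMIT_AS` caps virtual address space rather than resident memory, so it is a conservative approximation of "memory used".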

This PR also adds tests for this.

With RUNTIME_MAX_MEMORY_GB set to 3 GB, it stops a stress test that tries to allocate 6 GB:
image

When RUNTIME_MAX_MEMORY_GB is increased to 7 GB, the same test succeeds:
image


Link of any specific issues this addresses


To run this PR locally, use the following command:

docker run -it --rm   -p 3000:3000   -v /var/run/docker.sock:/var/run/docker.sock   --add-host host.docker.internal:host-gateway   -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:679d27a-nikolaik   --name openhands-app-679d27a   docker.all-hands.dev/all-hands-ai/openhands:679d27a

@mamoodi
Collaborator

mamoodi commented Feb 3, 2025

@xingyaoww are you still diagnosing the issues?

@xingyaoww
Collaborator Author

@mamoodi yes!

@xingyaoww xingyaoww changed the title from "[DRAFT] Bash performance diagnose" to "feat(runtime): use prlimit to limit resource usage of command to avoid OOM Runtime Kill" Feb 10, 2025
@xingyaoww xingyaoww marked this pull request as ready for review February 10, 2025 21:47
@xingyaoww xingyaoww requested review from rbren and neubig February 10, 2025 21:47
@neubig
Contributor

neubig commented Feb 10, 2025

I confirmed that this works well locally!

Screenshot 2025-02-10 at 5 38 52 PM

Now when a process exceeds the allowed amount of memory, it dies, and the agent can tell that it died because of the memory limit.

@xingyaoww
Collaborator Author

@neubig comment should be resolved now :)

@xingyaoww xingyaoww requested a review from neubig February 11, 2025 02:46
@xingyaoww xingyaoww merged commit 6a6dc93 into main Feb 11, 2025
14 checks passed
@xingyaoww xingyaoww deleted the xw/bash-perf branch February 11, 2025 03:21
adityasoni9998 pushed a commit to adityasoni9998/OpenHands that referenced this pull request Mar 3, 2025
…oid OOM Runtime Kill (All-Hands-AI#6338)

Co-authored-by: openhands <[email protected]>
Co-authored-by: Engel Nyst <[email protected]>
Co-authored-by: Graham Neubig <[email protected]>

5 participants