OutOfMemoryError: Task was killed due to the node running low on memory. #11

Open
zcswdt opened this issue Jan 2, 2024 · 0 comments
zcswdt commented Jan 2, 2024

Traceback (most recent call last):
  File "run_sim.py", line 152, in <module>
    remaining_observations=remaining_observations)
  File "/home/zcs/work/train-my-fling/flingbot/utils.py", line 416, in step_env
    for obs, env_id in ray.get(step_retval):
  File "/home/zcs/miniconda3/envs/flingbot/lib/python3.6/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/zcs/miniconda3/envs/flingbot/lib/python3.6/site-packages/ray/_private/worker.py", line 2523, in get
    raise value
ray.exceptions.OutOfMemoryError: Task was killed due to the node running low on memory.

Memory on the node (IP: 192.168.0.107, ID: fc2befb2867ce88e73a8a45572c43a640751ae1f2b5e15bd8315f293) where the task (actor ID: f9cc340f5aef7b479d86345001000000, name=SimEnv.__init__, pid=4331, memory used=2.22GB) was running was 59.49GB / 62.58GB (0.950744), which exceeds the memory usage threshold of 0.95. Ray killed this worker (ID: d98ac96cdd66ea8c0a2604609381c3256c8285b87822896c767f7714) because it was the most recently scheduled task; to see more information about memory usage on this node, use `ray logs raylet.out -ip 192.168.0.107`. To see the logs of the worker, use `ray logs worker-d98ac96cdd66ea8c0a2604609381c3256c8285b87822896c767f7714*out -ip 192.168.0.107`.

Top 10 memory users:
PID   MEM(GB)  COMMAND
7904  2.92     /home/zcs/work/software/pycharm-2023.2.5/jbr/bin/java -classpath /home/zcs/work/software/pycharm-202...
4312  2.22     ray::SimEnv
4331  2.22     ray::SimEnv
4253  2.17     ray::SimEnv
4288  2.15     ray::SimEnv
4252  2.15     ray::SimEnv
4268  2.14     ray::SimEnv.step
4302  2.13     ray::SimEnv.step
4279  2.13     ray::SimEnv.step
4296  2.12     ray::SimEnv

Refer to the documentation on how to address the out of memory issue: https://docs.ray.io/en/latest/ray-core/scheduling/ray-oom-prevention.html. Consider provisioning more memory on this node or reducing task parallelism by requesting more CPUs per task. Set `max_restarts` and `max_task_retries` to enable retry when the task crashes due to OOM. To adjust the kill threshold, set the environment variable `RAY_memory_usage_threshold` when starting Ray. To disable worker killing, set the environment variable `RAY_memory_monitor_refresh_ms` to zero.
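The kill message itself lists the usual mitigations. This is not the project's fix, just a minimal sketch of how those options could be applied, assuming a single-node setup where Ray is started from the driver; `SimEnv` below is an illustrative stand-in for flingbot's actor, and the threshold and `num_cpus` values are made-up examples:

```python
import os

# Assumed values: raise the OOM kill threshold before Ray starts,
# or uncomment the second line to disable the memory monitor entirely.
os.environ["RAY_memory_usage_threshold"] = "0.98"
# os.environ["RAY_memory_monitor_refresh_ms"] = "0"

import ray

ray.init()

# Reserve more CPUs per actor so fewer SimEnv instances run on the node,
# and let Ray restart the actor / retry its tasks if it is killed anyway.
@ray.remote(num_cpus=2, max_restarts=-1, max_task_retries=-1)
class SimEnv:  # stand-in for flingbot's SimEnv actor
    def step(self, action=None):
        return "obs", 0  # placeholder observation and env_id


envs = [SimEnv.remote() for _ in range(4)]
results = ray.get([env.step.remote() for env in envs])
```

On a standing cluster the two environment variables would instead be exported in the shell before running `ray start` on each node.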
