Set ulimits for nofile in RabbitMQ and taiga-async services to avoid high CPU and memory OOM issues #153
Add ulimits for nofile to multiple services in docker-compose.
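As a rough sketch of what such a change can look like in `docker-compose.yml`: the fragment below is not copied from this PR's diff. The service names, the `x-nofile-limits` extension field, and the value `65536` are assumptions for illustration; adjust them to match your compose file and host.

```yaml
# Hypothetical shared block: the name "x-nofile-limits" and the value 65536
# are illustrative choices, not values taken from this PR.
x-nofile-limits: &nofile-limits
  nofile:
    soft: 65536   # limit the container process starts with
    hard: 65536   # ceiling the process may raise its soft limit to

services:
  # Service names are assumed; merge these keys into the existing definitions.
  taiga-async:
    ulimits: *nofile-limits
  taiga-async-rabbitmq:
    ulimits: *nofile-limits
  taiga-events-rabbitmq:
    ulimits: *nofile-limits
```

Using a YAML anchor keeps the limit in one place, so all affected services change together if the value needs tuning.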
In recent distributions, Docker sets the max open files limit to "unlimited" (technically billions) by default, which causes several issues for the RabbitMQ and taiga-async containers.
RabbitMQ
The Erlang VM allocates memory on startup depending on the maximum number of available file handles. If ulimit is set to "unlimited", RabbitMQ takes 2GB+ of RAM during startup and settles around 1.6GB. With two RabbitMQ instances starting, this is a guaranteed OOM situation for any server with <8GB RAM (accounting for the memory use of the other containers).
This can easily be investigated in the pod with `rabbitmq-diagnostics memory_breakdown` in `other_system`.
With ulimit set to "unlimited": (screenshot of the memory breakdown)
With ulimit set to "32786": (screenshot of the memory breakdown)
taiga-async
There seems to be an issue with a library used in the taiga-async stack that works its way through all available file handles during startup. If max open files is set to "unlimited", this can take hours depending on CPU power, during which the container runs at a constant 100% CPU usage until it has worked through all file handles.
Setting sane `ulimit` values for these containers resolves those problems, and also helps avoid many hard-to-debug issues people have been running into lately. Together with additional PRs like #151 it should make the docker-compose deployment more reliable for self-hosters.
After setting the `ulimit` correctly, the whole stack can be run on a 2GB VPS as advertised.
See also (among others):