The ability to define GUNICORN_WORKERS and CONN_MAX_AGE env vars via helm chart's values #74
Comments
So do you need to change CONN_MAX_AGE? I'm not sure why. It's clear at least that you probably don't really want 161 workers on a 24-CPU controller node, unless it's intended to serve a lot of devs concurrently... but I'm not sure how changing CONN_MAX_AGE will help anything. You need to reduce the number of workers, or increase the number of allowed connections. It makes sense to me that any workers that can't get a database handle cannot help at all. Timing them out won't make things better. (Well, maybe that's wrong... even with 161 healthy workers, you probably wouldn't want 161 idle database connections to remain persistent at all times unless you actually had 161 currently active workers serving client requests.)
I don't see any reason we can't add both of these to controller params. I'm just not sure if we should also add a configuration option for the number of connections allowed to the database while we're in there. (You might actually want to serve thousands of concurrent app developers, and maybe the default value of GUNICORN_WORKERS is actually what you wanted.) That is probably a separate issue, though.
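A minimal sketch of what reading both values from the environment might look like, assuming the defaults described in this issue (number_of_cpus*4+1 workers and a CONN_MAX_AGE of 600 seconds). The function name controller_tunables and the returned dictionary are hypothetical, not the controller's actual code.

```python
import os


def controller_tunables(env=None, cpus=None):
    """Resolve worker count and connection max age from env vars.

    Hypothetical helper: falls back to the defaults described in the
    issue when a variable is not set.
    """
    env = os.environ if env is None else env
    cpus = cpus if cpus is not None else (os.cpu_count() or 1)
    return {
        # Default worker count: number_of_cpus * 4 + 1
        "workers": int(env.get("GUNICORN_WORKERS", cpus * 4 + 1)),
        # Default persistent-connection lifetime in seconds
        "conn_max_age": int(env.get("CONN_MAX_AGE", 600)),
    }


if __name__ == "__main__":
    # With nothing set, a 40-CPU node resolves to the defaults.
    print(controller_tunables(env={}, cpus=40))
    # Both values can be overridden independently.
    print(controller_tunables(env={"GUNICORN_WORKERS": "16", "CONN_MAX_AGE": "60"}, cpus=40))
```

Passing the environment in as a plain mapping keeps the helper easy to check without touching the real process environment.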
@Bregor So you are currently setting these using ENV vars on the deis-controller deployment object to fix this issue? We should alternatively allow users to define them. However, are the default values correct in most cases, when there are not that many users?
Yes. Deis database is a server with only one client (deis-controller), so this client should respect its settings. It should not be possible to run more gunicorn workers than
TL;DR: for now, nodes cannot contain more than 24 CPUs. It's pretty easy to count:
This way, every 5 seconds a new worker comes alive and tries to run
Do you think this is an important knob? I kind of think we should just set a cap for the maximum allowed value of Been reading this: https://github.com/concourse/concourse/wiki/Anti-Patterns#knobs
I'd guess 64 as a reasonable high-end limit. If we're making it tuneable, it should be tuneable both ways.
(Personally I use deis-database for something other than deis-controller so I don't want it to consume all 100 connections, if the limit is 100. I know that's against the advice, but I'm not sure it is documented anywhere as such...)
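The cap discussed in the last few comments could be sketched roughly as follows. This assumes the database's max_connections value is known to the caller, which is not guaranteed for off-cluster databases; the function name safe_worker_count, the reserved parameter, and the use of 64 as the upper bound are illustrative only.

```python
def safe_worker_count(cpus, max_connections=100, reserved=0, hard_cap=64):
    """Pick a worker count that cannot exhaust the database.

    Hypothetical rule: start from number_of_cpus * 4 + 1, then clamp to
    both a fixed upper bound and the connections left over after
    reserving some for other database clients.
    """
    default = cpus * 4 + 1
    available = max_connections - reserved
    # Never go below one worker, even if the inputs are very tight.
    return max(1, min(default, hard_cap, available))


if __name__ == "__main__":
    print(safe_worker_count(cpus=40))               # clamped by hard_cap
    print(safe_worker_count(cpus=40, reserved=50))  # clamped by free connections
    print(safe_worker_count(cpus=2))                # small node keeps its default
```

The rule is only as good as the max_connections value passed in, so it does not remove the need for an override.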
Doesn't
Tuneable
I'm just trying to avoid adding an unnecessary knob, because it's possible to tune a knob wrong. I'm not actually in favor of adding either knob if we can simply write a rule that guarantees it never fails on machines with >24 CPUs. That's a good point, though: we can't always guarantee max_connections will default to 100. From a quick Google, it actually looks like the default setting of uhh, parameterized[1] based on the amount of memory in your database? Well that's super... my idea is therefore out the window. There is no rule we could write that is guaranteed to be correct, at least not without additional knowledge about your off-cluster databases, if that's the case. I'll consider that due diligence done; I'm satisfied that we should add a way to tune
started in #75
I haven't tested teamhephy/workflow#66 yet, but it should work
Following this code, without the ability to set GUNICORN_WORKERS we have a number of workers equal to number_of_cpus*4+1. Django's default behaviour is to open one connection per worker, so GUNICORN_WORKERS connections in total, and keep them open for CONN_MAX_AGE seconds. CONN_MAX_AGE is hard-coded to 600 seconds, and the max_connections parameter of PostgreSQL in deis-database is hard-coded to 100 too.
So, let's imagine we have a big metal chassis with 40 CPUs. Our GUNICORN_WORKERS variable will be equal to 40*4+1=161 (which is already greater than 100), and CONN_MAX_AGE is equal to 600 seconds.
This way, after 500 seconds (the health check runs once every 5 seconds, 100 times), queries like SELECT 0 that result from the health check will own all 100 connections to PostgreSQL.
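The arithmetic above can be checked with a short sketch. It takes the numbers from this issue at face value (one new persistent connection per health check, a check every 5 seconds, max_connections of 100, CONN_MAX_AGE of 600); the function names are made up for illustration.

```python
def default_workers(cpus):
    # Default described in the issue: number_of_cpus * 4 + 1
    return cpus * 4 + 1


def seconds_until_exhausted(max_connections, check_interval_s):
    # Assumes each health check lands on a fresh worker that opens
    # one new persistent connection.
    return max_connections * check_interval_s


if __name__ == "__main__":
    workers = default_workers(40)
    exhausted_at = seconds_until_exhausted(max_connections=100, check_interval_s=5)
    conn_max_age = 600

    print(workers)                      # 161
    print(workers > 100)                # True
    print(exhausted_at)                 # 500
    # The first connection would only be recycled after CONN_MAX_AGE,
    # which is later than the point where all 100 are taken.
    print(exhausted_at < conn_max_age)  # True
```

Because 500 is less than 600, no connection expires before the limit is reached, which is why lowering CONN_MAX_AGE or the worker count changes the outcome.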