At a random point in time stops responding on HTTP requests. #1399
Here is the TCP dump.
I think this could be the same problem as #905. That issue is frequently visited, so it seems the error persists even on current versions. Unfortunately, using How about if you add this change steve-chavez@110314d and recompile from master? Perhaps that could lead to a more meaningful error message on the client.
I rebuilt the project with patch 110314d, but when the problem recurs I still get no log messages. My Haskell is very weak. Can you tell me which parts of the project I should look at?
Oh, the patch only adds a client error message for when the empty reply happens again. I'd like to see what's the error message in that case. |
Also, could it be an out of memory issue? |
Free memory is at least about 2 GB. The daemon does not return any error messages, does not respond to any HTTP requests, and does not close TCP connections with clients.
We set parameter |
The problem occurred again, but not consistently. It seems to be somehow related to the db-pool-timeout parameter. Do you know of any other logging tools that could give more information?
Hmm.. I'm trying to reproduce the issue. Is your pg server in the same box as postgrest? This is what I'm doing. Have an
db-uri = "postgres://user@localhost/db?application_name=postgrest_app"
db-pool-timeout = 1000000
Then we can look at
select * from pg_stat_activity where application_name = 'postgrest_app';
And we can terminate the connection:
select pg_terminate_backend(pid) from pg_stat_activity where application_name = 'postgrest_app';
In this case the next request to postgrest will result in a 503 with the following message:
And the server will log:
PostgREST tries to reconnect to the db with exponential backoff, and in my case the next request will succeed. Even If I shut down the PostgreSQL server, PostgREST will print server logs when trying to recover:
And clients would get 503 errors with this message:
Could you see if the recovery procedure is similar on your server? Maybe this would give another clue. |
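For readers unfamiliar with the recovery behavior described above (reconnecting with exponential backoff), here is a minimal Python sketch of the general technique. It is not PostgREST's actual implementation: the function names, the base delay, the cap, and the number of attempts are all assumptions chosen for illustration.

```python
import time


def backoff_delays(base=1.0, cap=32.0, attempts=7):
    """Yield delays that double on each attempt, limited to `cap` seconds.

    base, cap and attempts are illustrative values, not PostgREST settings.
    """
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= 2


def retry_with_backoff(connect, base=1.0, cap=32.0, attempts=7, sleep=time.sleep):
    """Call `connect` until it succeeds, sleeping between failed attempts.

    `connect` is a hypothetical callable that raises ConnectionError on failure.
    """
    last_error = None
    for delay in backoff_delays(base, cap, attempts):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            sleep(delay)
    raise ConnectionError("all retries failed") from last_error
```

While the retries are in progress, a server following this pattern would typically answer clients with an error such as the 503 mentioned above instead of hanging.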
Yes, I get exactly the same behavior on our server. I have now set db-pool-timeout back to 1000 and will disable keep-alive in nginx, because I suspect it might be related to the problem.
In the nginx configuration file
we disabled the keepalive parameter, and now everything works fine.
@fortyanov Oh, great find! So, we should remove that recommended Why would that happen though? The nginx doc on
I don't understand the exact cause of this bug. I guess it could be happening because of some kind of race condition. Keepalive is very important for performance.
I will note that I've been using PostgREST for 3 years and I've never run into this issue. This may be worth mentioning because I always have a Go binary proxying through to PostgREST, not Nginx, which may be why! |
Does your Go proxy implement keepalive (persistent connections to PostgREST)?
I'm not sure. Following the source, I see: https://github.com/EffectiveAF/effective/blob/master/server.go#L75 => https://github.com/golang/go/blob/master/src/net/http/httputil/reverseproxy.go#L100-L124 => https://github.com/golang/go/blob/master/src/net/http/httputil/reverseproxy.go#L24-L79 => https://github.com/golang/go/blob/master/src/net/http/transport.go#L37-L54 . Perhaps most relevantly I see
and
which defaults to |
Maybe this |
We set up a Unix socket on PostgREST
The problem still happens. When we raise the NGINX keepalive_timeout value
so that it is larger than the keep-alive timeout in PostgREST, the problem does not occur. The default keepalive timeout is 60 seconds in nginx, and in PostgREST too (I think). It seems there is a race condition when closing the session between PostgREST and nginx.
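Since the keep-alive timeout on the PostgREST side is only a guess in this thread, one way to check it is to probe the backend directly. The following is a minimal Python sketch, not an official tool: `connection_survives_idle` is a hypothetical helper, and the host, port, and path are assumptions to adapt to your setup. It sends one request, idles, and then tries to reuse the same connection.

```python
import http.client
import time


def connection_survives_idle(host, port, idle_seconds, path="/"):
    """Return True if a keep-alive connection is still usable after idling.

    host, port and path are placeholders; point them at the backend directly,
    not at the nginx proxy in front of it.
    """
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", path)
        conn.getresponse().read()  # drain the body so the socket can be reused
        if conn.sock is None:
            # The server answered with "Connection: close": no keep-alive at all.
            return False
        time.sleep(idle_seconds)
        try:
            conn.request("GET", path)  # reuses the idle socket
            conn.getresponse().read()
            return True
        except ConnectionError:
            # The server closed the idle connection while we were waiting.
            return False
    finally:
        conn.close()
```

Calling it with increasing `idle_seconds` values (for example 30, 55, 65) should show roughly where the backend drops idle connections, which can then be compared with the timeout configured in nginx.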
I'm not good at Haskell, so I don't know where to find information about the library used in PostgREST.
I think for now we could recommend the @fortyanov Did the issue appear again? Also could you estimate what kind of traffic your server is handling? I'd still like to reproduce the issue. |
I've also seen this on a production instance. I tried the Tuning the Scenario was like this:
I'm led to believe that this is related to
It’s not postgrest or hasql or nginx. It’s the kernel settings for your TCP stack. Details on what to tune are here: https://stackoverflow.com/questions/40339547/periodic-drop-in-throughput-of-nginx-reverse-proxy-what-can-it-be/40356386#40356386
My environment
Not working configuration
nginx.conf:
Working configuration
nginx.conf:
Investigation
I've investigated the problem much further. When a local (Unix) socket is used, PostgREST opens the socket after startup:
The trace of system calls on a working system is:
But after the issue occurs (the hang described in this thread), PostgREST has another FD left open:
and has stopped processing requests:
Thanks a lot for helping here @drTr0jan
How did you manage to reproduce the hang?
The hint above suggests that this issue was related to file descriptors. On #2042 we noted that PostgREST will stop processing new requests when EMFILE (Too many open files) is reached; a dependency was upgraded to fix this on #2158. I'm assuming this issue will be fixed by that change as well, so I will close it, but please reopen if the problem reappears.
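For anyone who wants to check whether a running process is approaching the EMFILE condition mentioned above, here is a small Python sketch. It assumes a Linux host with `/proc` mounted (the thread does not say which OS the affected servers run), and the helper names and the 0.9 threshold are made up for illustration.

```python
import os


def fd_usage(pid):
    """Return (open_fds, soft_limit) for a process, read from Linux /proc.

    soft_limit is None when the limit is reported as "unlimited".
    Reading another user's process usually requires root.
    """
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    soft_limit = None
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                value = line[len("Max open files"):].split()[0]
                soft_limit = None if value == "unlimited" else int(value)
                break
    return open_fds, soft_limit


def near_fd_limit(open_fds, soft_limit, threshold=0.9):
    """True when open descriptors reach `threshold` of the soft limit."""
    if soft_limit is None:
        return False
    return open_fds >= threshold * soft_limit
```

Running `fd_usage` periodically against the PostgREST pid and logging the result would show whether the descriptor count creeps upward before a hang.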
@steve-chavez - I'm running a self-hosted Supabase instance (with an nginx proxy in front) and running into a similar problem where we're seeing intermittent 503 errors when making requests from our clients. I've posted a github issue with Supabase here and on their discord page here Sounds like this may be related. I'd appreciate your insight on this issue we're experiencing. |
@kevinmlong That looks unrelated to this issue since you're getting a response. I've made a suggestion on supabase/supabase#12918 (comment). |
Environment
Description of the problem
At a random point in time, it stops responding to HTTP requests. We tried both a manually built binary and the build from GitHub; the problem occurs in both cases.
Our .conf file contains options
db-pool = 15
db-pool-timeout = 1000
I cannot provide more detailed information, because the logs do not contain information about the reasons for stopping the response to HTTP requests. When a problem occurs, reloading the database schema works fine. The only way to resume the process is to restart it.
I will attach a TCP dump as soon as the problem recurs.