
At a random point in time, PostgREST stops responding to HTTP requests. #1399

Closed
fortyanov opened this issue Oct 31, 2019 · 27 comments

@fortyanov
Contributor

Environment

  • PostgreSQL version: 12
  • PostgREST version: 6.0.2-713b214
  • Operating System: FreeBSD 12.1-STABLE

Description of the problem

At a random point in time, PostgREST stops responding to HTTP requests. We tried both a manually built binary and the build from GitHub; the problem occurs in both cases.

Our .conf file contains the following options:
db-pool = 15
db-pool-timeout = 1000

I cannot provide more detailed information because the logs contain nothing about why it stops responding to HTTP requests. When the problem occurs, reloading the database schema still works fine. The only way to get it responding again is to restart the process.

I will attach a TCP dump as soon as the problem happens again.

@fortyanov
Contributor Author

Here is the TCP dump.
dump.pcap.zip

@steve-chavez
Member

steve-chavez commented Nov 2, 2019

I think this could be the same problem as #905. That issue is frequently visited; it seems the error persists even on current versions.

Unfortunately, using tshark on the pcap file reveals nothing for me.

How about if you add this change steve-chavez@110314d and recompile from master? Perhaps that could lead to a more meaningful error message on the client.

@fortyanov
Contributor Author

I rebuilt the project with patch 110314d, but when the problem happens again there are still no log messages. My Haskell is very weak. Can you point me to the parts of the project I should look at?

@steve-chavez
Member

Oh, the patch only adds a client error message for when the empty reply happens again. I'd like to see what the error message is in that case.

@steve-chavez
Member

Also, could it be an out of memory issue?

@fortyanov
Contributor Author

fortyanov commented Nov 6, 2019

Free memory never drops below about 2 GB. The daemon returns no error messages, does not respond to any HTTP requests, and does not close the TCP connections with clients.

@fortyanov
Contributor Author

We set the parameter
db-pool-timeout = 1
and the problem has not recurred for about 2 days. I hope this information helps.

@fortyanov
Contributor Author

The problem happened again, but not consistently. It seems to be somehow related to the db-pool-timeout parameter. Do you know of any other logging tools that could give more information?

@steve-chavez
Member

Hmm.. I'm trying to reproduce the issue. Is your pg server on the same box as postgrest?

This is what I'm doing. Add an application_name query param to the postgrest db-uri:

db-uri = "postgres://user@localhost/db?application_name=postgrest_app"
db-pool-timeout = 1000000

Then we can look at pg_stat_activity:

select * from pg_stat_activity where application_name = 'postgrest_app';

And we can terminate the connection:

select pg_terminate_backend(pid) from pg_stat_activity where application_name = 'postgrest_app';

In this case the next request to postgrest will result in a 503 with the following message:

curl 'localhost:3000/'
{"details":"no connection to the server\n","message":"Database client error"} 

And the server will log:

Attempting to connect to the database...
127.0.0.1 - - [11/Nov/2019:11:49:41 -0500] "GET / HTTP/1.1" 503 - "" "curl/7.64.0"
Connection successful

PostgREST tries to reconnect to the db with exponential backoff, and in my case the next request will succeed.

Even if I shut down the PostgreSQL server, PostgREST will print server logs when trying to recover:

127.0.0.1 - - [11/Nov/2019:11:52:19 -0500] "GET / HTTP/1.1" 503 - "" "curl/7.64.0"
{"details":"could not connect to server: Connection refused\n\tIs the server running on host \"localhost\" (::1) and accepting\n\tTCP/IP connections on port 5432?\ncould not connect to server: Connection refused\n\tIs the server running on host \"localhost\" (127.0.0.1) and accepting\n\tTCP/IP connections on port 5432?\n","code":"","message":"Database connection error"}
Attempting to reconnect to the database in 0 seconds...
{"details":"could not connect to server: Connection refused\n\tIs the server running on host \"localhost\" (::1) and accepting\n\tTCP/IP connections on port 5432?\ncould not connect to server: Connection refused\n\tIs the server running on host \"localhost\" (127.0.0.1) and accepting\n\tTCP/IP connections on port 5432?\n","code":"","message":"Database connection error"}
Attempting to reconnect to the database in 1 seconds...
127.0.0.1 - - [11/Nov/2019:11:52:21 -0500] "GET / HTTP/1.1" 503 - "" "curl/7.64.0"
{"details":"could not connect to server: Connection refused\n\tIs the server running on host \"localhost\" (::1) and accepting\n\tTCP/IP connections on port 5432?\ncould not connect to server: Connection refused\n\tIs the server running on host \"localhost\" (127.0.0.1) and accepting\n\tTCP/IP connections on port 5432?\n","code":"","message":"Database connection error"}
Attempting to reconnect to the database in 2 seconds...

And clients would get 503 errors with this message:

curl localhost:3000
{"message":"Database connection lost, retrying the connection."}

Could you see if the recovery procedure is similar on your server? Maybe this would give another clue.

@fortyanov
Contributor Author

Yes, I get exactly the same behavior on our server. I've now set db-pool-timeout back to 1000 and will disable keep-alive on nginx, since I suspect it might be related to the problem.

@fortyanov
Contributor Author

In the nginx configuration file

upstream postgrest
{
  server 127.0.0.1:3000;
  keepalive 64;
}

we disabled the keepalive parameter, and now everything works fine.
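
For reference, a minimal sketch of the resulting upstream block (same listen address as above, with the keepalive directive simply dropped):

upstream postgrest
{
  server 127.0.0.1:3000;
  # keepalive 64;  # removed as a workaround
}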

@steve-chavez
Member

@fortyanov Oh, great find!

So, we should remove that recommended keepalive from the hardening postgrest doc.

Why would that happen though? The nginx doc on keepalive says:

keepalive – The number of idle keepalive connections to an upstream server that remain open for each worker process. There is no default value.

@fortyanov
Contributor Author

I don't understand the exact cause of the bug. I guess it could be some kind of race condition. Keepalive is very important for performance.

@elimisteve
Contributor

I will note that I've been using PostgREST for 3 years and I've never run into this issue. This may be worth mentioning because I always have a Go binary proxying through to PostgREST, not Nginx, which may be why!

@fortyanov
Contributor Author

fortyanov commented Nov 19, 2019

I will note that I've been using PostgREST for 3 years and I've never run into this issue. This may be worth mentioning because I always have a Go binary proxying through to PostgREST, not Nginx, which may be why!

Does your Go proxy implement keepalive (persistent connections to PostgREST)?

@elimisteve
Contributor

I'm not sure.

Following the source, I see: https://github.com/EffectiveAF/effective/blob/master/server.go#L75 => https://github.com/golang/go/blob/master/src/net/http/httputil/reverseproxy.go#L100-L124 => https://github.com/golang/go/blob/master/src/net/http/httputil/reverseproxy.go#L24-L79 => https://github.com/golang/go/blob/master/src/net/http/transport.go#L37-L54 .

Perhaps most relevantly I see

Timeout: 30 * time.Second

KeepAlive: 30 * time.Second

and

type Transport struct {
	...

	// DisableKeepAlives, if true, disables HTTP keep-alives and
	// will only use the connection to the server for a single
	// HTTP request.
	//
	// This is unrelated to the similarly named TCP keep-alives.
	DisableKeepAlives bool
	...

which defaults to false.
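
For illustration, here is a minimal sketch (placeholder addresses, not the actual proxy linked above) of a Go reverse proxy that explicitly disables upstream keep-alives:

package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Hypothetical PostgREST address; adjust to wherever the upstream listens.
	target, err := url.Parse("http://127.0.0.1:3000")
	if err != nil {
		log.Fatal(err)
	}

	proxy := httputil.NewSingleHostReverseProxy(target)

	// Never reuse upstream connections: each proxied request opens a fresh
	// connection to PostgREST, so no idle keep-alive connection can go
	// stale between the proxy and the upstream.
	proxy.Transport = &http.Transport{DisableKeepAlives: true}

	log.Fatal(http.ListenAndServe(":8080", proxy))
}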

@dwagin
Contributor

dwagin commented Nov 19, 2019

yesodweb/wai#618

Maybe this

@fortyanov
Contributor Author

We set up a unix socket on postgrest:

server-unix-socket = "/tmp/.s.PostgREST"
server-unix-socket-mode = "777"

The problem still happens.

However, when we raise the nginx keepalive_timeout

upstream postgrest
{
  server unix:/tmp/.s.PostgREST;
  keepalive 64;
  keepalive_timeout 120s;
}

so that it is larger than the keep-alive timeout in postgrest, the problem does not happen. The default keepalive timeout is 60 seconds in nginx, and I think in postgrest too. It looks like there is a race condition between postgrest and nginx when closing the connection.

@fortyanov
Contributor Author

fortyanov commented Dec 2, 2019

I'm not good at Haskell, but there could be relevant information about a library used in postgrest here:
yesodweb/wai#734 (comment)

@steve-chavez
Member

I think for now we could recommend keepalive_timeout > 60s in our nginx samples. Ideally we would output a clear error when this happens (not sure if that's possible).
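
As a rough sketch of what such a sample might look like (the address and values below are placeholders; keepalive_timeout inside an upstream block also needs a fairly recent nginx):

upstream postgrest
{
  server 127.0.0.1:3000;
  keepalive 64;
  # Keep idle upstream connections around longer than PostgREST's own
  # idle timeout, so nginx does not try to reuse a connection the
  # upstream has already closed.
  keepalive_timeout 120s;
}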

@fortyanov Did the issue appear again? Also could you estimate what kind of traffic your server is handling? I'd still like to reproduce the issue.

@steve-chavez
Member

I've also seen this on a production instance. I tried the keepalive_timeout 120s with no success. I kept getting nginx upstream timeout errors.

Tuning db-pool to higher values (up to 80, with pg max_connections=100) also didn't help.

Scenario was like this:

  • pg and pgrst on different servers.
  • 210 concurrent requests arrive and pgrst stops responding (this gives the upstream timeout errors). About 5 mins later, pgrst responds to the newly arrived 249 concurrent requests, then stops responding. ~5 mins later, it responds to 199 concurrent requests and then stops responding again.
  • The above keeps happening with similar numbers: pgrst responds -> stops responding (timeouts) -> responds.
  • The immediate solution was to fix the client and reduce the number of requests.

I'm led to believe that this is related to hasql-pool (nikita-volkov/hasql-pool#3, nikita-volkov/hasql-pool#4 (comment)). It could be that it doesn't error out when there are no more available connections.
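
One way to check whether the pool is saturated while the hang is happening (a sketch that reuses the application_name query param suggested earlier in this thread) is to compare the backends PostgREST holds against db-pool:

-- Group PostgREST's backends by state; if all db-pool connections are
-- "active" or stuck in "idle in transaction", the pool may be exhausted
-- without ever returning an error.
select state, count(*)
from pg_stat_activity
where application_name = 'postgrest_app'
group by state;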

@ruslantalpa
Contributor

It's not postgrest or hasql or nginx. It's the kernel settings for your TCP stack.

Details on what to tune are here: https://stackoverflow.com/questions/40339547/periodic-drop-in-throughput-of-nginx-reverse-proxy-what-can-it-be/40356386#40356386
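
Purely as an illustration (not taken from the linked answer), these are the kinds of values one might inspect on FreeBSD, where this issue was reported:

# Listen backlog and ephemeral port range of the TCP stack
sysctl kern.ipc.somaxconn
sysctl net.inet.ip.portrange.first net.inet.ip.portrange.last
# Sockets stuck in TIME_WAIT between nginx and its upstream
netstat -an | grep TIME_WAIT | wc -l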

@drTr0jan
Contributor

My environment

  • FreeBSD 12.2;
  • PostgREST nightly (d3a8b5f) or 7.0.1-release.
  • PostgreSQL 13.2 + PgBouncer 1.14.0
  • Nginx 1.20.0

Not working configuration

nginx.conf:

upstream postgrest
{
  server unix:/tmp/.s.PostgREST;
  keepalive 64;
#  keepalive_timeout 60s; # assumed as default
}

Working configuration

nginx.conf:

upstream postgrest
{
  server unix:/tmp/.s.PostgREST;
  keepalive 64;
  keepalive_timeout 120s;
}

Investigation

I've investigated the problem further.
The issue doesn't depend on the socket type (AF_INET or AF_UNIX).

When a local (unix) socket is used, PostgREST opens the socket after startup:

# sockstat -u
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
postgrest postgrest 18225 48 stream /tmp/.s.PostgREST

A trace of system calls on a working system:

# truss -p 18225
...
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
_umtx_op(0x8032c46b8,UMTX_OP_NWAKE_PRIVATE,0x1,0x0,0x0) = 0 (0x0)
_umtx_op(0x802ae0f50,UMTX_OP_WAIT_UINT_PRIVATE,0x0,0x0,0x0) = 0 (0x0)
_umtx_op(0x8032c9400,UMTX_OP_MUTEX_WAKE2,0x0,0x0,0x0) = 0 (0x0)
_umtx_op(0x8032c9400,UMTX_OP_MUTEX_WAIT,0x0,0x0,0x0) = 0 (0x0)
accept4(0x30,0x42003d99a0,0x42003d9a40,0x30000000) = 47 (0x2f)
fcntl(47,F_GETFD,)				 = 1 (0x1)
fcntl(47,F_SETFD,FD_CLOEXEC)			 = 0 (0x0)
setsockopt(47,IPPROTO_TCP,TCP_NODELAY,0x4200443908,4) ERR#22 'Invalid argument'
kevent(18,0x0,0,{ },64,{ 0.000000000 })		 = 0 (0x0)
mmap(0x0,20480,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34473328640 (0x806c54000)
accept4(0x30,0x4200443920,0x42004439c0,0x30000000) ERR#35 'Resource temporarily unavailable'
kevent(18,0x0,0,{ },64,{ 0.000000000 })		 = 0 (0x0)
kevent(23,{ 48,EVFILT_READ,EV_ADD|EV_ONESHOT,0,0,0x0 },1,0x0,0,{ 0.000000000 }) = 0 (0x0)
_umtx_op(0x8032c32b8,UMTX_OP_NWAKE_PRIVATE,0x1,0x0,0x0) = 0 (0x0)
recvfrom(47,"GET / HTTP/1.1\r\nHost: postgres"...,16384,0,NULL,0x0) = 79 (0x4f)
_umtx_op(0x8048812b8,UMTX_OP_NWAKE_PRIVATE,0x1,0x0,0x0) = 0 (0x0)
clock_gettime(4,{ 10091166.965890658 })		 = 0 (0x0)
_umtx_op(0x8048812b8,UMTX_OP_NWAKE_PRIVATE,0x1,0x0,0x0) = 0 (0x0)
_umtx_op(0x8032c9380,UMTX_OP_MUTEX_WAKE2,0x0,0x0,0x0) = 0 (0x0)
_umtx_op(0x8032c9380,UMTX_OP_MUTEX_WAIT,0x0,0x0,0x0) = 0 (0x0)
clock_gettime(0,{ 1620994043.980874065 })	 = 0 (0x0)
clock_gettime(4,{ 10091166.966852797 })		 = 0 (0x0)
sendto(49,"P\0\0\0006\0BEGIN ISOLATION LEVE"...,92,0,NULL,0) = 92 (0x5c)
nanosleep({ 0.010000000 })			 = 0 (0x0)
poll({ 49/POLLIN|POLLERR },1,-1)		 = 1 (0x1)
recvfrom(49,"1\0\0\0\^D2\0\0\0\^Dn\0\0\0\^DC"...,16384,0,NULL,0x0) = 32 (0x20)
sendto(49,"P\0\0\^A$\0select set_config($1,"...,654,0,NULL,0) = 654 (0x28e)
poll({ 49/POLLIN|POLLERR },1,-1)		 = 1 (0x1)
recvfrom(49,"1\0\0\0\^D2\0\0\0\^DT\0\0\0\M-n"...,16384,0,NULL,0x0) = 430 (0x1ae)
mmap(0x0,12288,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34473349120 (0x806c59000)
mmap(0x0,12288,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34473361408 (0x806c5c000)
sendto(49,"P\0\0\^E[\0\n    select\n      n"...,1438,0,NULL,0) = 1438 (0x59e)
poll({ 49/POLLIN|POLLERR },1,-1)		 = 1 (0x1)
recvfrom(49,"1\0\0\0\^D2\0\0\0\^DT\0\0\0\M^C"...,16384,0,NULL,0x0) = 162 (0xa2)
mmap(0x0,12288,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34473373696 (0x806c5f000)
sendto(49,"P\0\0\0\M-@\0\n      select\n   "...,259,0,NULL,0) = 259 (0x103)
poll({ 49/POLLIN|POLLERR },1,-1)		 = 1 (0x1)
recvfrom(49,"1\0\0\0\^D2\0\0\0\^DT\0\0\0$\0"...,16384,0,NULL,0x0) = 78 (0x4e)
sendto(49,"P\0\0\a\M-O\0\n -- Recursively g"...,2066,0,NULL,0) = 2066 (0x812)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
poll({ 49/POLLIN|POLLERR },1,-1)		 = 1 (0x1)
recvfrom(49,"1\0\0\0\^D2\0\0\0\^DT\0\0\^A0\0"...,16384,0,NULL,0x0) = 988 (0x3dc)
sendto(49,"P\0\0\0\^N\0COMMIT\0\0\0B\0\0\0"...,52,0,NULL,0) = 52 (0x34)
poll({ 49/POLLIN|POLLERR },1,-1)		 = 1 (0x1)
recvfrom(49,"1\0\0\0\^D2\0\0\0\^Dn\0\0\0\^DS"...,16384,0,NULL,0x0) = 55 (0x37)
clock_gettime(0,{ 1620994044.198996088 })	 = 0 (0x0)
_umtx_op(0x8048812b8,UMTX_OP_NWAKE_PRIVATE,0x1,0x0,0x0) = 0 (0x0)
_umtx_op(0x8032c9300,UMTX_OP_MUTEX_WAIT,0x0,0x0,0x0) = 0 (0x0)
clock_gettime(13,{ 1620994044.000000000 })	 = 0 (0x0)
_umtx_op(0x8048642b8,UMTX_OP_NWAKE_PRIVATE,0x1,0x0,0x0) = 0 (0x0)
_umtx_op(0x8032ed140,UMTX_OP_WAIT_UINT_PRIVATE,0x0,0x0,0x0) = 0 (0x0)
clock_gettime(4,{ 10091167.185940172 })		 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
clock_gettime(13,{ 1620994044.000000000 })	 = 0 (0x0)
clock_gettime(4,{ 10091167.186504214 })		 = 0 (0x0)
sendto(47,"HTTP/1.1 200 OK\r\nTransfer-Enco"...,3421,0,NULL,0) = 3421 (0xd5d)
recvfrom(47,0x806c2f5cf,16305,0,0x0,0x0)	 ERR#35 'Resource temporarily unavailable'
kevent(23,{ 47,EVFILT_READ,EV_ADD|EV_ONESHOT,0,0,0x0 },1,0x0,0,{ 0.000000000 }) = 0 (0x0)
_umtx_op(0x8048812b8,UMTX_OP_NWAKE_PRIVATE,0x1,0x0,0x0) = 0 (0x0)
_umtx_op(0x8032ed068,UMTX_OP_WAIT_UINT_PRIVATE,0x0,0x0,0x0) = 0 (0x0)
_umtx_op(0x8032c9400,UMTX_OP_MUTEX_WAKE2,0x0,0x0,0x0) = 0 (0x0)
_umtx_op(0x8032c9400,UMTX_OP_MUTEX_WAIT,0x0,0x0,0x0) = 0 (0x0)
_umtx_op(0x8032c4bb8,UMTX_OP_NWAKE_PRIVATE,0x1,0x0,0x0) = 0 (0x0)
_umtx_op(0x8032ed158,UMTX_OP_WAIT_UINT_PRIVATE,0x0,0x0,0x0) = 0 (0x0)
kevent(23,0x0,0,{ },64,{ 0.000000000 })		 = 0 (0x0)
kevent(23,0x0,0,{ },64,{ 0.000000000 })		 = 0 (0x0)
recvfrom(47,0x806c2f5cf,16305,0,NULL,0x0)	 = 0 (0x0)
close(47)					 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
...

But after the issue (the hang in question), PostgREST is left with another FD:

# sockstat -u
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
postgrest postgrest 18225 48 stream /tmp/.s.PostgREST
postgrest postgrest 18225 52 stream /tmp/.s.PostgREST

and has stopped processing requests:

# truss -p 18225
...
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
clock_gettime(4,{ 10435412.577459186 })		 = 0 (0x0)
_umtx_op(0x8061e62b8,UMTX_OP_NWAKE_PRIVATE,0x1,0x0,0x0) = 0 (0x0)
_umtx_op(0x8032c9500,UMTX_OP_MUTEX_WAKE2,0x0,0x0,0x0) = 0 (0x0)
_umtx_op(0x8032c9500,UMTX_OP_MUTEX_WAIT,0x0,0x0,0x0) = 0 (0x0)
clock_gettime(0,{ 1621338289.592394262 })	 = 0 (0x0)
clock_gettime(4,{ 10435412.578476639 })		 = 0 (0x0)
write(46,"\M^?",1)				 = 1 (0x1)
poll({ 43/POLLIN 45/POLLIN },2,-1)		 = 1 (0x1)
read(45,"\M^?",4096)				 = 1 (0x1)
clock_gettime(4,{ 10435412.579218358 })		 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
nanosleep({ 0.010000000 })			 = 0 (0x0)
...
# fstat -p 18225
USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
772      postgrest  18225    0 /dev       29   crw-rw-rw-    null rw
772      postgrest  18225    1* pipe fffff802286a9d48 <-> fffff802286a9be0      0 rw
772      postgrest  18225    2* pipe fffff802286a9d48 <-> fffff802286a9be0      0 rw
772      postgrest  18225    3
772      postgrest  18225    4* pipe fffff8022860c2f8 <-> fffff8022860c460      0 rw
772      postgrest  18225    5* pipe fffff8022860c460 <-> fffff8022860c2f8      0 rw
772      postgrest  18225    6* pipe fffff802967f72f8 <-> fffff802967f7460      0 rw
772      postgrest  18225    7* pipe fffff802967f7460 <-> fffff802967f72f8      0 rw
772      postgrest  18225    8
772      postgrest  18225    9* pipe fffff801e9567000 <-> fffff801e9567168      0 rw
772      postgrest  18225   10* pipe fffff801e9567168 <-> fffff801e9567000      0 rw
772      postgrest  18225   11* pipe fffff8028c478be0 <-> fffff8028c478d48      0 rw
772      postgrest  18225   12* pipe fffff8028c478d48 <-> fffff8028c478be0      0 rw
772      postgrest  18225   13
772      postgrest  18225   14* pipe fffff802d46d1000 <-> fffff802d46d1168      0 rw
772      postgrest  18225   15* pipe fffff802d46d1168 <-> fffff802d46d1000      0 rw
772      postgrest  18225   16* pipe fffff802455debe0 <-> fffff802455ded48      0 rw
772      postgrest  18225   17* pipe fffff802455ded48 <-> fffff802455debe0      0 rw
772      postgrest  18225   18
772      postgrest  18225   19* pipe fffff802285fe5f0 <-> fffff802285fe758      0 rw
772      postgrest  18225   20* pipe fffff802285fe758 <-> fffff802285fe5f0      0 rw
772      postgrest  18225   21* pipe fffff8029031e000 <-> fffff8029031e168      0 rw
772      postgrest  18225   22* pipe fffff8029031e168 <-> fffff8029031e000      0 rw
772      postgrest  18225   23
772      postgrest  18225   24* pipe fffff802455dd2f8 <-> fffff802455dd460      0 rw
772      postgrest  18225   25* pipe fffff802455dd460 <-> fffff802455dd2f8      0 rw
772      postgrest  18225   26* pipe fffff800198552f8 <-> fffff80019855460      0 rw
772      postgrest  18225   27* pipe fffff80019855460 <-> fffff800198552f8      0 rw
772      postgrest  18225   28
772      postgrest  18225   29* pipe fffff801e94f6be0 <-> fffff801e94f6d48      0 rw
772      postgrest  18225   30* pipe fffff801e94f6d48 <-> fffff801e94f6be0      0 rw
772      postgrest  18225   31* pipe fffff802455b5be0 <-> fffff802455b5d48      0 rw
772      postgrest  18225   32* pipe fffff802455b5d48 <-> fffff802455b5be0      0 rw
772      postgrest  18225   33
772      postgrest  18225   34* pipe fffff801e9562be0 <-> fffff801e9562d48      0 rw
772      postgrest  18225   35* pipe fffff801e9562d48 <-> fffff801e9562be0      0 rw
772      postgrest  18225   36* pipe fffff8013a9c05f0 <-> fffff8013a9c0758      0 rw
772      postgrest  18225   37* pipe fffff8013a9c0758 <-> fffff8013a9c05f0      0 rw
772      postgrest  18225   38
772      postgrest  18225   39* pipe fffff800196775f0 <-> fffff80019677758      0 rw
772      postgrest  18225   40* pipe fffff80019677758 <-> fffff800196775f0      0 rw
772      postgrest  18225   41* pipe fffff80007ef75f0 <-> fffff80007ef7758      0 rw
772      postgrest  18225   42* pipe fffff80007ef7758 <-> fffff80007ef75f0      0 rw
772      postgrest  18225   43* pipe fffff802286a82f8 <-> fffff802286a8460      0 rw
772      postgrest  18225   44* pipe fffff802286a8460 <-> fffff802286a82f8      0 rw
772      postgrest  18225   45* pipe fffff8000f02f5f0 <-> fffff8000f02f758      0 rw
772      postgrest  18225   46* pipe fffff8000f02f758 <-> fffff8000f02f5f0      0 rw
772      postgrest  18225   48* local stream fffff8026d2e5200
772      postgrest  18225   52* local stream fffff800464f9100

@steve-chavez
Member

I've investigated the problem further.
The issue doesn't depend on the socket type (AF_INET or AF_UNIX).

Thanks a lot for helping here @drTr0jan

But after the issue (the hang in question), PostgREST is left with another FD:

How did you manage to reproduce the hang?

@steve-chavez
Member

The hint above suggests that this issue was related to file descriptors.

On #2042 we noted that PostgREST will stop processing new requests when EMFILE (Too many open files) is reached; a dependency was upgraded to fix this on #2158.
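
As a rough check for descriptor exhaustion on FreeBSD (a sketch reusing the PID from the sockstat output above):

# Count open descriptors for the PostgREST process and compare against
# the per-process file descriptor limit.
fstat -p 18225 | wc -l
sysctl kern.maxfilesperproc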

Assuming this issue will be fixed by the above change as well, I'll close it - but please reopen if the problem reappears.

@kevinmlong

@steve-chavez - I'm running a self-hosted Supabase instance (with an nginx proxy in front) and running into a similar problem where we're seeing intermittent 503 errors when making requests from our clients.

I've posted a GitHub issue with Supabase here and on their Discord here.

Sounds like this may be related. I'd appreciate your insight on this issue we're experiencing.

@steve-chavez
Member

@kevinmlong That looks unrelated to this issue since you're getting a response. I've made a suggestion on supabase/supabase#12918 (comment).
