Unusually high latency on Kubernetes cluster #2315
-
Environment
Description of issue
Hello 👋 I've been debugging this issue and I think it is related only to PostgREST, not to the rest of the configuration; I've deployed other services on the currently running cluster just fine. I'm trying to integrate the project with my current custom CRUD/REST backend system, and the service is showing unusually (very) high latency. Sample:
root@test-client:/# time curl -I postgrest.postgrest # Host for the postgrest service
HTTP/1.1 200 OK
Date: Mon, 13 Jun 2022 07:12:34 GMT
Server: postgrest/9.0.0
Content-Type: application/openapi+json; charset=utf-8
real 0m33.161s
user 0m0.006s
sys 0m0.009s
And when I requested it a couple more times, the latency would then become more reasonable:
root@test-client:/# time curl -I postgrest.postgrest
HTTP/1.1 200 OK
Date: Mon, 13 Jun 2022 07:12:34 GMT
Server: postgrest/9.0.0
Content-Type: application/openapi+json; charset=utf-8
real 0m33.161s
user 0m0.006s
sys 0m0.009s
root@test-client:/# time curl -I postgrest.postgrest
HTTP/1.1 200 OK
Date: Mon, 13 Jun 2022 07:14:51 GMT
Server: postgrest/9.0.0
Content-Type: application/openapi+json; charset=utf-8
real 0m32.881s
user 0m0.007s
sys 0m0.007s
root@test-client:/# time curl -I postgrest.postgrest
HTTP/1.1 200 OK
Date: Mon, 13 Jun 2022 07:14:55 GMT
Server: postgrest/9.0.0
Content-Type: application/openapi+json; charset=utf-8
real 0m0.049s
user 0m0.007s
sys 0m0.007s
root@test-client:/# time curl -I postgrest.postgrest
HTTP/1.1 200 OK
Date: Mon, 13 Jun 2022 07:14:57 GMT
Server: postgrest/9.0.0
Content-Type: application/openapi+json; charset=utf-8
real 0m0.029s
user 0m0.007s
sys 0m0.005s
Logs
// ...
10.28.0.105 - - [13/Jun/2022:07:12:34 +0000] "HEAD / HTTP/1.1" 200 - "" "curl/7.83.1"
10.28.0.105 - - [13/Jun/2022:07:14:51 +0000] "HEAD / HTTP/1.1" 200 - "" "curl/7.83.1"
10.28.0.105 - - [13/Jun/2022:07:14:55 +0000] "HEAD / HTTP/1.1" 200 - "" "curl/7.83.1"
10.28.0.105 - - [13/Jun/2022:07:14:57 +0000] "HEAD / HTTP/1.1" 200 - "" "curl/7.83.1"
Is there a way to get PostgREST to be more "verbose"? I'm trying to find the root cause of this latency. Maybe it is related to resource usage; is there a minimum recommended spec for how much CPU/memory PostgREST should use? Currently I limit the usage to
The backing DB already serves other traffic, but I don't think that would cause the latency (above I've only tested using
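One way to narrow down where the 30 s goes is to ask curl itself for a timing breakdown: if most of it sits in name lookup, the delay would be in cluster DNS rather than in PostgREST. A minimal sketch reusing the service host from above (the -w variables are standard curl; PGRST_LOG_LEVEL is only mentioned as PostgREST's log-level knob for raising its verbosity):
# Split the total time into DNS lookup, TCP connect, and time-to-first-byte.
# A large time_namelookup points at cluster DNS; a large gap between
# time_connect and time_starttransfer points at PostgREST/Postgres.
curl -sS -o /dev/null \
  -w 'namelookup: %{time_namelookup}s\nconnect: %{time_connect}s\nstarttransfer: %{time_starttransfer}s\ntotal: %{time_total}s\n' \
  -I postgrest.postgrest

# PostgREST's own logging can be made more verbose via its log-level config,
# e.g. setting the PGRST_LOG_LEVEL container env var to "info" in the Deployment.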
-
I don't know about Kubernetes unfortunately. But on a lower-end AWS t3a.nano instance (2 CPUs, 0.5 GB memory), I got a 32.52 ms p95 latency while load testing from another instance in the same region. These were simple read requests(
Maybe try with simple read requests; OpenAPI could be slow if the db is complex.
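For a quick comparison along those lines, timing the OpenAPI root against a single filtered read should show whether generating the schema description is the slow part; a sketch, with the table path as a placeholder only:
# The root path returns the generated OpenAPI description of the whole schema;
# a filtered read against one table runs only a single SELECT.
time curl -sS -o /dev/null -I postgrest.postgrest
time curl -sS -o /dev/null "postgrest.postgrest/some_table?select=id&limit=1"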
-
No luck, the pattern is similar to the current behaviour:
root@test-ruff-client:/# time curl -XGET postgrest.postgrest/users-public?id=eq.{user-id}
[{"id":"{user-id}","data": "..."]
real 0m34.076s
user 0m0.005s
sys 0m0.025s
root@test-ruff-client:/# time curl -XGET postgrest.postgrest/users-public?id=eq.{user-id}
[{"id":"{user-id}","data": "..."]
real 0m32.470s
user 0m0.005s
sys 0m0.011s
root@test-ruff-client:/# time curl -XGET postgrest.postgrest/users-public?id=eq.{user-id}
[{"id":"{user-id}","data": "..."]
real 0m0.019s
user 0m0.004s
sys 0m0.007s
root@test-ruff-client:/# time curl -XGET postgrest.postgrest/users-public?id=eq.{user-id}
[{"id":"{user-id}","data": "..."]
real 0m0.046s
user 0m0.006s
sys 0m0.006s
The pattern is currently: ~30 s latency for the first few requests, then it is fine, but if I wait for a minute or two it regresses again. My suspicion is that this is simply because of a cache. I guess what I'm trying to trace is how the latency gets so high in the first place.
users_db=# EXPLAIN SELECT * FROM users WHERE id = '{user-id}'; -- The query should be fine, it basically takes no time
QUERY PLAN
---------------------------------------------------------------------------
Index Scan using users_pkey on users (cost=0.29..8.31 rows=1 width=1237)
Index Cond: (id = '{user-id}'::uuid)
(2 rows)
I don't know if this might help, but what process/steps does PostgREST go through whenever a request comes in? Maybe I can extrapolate from that to see if there's a specific part of the system resources that needs adjusting. I still suspect that it might not play nice with how the CPU is scheduled on Kubernetes.
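Roughly, PostgREST keeps a cached description of the database schema, translates each HTTP request into a single SQL statement, and executes it over a pooled connection to Postgres, so the total time splits between the network hop to the service, the PostgREST process, and the database. One way to isolate the database part (a generic Postgres setting, not PostgREST-specific, and it needs superuser rights) is to log every statement with its duration and match the entries against the access-log timestamps above:
users_db=# ALTER SYSTEM SET log_min_duration_statement = 0;  -- log every statement and its duration
users_db=# SELECT pg_reload_conf();                          -- apply without restarting Postgres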
-
I am also using PostgREST on K8s and I don't have any problems. However, one big difference is that I am using a Postgres Operator -> https://www.kubegres.io/ and therefore a Postgres cluster is used. Features like backup/restore, rolling updates, and node-migration resiliency from Kubegres are something you will most likely have to solve in the future anyway. Hope this helps :)