
Upgrade hasql-pool to forked 0.7.2 #2444

Merged: 6 commits into PostgREST:main, Aug 29, 2022

Conversation

@robx (Contributor) commented Aug 24, 2022

This is a take on #2391 that uses our fork of hasql-pool with flushing support.

@robx robx force-pushed the upgrade-pool-fork branch 3 times, most recently from da55577 to cebef2f on August 24, 2022 14:48
@robx (Contributor, Author) commented Aug 24, 2022

I would like #2445 to be merged first, because I expect that there's something to be worked out between the plain cabal build and the nix build. (I.e., I'll want to add a cabal.project file pointing at the fork, without that interfering with the nix build which also uses cabal.)

And then this PR should wait for PostgREST/hasql-pool#2 to be merged, to ensure (a) that the change is reviewed and (b) that things stay somewhat clean, by only ever depending on main-branch commits of the forked repo.

@robx robx force-pushed the upgrade-pool-fork branch 2 times, most recently from 664f9f7 to 928fb1f on August 26, 2022 10:49
@robx robx marked this pull request as ready for review August 26, 2022 11:04
@robx (Contributor, Author) commented Aug 26, 2022

This should be good to go now.

I was wondering whether I should write up some guidelines on dealing with forked dependencies such as this one (where? a new DEVELOPMENT.md?), but maybe let's see how this plays out first and follow up then?

@steve-chavez (Member)

I was wondering whether I should write up some guidelines on dealing with forked dependencies such as this one (where? a new DEVELOPMENT.md?), but maybe let's see how this plays out first and follow up then?

Yeah, I see new activity on hasql-pool, so let's see whether we'll still need to maintain our forked dependencies much longer.

The original test no longer makes sense once we drop pool timeouts
with the hasql-pool upgrade.

To somehow test that new connections have the settings, convert it
to flush the pool instead.
…ehaviour

Also fix documentation of AppState pool field.
@robx (Contributor, Author) commented Aug 29, 2022

Chances are we'll merge nikita-volkov/hasql-pool#16 soon enough with a slightly simpler interface (PoolIsReleasedUsageError goes away, in particular.) But probably it's best to just go ahead with this and update/simplify later, what do you think?

This version of hasql-pool is a simplified rewrite that doesn't use
the resource-pool package. The major API changes are that idle
connections are no longer timed out (and the corresponding setting
is gone), and that `release` makes the pool unusable, where it used
to remain usable and only flushed idle connections.

We depend on a PostgREST fork of 0.7.2 that gives us reliable
flushing, compare PostgREST/hasql-pool#1

- hasql-pool 0.7 removes timing out of idle connections, so
  this change removes the db-pool-timeout option.
  Given that we were typically running with very high
  timeout settings, I don't anticipate the lack of timeout
  to introduce new issues, though we might want to consider
  introducing some retry-logic down the line when we
  encounter connection failures.
- See PostgREST#2422 for a
  discussion on depending on a forked dependency. Besides adding
  the dependency to the nix overlay, we're also adding it to
  stack.yaml and a new cabal.project to allow stack/cabal users
  to build the project.
@steve-chavez (Member)

Chances are we'll merge nikita-volkov/hasql-pool#16 soon enough with a slightly simpler interface (PoolIsReleasedUsageError goes away, in particular.)

Nice 🎉

But probably it's best to just go ahead with this and update/simplify later, what do you think?

Agree.

@robx robx merged commit 90eaaef into PostgREST:main Aug 29, 2022
@robx robx deleted the upgrade-pool-fork branch August 29, 2022 12:55
@bhupixb commented Nov 2, 2022

Is it ideal to remove the pool timeout?
We have sudden spikes in traffic at different times of the day in most of our workloads, so we have configured the DB pool size to a higher value (100), so that by the time HPA scales up the pods, the existing pods have enough connections available to serve requests.
After the traffic slows down, we want the idle connections to be closed after some time.

But if we remove this flag, then we will be wasting a lot of resources on the Postgres server per connection unnecessarily when there's minimal/no traffic. Does the upgraded hasql-pool automatically close idle connections?

@wolfgangwalther (Member)

Is it ideal to remove the pool timeout?

Certainly not. It was removed upstream, though. So now, we would have to re-implement it ourselves. Not sure whether that's feasible. @robx @steve-chavez WDYT?

@steve-chavez (Member)

But if we remove this flag, then we will be wasting a lot of resources on the Postgres server per connection unnecessarily when there's minimal/no traffic

Hm, yeah. Under this scenario the connection timeout seems ideal.

Not sure how hard it would be to re-implement that in postgrest or hasql, though.

As a workaround, perhaps setting idle_session_timeout could work?

@wolfgangwalther (Member)

As a workaround, perhaps setting idle_session_timeout could work?

If that works, that's much better than just a workaround. The question is, however, how hasql-pool deals with that connection loss. Will it reconnect immediately? Or would it only reconnect if the connection is required?

@steve-chavez (Member)

@bhupixb Could you test the pool behavior with idle_session_timeout as mentioned above by Wolfgang?

@robx (Contributor, Author) commented Nov 3, 2022

Some quick random ideas on this:

  • you could actively flush the pool every couple of minutes using SIGUSR1, e.g. via cron
  • if we wanted to implement this in postgrest without touching hasql-pool, we might do the same, i.e. periodically release the pool

But if idle_session_timeout works, that's probably preferable, because it also covers scenarios such as the postgrest server being disconnected from the postgresql server.
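The cron idea above could be sketched like this (the schedule and the pidof lookup are assumptions; SIGUSR1 is PostgREST's schema-cache-reload signal, which after this PR also releases the pool's connections):

```shell
# Hypothetical crontab fragment: trigger a schema cache reload (and,
# with this PR, a pool flush) every 15 minutes. The schedule and the
# process lookup are illustrative only.
# */15 * * * *  kill -USR1 "$(pidof postgrest)"
```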

@bhupixb commented Nov 4, 2022

@bhupixb Could you test the pool behavior with idle_session_timeout as mentioned above by Wolfgang?

This will work IMO (haven't tested it yet), but an RDS can have multiple applications running their DBs on it, so setting idle_session_timeout at the RDS level is a bit less flexible in our case, because not all applications want to clean up idle connections as frequently as we do (and we know we can clean them up in PostgREST).

  • you could actively flush the pool every couple of minutes using SIGUSR1, e.g. via cron

@robx Can you please elaborate a little here? Say my postgrest container is running and I add a sidecar to close idle connections. What command should I send to postgrest to do so?

@wolfgangwalther (Member)

This will work IMO (haven't tested it yet),

You can't really tell without trying. We'd expect the idle connections to be closed for sure, but what we don't know is whether PostgREST will restore all those idle connections immediately, or wait to reconnect until it needs them.

but an RDS can have multiple applications running their DBs on it, so setting idle_session_timeout at the RDS level is a bit less flexible in our case.

Just set it on the authenticator role, so that it applies to PostgREST only:

ALTER ROLE authenticator SET idle_session_timeout = '...';

@robx (Contributor, Author) commented Nov 4, 2022

@robx Can you please elaborate a little here, say my postgrest container is running and I added a sidecar to close idle connections. So what command should I send to postgrest for doing so?

Here: https://postgrest.org/en/stable/schema_cache.html#schema-cache-reloading. A side-effect of schema-cache reloading is that connections are released.
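For the sidecar question above, the mechanics can be sketched with a stand-in process (the trap-based dummy below is an assumption standing in for postgrest; in production the signal would target the real postgrest PID):

```shell
# In production the sidecar would simply run:
#   kill -USR1 "$(pidof postgrest)"
# Here a dummy process stands in for postgrest, so the signal
# round-trip can be shown end to end.

log="$(mktemp)"

# Stand-in "server": on SIGUSR1 it records a reload, the way
# postgrest reloads its schema cache (and now flushes its pool).
sh -c "trap 'echo reloaded >> $log' USR1; while :; do sleep 1; done" &
pid=$!

sleep 1              # give the trap time to be installed
kill -USR1 "$pid"    # what the sidecar sends
sleep 2              # give the handler time to run
kill "$pid"          # clean up the dummy process

result="$(cat "$log")"
rm -f "$log"
echo "$result"       # prints: reloaded
```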

@steve-chavez (Member) commented Nov 4, 2022

We'd expect the idle connections to be closed for sure - but what we don't know is whether PostgREST will restore all those idle connections immediately, or whether it will wait reconnecting until it needs them.

Just tried that; postgrest recovers, but it gets noisy in the logs and the LISTEN channel gets killed every time.

PGRST_DB_ANON_ROLE='postgrest_test_anonymous' postgrest-with-postgresql-14   postgrest-run

04/Nov/2022:17:06:33 -0500: Attempting to connect to the database...
04/Nov/2022:17:06:33 -0500: Connection successful
04/Nov/2022:17:06:33 -0500: Listening on port 3000
04/Nov/2022:17:06:33 -0500: Config reloaded
04/Nov/2022:17:06:33 -0500: Listening for notifications on the pgrst channel
04/Nov/2022:17:06:33 -0500: Schema cache loaded
FATAL:  terminating connection due to idle-session timeout
04/Nov/2022:17:06:38 -0500: Retrying listening for notifications on the pgrst channel..
04/Nov/2022:17:06:38 -0500: Attempting to connect to the database...
04/Nov/2022:17:06:38 -0500: Connection successful
04/Nov/2022:17:06:38 -0500: Config reloaded
04/Nov/2022:17:06:38 -0500: Listening for notifications on the pgrst channel
04/Nov/2022:17:06:38 -0500: Schema cache loaded
FATAL:  terminating connection due to idle-session timeout
04/Nov/2022:17:06:43 -0500: Retrying listening for notifications on the pgrst channel..
04/Nov/2022:17:06:43 -0500: Attempting to connect to the database...
04/Nov/2022:17:06:43 -0500: Connection successful
04/Nov/2022:17:06:43 -0500: Listening for notifications on the pgrst channel
04/Nov/2022:17:06:43 -0500: Config reloaded
04/Nov/2022:17:06:43 -0500: Schema cache loaded

If the LISTEN channel is disabled (PGRST_DB_CHANNEL_ENABLED=false) and requests come in frequently enough, no timeout is triggered, as expected.


I guess the above is not a problem if you don't set idle_session_timeout to a small value. If it's something like 1 hour (the v10 default for db-pool-timeout), it should be fine.

@wolfgangwalther (Member) commented Nov 5, 2022

I guess the above is not a problem if you don't set idle_session_timeout to a small value. If it's something like 1 hour (the v10 default for db-pool-timeout), it should be fine.

I don't think setting this is a good idea at all right now. I ran a few tests with PGRST_DB_CHANNEL_ENABLED=false and PGRST_DB_POOL=10, and it seems like the connections are closed without error messages in the log, which is fine. But the pool doesn't recognize the connection failure until the next request comes in. That means that after the connections are closed from the server side, the pool still thinks it has multiple connections open, and the first few requests fail like this:

{"code":"57P05","details":null,"hint":null,"message":"terminating connection due to idle-session timeout"}

and later

{"code":"PGRST001","details":"no connection to the server\n","hint":null,"message":"Database client error. Retrying the connection."}

Only once the reconnection succeeds does postgrest respond regularly again.

Surely, in that case we'd want PostgREST to just delay the request until the connection is up again?

Seems to be related to #1932, too?

@robx (Contributor, Author) commented Nov 5, 2022

Surely, in that case we'd want PostgREST to just delay the request until the connection is up again?

I tend to agree. It seems reasonable (and mostly unavoidable, since we can't know whether an old connection is still good) to automatically retry for certain error classes.

It might make sense to do this in hasql-pool directly, not sure there.
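Until retry logic exists in hasql-pool or postgrest itself, the "retry certain error classes" idea can be approximated on the client side. A minimal sketch (the retry helper, attempt count, and endpoint are assumptions, not anything from this PR):

```shell
# Hypothetical client-side retry: the first request after an idle
# timeout may hit a dead pooled connection, so repeat it a few times.
retry() {
  attempts=0
  while [ "$attempts" -lt 3 ]; do
    if "$@"; then
      return 0        # command succeeded
    fi
    attempts=$((attempts + 1))
    sleep 1           # brief pause before retrying
  done
  return 1            # still failing after 3 attempts
}

# Usage against a real instance would be something like:
#   retry curl -sf http://localhost:3000/todos
retry echo "ok"       # demo with a trivially succeeding command
```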

@wolfgangwalther (Member)

Surely, in that case we'd want PostgREST to just delay the request until the connection is up again?

I tend to agree. It seems reasonable (and mostly unavoidable, since we can't know whether an old connection is still good) to automatically retry for certain error classes.

It might make sense to do this in hasql-pool directly, not sure there.

I think what should happen is:

  • Connections that are found dead should return their resources to the pool (not sure about the right terminology here). Right now it seems like hasql-pool still thinks they are open and makes them available to the pool user, until it runs a query and then realizes the connection loss.
  • Whenever a new connection is made because there is no idle connection in the pool, the request should wait for that connection to be established. I'm pretty sure this is the case already, because when I start with 1 connection and run concurrent requests, the number of connections increases without any of those error messages.

we can't know if an old connection is still good

Why is that? When the connection is terminated via idle_session_timeout I would expect the client to realize this immediately, no?

@robx (Contributor, Author) commented Nov 5, 2022

* Connections that are found dead should return their resources to the pool (not sure about the right terminology here). Right now it seems like hasql-pool still thinks those are open and makes them available to the pool user? Until it runs a query and then realizes the connection loss.

You won't be able to tell that the connection is closed without writing to or reading from the socket, which needs to happen actively. It would be possible to preemptively try to read from the connection to see if the server terminated the connection abnormally (compare https://www.postgresql.org/docs/current/protocol-flow.html#id-1.10.6.7.11), but that's quite complicated with little fundamental benefit: Even if we have some reaper thread waiting for connections to terminate in order to remove them from the pool, there's a window where the server terminates the connection just after we took it from the pool, so for correct behaviour, we'd have to implement the retry anyway.

The straightforward way (what feels like the "right" way to me), is to expect certain errors, such as a dead connection, and retry silently.

* Whenever a new connection is made, because there is no idle connection in the pool, the request should wait for that connection to happen. I'm pretty sure this is the case already, because when I start with 1 connection and run concurrent requests.. the number of connections increases without any of those error messages.

Yes, I think that's correct.

we can't know if an old connection is still good

Why is that? When the connection is terminated via idle_session_timeout I would expect the client to realize this immediately, no?

As noted above, only when reading from the socket. And then this still only covers certain active shutdown scenarios. It can also happen that network disruption causes a TCP socket to be in a state where we only see that the connection is broken once we write to it.

So in general, no, the client doesn't realize this immediately. If it does something like select() on the socket, it could realize immediately when a shutdown message is received, but that would still require some thread actively waiting for such notifications.

(There's a number of connection settings that can play into this, e.g. around TCP keepalives, see https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS.)
