
Upgrade hasql-pool to forked 0.7.2 #2444

Merged: 6 commits into PostgREST:main, Aug 29, 2022

Conversation

@robx (Contributor) commented Aug 24, 2022

This is a take on #2391 that uses our fork of hasql-pool with flushing support.

@robx robx force-pushed the upgrade-pool-fork branch 3 times, most recently from da55577 to cebef2f on August 24, 2022 14:48
@robx (Contributor, Author) commented Aug 24, 2022

I would like #2445 to be merged first, because I expect that there's something to be worked out between the plain cabal build and the nix build. (I.e., I'll want to add a cabal.project file pointing at the fork, without that interfering with the nix build which also uses cabal.)

And then this PR should wait for PostgREST/hasql-pool#2 to be merged, to ensure (a) that the change is reviewed and (b) that things stay somewhat clean, by only ever depending on main-branch commits of the forked repo.

@robx robx force-pushed the upgrade-pool-fork branch 2 times, most recently from 664f9f7 to 928fb1f on August 26, 2022 10:49
@robx robx marked this pull request as ready for review August 26, 2022 11:04
@robx (Contributor, Author) commented Aug 26, 2022

This should be good to go now.

I was wondering whether I should write up some guidelines on dealing with forked dependencies such as this one (where? a new DEVELOPMENT.md?), but maybe let's see how this plays out first and follow up then?

@steve-chavez (Member)

I was wondering whether I should write up some guidelines on dealing with forked dependencies such as this one (where? a new DEVELOPMENT.md?), but maybe let's see how this plays out first and follow up then?

Yeah, I see new activity on hasql-pool, so let's see whether we'll still need to maintain our forked dependencies much longer.

The original test no longer makes sense once we drop pool timeouts
with the hasql-pool upgrade.

To somehow test that new connections have the settings, convert it
to flush the pool instead.
…ehaviour

Also fix documentation of AppState pool field.
@robx (Contributor, Author) commented Aug 29, 2022

Chances are we'll merge nikita-volkov/hasql-pool#16 soon enough with a slightly simpler interface (PoolIsReleasedUsageError goes away, in particular.) But probably it's best to just go ahead with this and update/simplify later, what do you think?

This version of hasql-pool is a simplified rewrite that doesn't use
the resource-pool package. The major API changes are that idle
connections are no longer timed out (and the corresponding setting
is gone), and that `release` makes the pool unusable, where it used
to remain usable and only flushed idle connections.

We depend on a PostgREST fork of 0.7.2 that gives us reliable
flushing, compare PostgREST/hasql-pool#1

- hasql-pool 0.7 removes timing out of idle connections, so
  this change removes the db-pool-timeout option.
  Given that we were typically running with very high
  timeout settings, I don't anticipate the lack of timeout
  to introduce new issues, though we might want to consider
  introducing some retry-logic down the line when we
  encounter connection failures.
- See PostgREST#2422 for a
  discussion on depending on a forked dependency. Besides adding
  the dependency to the nix overlay, we're also adding it to
  stack.yaml and a new cabal.project to allow stack/cabal users
  to build the project.
@steve-chavez (Member)

Chances are we'll merge nikita-volkov/hasql-pool#16 soon enough with a slightly simpler interface (PoolIsReleasedUsageError goes away, in particular.)

Nice 🎉

But probably it's best to just go ahead with this and update/simplify later, what do you think?

Agree.

@robx robx merged commit 90eaaef into PostgREST:main Aug 29, 2022
@robx robx deleted the upgrade-pool-fork branch August 29, 2022 12:55
@bhupixb commented Nov 2, 2022

Is it ideal to remove the pool timeout?
We have sudden spikes in traffic at different times of the day in most of our workloads, so we have configured the DB pool size to a higher value (100), so that by the time HPA scales up the pods, the existing pods have enough connections available to serve requests.
After the traffic slows down, we want the idle connections to be closed after some time.

But if we remove this flag, then we will be wasting a lot of resources on the Postgres server per connection unnecessarily when there's minimal/no traffic. Does the upgraded hasql-pool automatically close idle connections?

@wolfgangwalther (Member)

Is it ideal to remove the pool timeout?

Certainly not. It was removed upstream, though. So now, we would have to re-implement it ourselves. Not sure whether that's feasible. @robx @steve-chavez WDYT?

@steve-chavez (Member)

But if we remove this flag, then we will be wasting a lot of resources on the Postgres server per connection unnecessarily when there's minimal/no traffic

Hm, yeah. Under this scenario the connection timeout seems ideal.

Not sure how hard it would be to re-implement that in postgrest or hasql, though.

As a workaround, perhaps setting idle_session_timeout could work?

@wolfgangwalther (Member)

As a workaround, perhaps setting idle_session_timeout could work?

If that works, that's much better than just a workaround. The question is, however, how hasql-pool deals with that connection loss. Will it reconnect immediately? Or would it only reconnect if the connection is required?

@steve-chavez (Member)

@bhupixb Could you test the pool behavior with idle_session_timeout as mentioned above by Wolfgang?

@robx (Contributor, Author) commented Nov 3, 2022

Some quick random ideas on this:

  • you could actively flush the pool every couple of minutes using SIGUSR1, e.g. via cron
  • if we wanted to implement this in postgrest without touching hasql-pool, we might do the same, i.e. periodically release the pool

But if idle_session_timeout works, that's probably preferable, because it also covers scenarios such as the postgrest server being disconnected from the postgresql server.
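The cron idea above could be sketched like this (the schedule and the pidof lookup are assumptions; SIGUSR1 is PostgREST's schema-cache-reload signal, which after this PR also releases the pool's connections):

```shell
# Hypothetical crontab fragment: trigger a schema cache reload (and,
# with this PR, a pool flush) every 15 minutes. The schedule and the
# process lookup are illustrative only.
# */15 * * * *  kill -USR1 "$(pidof postgrest)"
```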

@bhupixb commented Nov 4, 2022

@bhupixb Could you test the pool behavior with idle_session_timeout as mentioned above by Wolfgang?

This will work IMO (haven't tested it yet), but an RDS can have multiple applications running their DBs on it, so setting idle_session_timeout at the RDS level is a bit less flexible in our case, because not all applications want to clean up idle connections as frequently as we do (and we know we can clean them up in PostgREST).

  • you could actively flush the pool every couple of minutes using SIGUSR1, e.g. via cron

@robx Can you please elaborate a little here? Say my postgrest container is running and I add a sidecar to close idle connections. What command should I send to postgrest to do so?

@wolfgangwalther (Member)

This will work IMO (haven't tested it yet),

You can't really tell without trying. We'd expect the idle connections to be closed for sure, but what we don't know is whether PostgREST will restore all those idle connections immediately, or wait to reconnect until it needs them.

but an RDS can have multiple applications running their DBs on it, so setting idle_session_timeout at the RDS level is a bit less flexible in our case.

Just set it on the authenticator role, so that it applies to PostgREST only:

ALTER ROLE authenticator SET idle_session_timeout = '...';

@robx (Contributor, Author) commented Nov 4, 2022

@robx Can you please elaborate a little here, say my postgrest container is running and I added a sidecar to close idle connections. So what command should I send to postgrest for doing so?

Here: https://postgrest.org/en/stable/schema_cache.html#schema-cache-reloading. A side-effect of schema-cache reloading is that connections are released.
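For the sidecar question above, the mechanics can be sketched with a stand-in process (the trap-based dummy below is an assumption standing in for postgrest; in production the signal would target the real postgrest PID):

```shell
# In production the sidecar would simply run:
#   kill -USR1 "$(pidof postgrest)"
# Here a dummy process stands in for postgrest, so the signal
# round-trip can be shown end to end.

log="$(mktemp)"

# Stand-in "server": on SIGUSR1 it records a reload, the way
# postgrest reloads its schema cache (and now flushes its pool).
sh -c "trap 'echo reloaded >> $log' USR1; while :; do sleep 1; done" &
pid=$!

sleep 1              # give the trap time to be installed
kill -USR1 "$pid"    # what the sidecar sends
sleep 2              # give the handler time to run
kill "$pid"          # clean up the dummy process

result="$(cat "$log")"
rm -f "$log"
echo "$result"       # prints: reloaded
```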

@steve-chavez (Member) commented Nov 4, 2022

We'd expect the idle connections to be closed for sure - but what we don't know is whether PostgREST will restore all those idle connections immediately, or whether it will wait reconnecting until it needs them.

Just tried that; postgrest recovers, but it gets noisy in the logs and the LISTEN channel gets killed every time.

PGRST_DB_ANON_ROLE='postgrest_test_anonymous' postgrest-with-postgresql-14   postgrest-run

04/Nov/2022:17:06:33 -0500: Attempting to connect to the database...
04/Nov/2022:17:06:33 -0500: Connection successful
04/Nov/2022:17:06:33 -0500: Listening on port 3000
04/Nov/2022:17:06:33 -0500: Config reloaded
04/Nov/2022:17:06:33 -0500: Listening for notifications on the pgrst channel
04/Nov/2022:17:06:33 -0500: Schema cache loaded
FATAL:  terminating connection due to idle-session timeout
04/Nov/2022:17:06:38 -0500: Retrying listening for notifications on the pgrst channel..
04/Nov/2022:17:06:38 -0500: Attempting to connect to the database...
04/Nov/2022:17:06:38 -0500: Connection successful
04/Nov/2022:17:06:38 -0500: Config reloaded
04/Nov/2022:17:06:38 -0500: Listening for notifications on the pgrst channel
04/Nov/2022:17:06:38 -0500: Schema cache loaded
FATAL:  terminating connection due to idle-session timeout
04/Nov/2022:17:06:43 -0500: Retrying listening for notifications on the pgrst channel..
04/Nov/2022:17:06:43 -0500: Attempting to connect to the database...
04/Nov/2022:17:06:43 -0500: Connection successful
04/Nov/2022:17:06:43 -0500: Listening for notifications on the pgrst channel
04/Nov/2022:17:06:43 -0500: Config reloaded
04/Nov/2022:17:06:43 -0500: Schema cache loaded

If the LISTEN channel is disabled (PGRST_DB_CHANNEL_ENABLED=false) and requests come in frequently enough, no timeout is triggered, as expected.


I guess the above is not a problem if you don't set idle_session_timeout to a small value. If it's something like 1 hour (the v10 default for db-pool-timeout), it should be fine.

@wolfgangwalther (Member) commented Nov 5, 2022

I guess the above is not a problem if you don't set idle_session_timeout to a small value. If it's something like 1 hour (the v10 default for db-pool-timeout), it should be fine.

I don't think setting this is a good idea at all right now. I ran a few tests with PGRST_DB_CHANNEL_ENABLED=false and PGRST_DB_POOL=10, and it seems like the connections are closed without error messages in the log, which is fine. But the pool doesn't recognize the connection failure until the next request comes in. That means that after the connections are closed from the server side, the pool still thinks it has multiple connections open, and the first few requests fail like this:

{"code":"57P05","details":null,"hint":null,"message":"terminating connection due to idle-session timeout"}

and later

{"code":"PGRST001","details":"no connection to the server\n","hint":null,"message":"Database client error. Retrying the connection."}

Only once the reconnection succeeds does postgrest respond regularly again.

Surely, in that case we'd want PostgREST to just delay the request until the connection is up again?

Seems to be related to #1932, too?

@robx (Contributor, Author) commented Nov 5, 2022

Surely, in that case we'd want PostgREST to just delay the request until the connection is up again?

I tend to agree. It seems reasonable (and mostly unavoidable, since we can't know whether an old connection is still good) to automatically retry for certain error classes.

It might make sense to do this in hasql-pool directly, not sure there.
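Until retry logic exists in hasql-pool or postgrest itself, the "retry certain error classes" idea can be approximated on the client side. A minimal sketch (the retry helper, attempt count, and endpoint are assumptions, not anything from this PR):

```shell
# Hypothetical client-side retry: the first request after an idle
# timeout may hit a dead pooled connection, so repeat it a few times.
retry() {
  attempts=0
  while [ "$attempts" -lt 3 ]; do
    if "$@"; then
      return 0        # command succeeded
    fi
    attempts=$((attempts + 1))
    sleep 1           # brief pause before retrying
  done
  return 1            # still failing after 3 attempts
}

# Usage against a real instance would be something like:
#   retry curl -sf http://localhost:3000/todos
retry echo "ok"       # demo with a trivially succeeding command
```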

@wolfgangwalther (Member)

Surely, in that case we'd want PostgREST to just delay the request until the connection is up again?

I tend to agree. It seems reasonable (and mostly unavoidable, since we can't know whether an old connection is still good) to automatically retry for certain error classes.

It might make sense to do this in hasql-pool directly, not sure there.

I think what should happen is:

  • Connections that are found dead should return their resources to the pool (not sure about the right terminology here). Right now it seems like hasql-pool still thinks they are open and makes them available to the pool user, until it runs a query and then realizes the connection loss.
  • Whenever a new connection is made because there is no idle connection in the pool, the request should wait for that connection to be established. I'm pretty sure this is the case already, because when I start with 1 connection and run concurrent requests, the number of connections increases without any of those error messages.

we can't know if an old connection is still good

Why is that? When the connection is terminated via idle_session_timeout I would expect the client to realize this immediately, no?

@robx (Contributor, Author) commented Nov 5, 2022

* Connections that are found dead should return their resources to the pool (not sure about the right terminology here). Right now it seems like hasql-pool still thinks those are open and makes them available to the pool user? Until it runs a query and then realizes the connection loss.

You won't be able to tell that the connection is closed without writing to or reading from the socket, which needs to happen actively. It would be possible to preemptively try to read from the connection to see if the server terminated the connection abnormally (compare https://www.postgresql.org/docs/current/protocol-flow.html#id-1.10.6.7.11), but that's quite complicated with little fundamental benefit: Even if we have some reaper thread waiting for connections to terminate in order to remove them from the pool, there's a window where the server terminates the connection just after we took it from the pool, so for correct behaviour, we'd have to implement the retry anyway.

The straightforward way (what feels like the "right" way to me), is to expect certain errors, such as a dead connection, and retry silently.

* Whenever a new connection is made, because there is no idle connection in the pool, the request should wait for that connection to happen. I'm pretty sure this is the case already, because when I start with 1 connection and run concurrent requests.. the number of connections increases without any of those error messages.

Yes, I think that's correct.

we can't know if an old connection is still good

Why is that? When the connection is terminated via idle_session_timeout I would expect the client to realize this immediately, no?

As noted above, only when reading from the socket. And then this still only covers certain active shutdown scenarios. It can also happen that network disruption causes a TCP socket to be in a state where we only see that the connection is broken once we write to it.

So in general, no, the client doesn't realize this immediately. If it does something like select() on the socket, it could realize immediately when a shutdown message is received, but that would still require some thread actively waiting for such notifications.

(There's a number of connection settings that can play into this, e.g. around TCP keepalives, see https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS.)
