Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v10.1.1: Not having a pool timeout causes PostgreSQL connections to leak memory #2638

Closed
steve-chavez opened this issue Jan 27, 2023 · 9 comments

Comments

@steve-chavez
Copy link
Member

steve-chavez commented Jan 27, 2023

Problem

Idle connections can take up memory and not release it, here's an example htop output showing this:

idle_conns

(authenticator is the connection role used by us, this follows the docs)

This gets serious whenever the db-pool size is big, the connections can consume all memory.

More details on why a connection takes up memory in this SO question. Basically it has a cache of the catalog and this gets bigger as more database objects are added.

Now, this wasn't an issue until v10.0.0, because we had a db-pool-timeout which timed out idle connections. But on v10.1.0 this was removed(#2444) due to the upstream library not supporting it.

Solution

We should bring back the timeout or alternatively seems we can use the DISCARD ALL statement whenever a connection is returned back to the pool(not sure of the drawbacks) according to this AWS blog post.

cc @robx

@steve-chavez
Copy link
Member Author

v10.0.0 might also be affected by this because the timeout was increased to 1 hour.

@steve-chavez steve-chavez changed the title Not having a pool timeout causes a memory leak v10.1: Not having a pool timeout causes a memory leak Jan 29, 2023
@robx
Copy link
Contributor

robx commented Jan 31, 2023

There's been quite a bit of discussion of this following this comment: #2444 (comment). It would be useful to summarize this on this ticket. E.g., I'm guessing that setting the session idle timeout wouldn't be sufficient because the connections might all stay busy?

(I find the title of the issue a bit misleading because at first read, it seems like the postgrest process is leaking memory, while instead the postgresql server processes are growing caches indefinitely.)

  • As noted in the linked discussion, periodically releasing the pool could be an option, at least as a work-around (postgrest could do this in-process on a timer)
  • It may be instructive to investigate how mature postgres client libraries in other languages handle such issues. E.g. Go's pgx has configuration parameters MaxConnLifetime, MaxConnLifetimeJitter, MaxConnIdleTime.
  • It might be a good idea to raise the issue with hasql-pool -- I'm not sure that a clean solution can be built on top of that library (e.g. I'm not sure it's possible to track connection age and kill the connection from our side)

@steve-chavez steve-chavez changed the title v10.1: Not having a pool timeout causes a memory leak v10.1: Not having a pool timeout causes PostgreSQL connections to leak memory Jan 31, 2023
@steve-chavez
Copy link
Member Author

(I find the title of the issue a bit misleading because at first read, it seems like the postgrest process is leaking memory, while instead the postgresql server processes are growing caches indefinitely.)

True, I've just corrected the title.

It might be a good idea to raise the issue with hasql-pool -- I'm not sure that a clean solution can be built on top of that library (e.g. I'm not sure it's possible to track connection age and kill the connection from our side)

Hmm, wouldn't it be possible to restore the old timeout on hasql-pool?

@steve-chavez
Copy link
Member Author

steve-chavez commented Jan 31, 2023

Note: This issue was made worse by #2620, because users schemas kept changing and increasing the cache size while connections were never restarted. So having #2639 will at least placate this issue.

At this point v10.1.1 is not stable at all, will release a v10.1.2 and when this is done I'll release a v10.1.3 which should finally be stable.

@steve-chavez steve-chavez changed the title v10.1: Not having a pool timeout causes PostgreSQL connections to leak memory v10.1.1: Not having a pool timeout causes PostgreSQL connections to leak memory Jan 31, 2023
@robx
Copy link
Contributor

robx commented Feb 3, 2023

It might be a good idea to raise the issue with hasql-pool -- I'm not sure that a clean solution can be built on top of that library (e.g. I'm not sure it's possible to track connection age and kill the connection from our side)

Hmm, wouldn't it be possible to restore the old timeout on hasql-pool?

Sorry, I meant that it seems hard to introduce a timeout without modifying hasql-pool.

("on top of" in the sense of working with the existing hasql-pool. E.g. you could imagine that when our client code receives a connection from the pool, it checks its age and throws an exception if it's too old, which would cause hasql-pool to discard the connection.)

So yeah I agree that the best thing to do would be to fix this in hasql-pool, by reintroducing the timeout.

@robx
Copy link
Contributor

robx commented Feb 3, 2023

I created nikita-volkov/hasql-pool#28 as a proof of concept fix for this. Depending on how urgent this is for postgrest, I could prepare a postgrest PR that pulls in that development version.

@steve-chavez
Copy link
Member Author

@robx Nice work!

Depending on how urgent this is for postgrest, I could prepare a postgrest PR that pulls in that development version.

I think we can wait until the maxIdleTime(nikita-volkov/hasql-pool#29) is added.

For the issue at hand it seems a maxLifetime is more appropriate than a maxIdleTime, since if we keep the connections busy an idle timeout would never trigger.

Hm, this is true but in production I didn't saw a similar issue before when we had the db-pool-timeout, so maybe maxIdleTime is enough.

Could maxLifetime reduce throughput when the max is reached? I mean since new requests would have to wait for connections to be made.

@steve-chavez
Copy link
Member Author

Note: this blog post explains the memory drawback of prepared statements(search for "Reduce the memory usage of prepared queries"), which are cached per connection in PostgreSQL.

@steve-chavez
Copy link
Member Author

Just to make this more visible. HikariCP has a maxLifetime config with a default of 30 mins, the pg hackers are very positive about this default as shown on this discussion.

robx added a commit to robx/postgrest that referenced this issue Apr 6, 2023
)

- db-pool-acquisition-timeout is no longer optional, defaults to 10s
- new option db-pool-max-lifetime limits the maximal lifetime of a
  postgresql connection, defaults to 30m
robx added a commit to robx/postgrest that referenced this issue Apr 6, 2023
)

- db-pool-acquisition-timeout is no longer optional, defaults to 10s
- new option db-pool-max-lifetime limits the maximal lifetime of a
  postgresql connection, defaults to 30m
robx added a commit to robx/postgrest that referenced this issue Apr 6, 2023
)

- db-pool-acquisition-timeout is no longer optional, defaults to 10s
- new option db-pool-max-lifetime limits the maximal lifetime of a
  postgresql connection, defaults to 30m
robx added a commit to robx/postgrest that referenced this issue Apr 6, 2023
)

- db-pool-acquisition-timeout is no longer optional, defaults to 10s
- new option db-pool-max-lifetime limits the maximal lifetime of a
  postgresql connection, defaults to 30m
@robx robx closed this as completed in 394bd22 Apr 6, 2023
laurenceisla pushed a commit to laurenceisla/postgrest that referenced this issue Apr 11, 2023
)

- db-pool-acquisition-timeout is no longer optional, defaults to 10s
- new option db-pool-max-lifetime limits the maximal lifetime of a
  postgresql connection, defaults to 30m
laurenceisla pushed a commit that referenced this issue Apr 12, 2023
- db-pool-acquisition-timeout is no longer optional, defaults to 10s
- new option db-pool-max-lifetime limits the maximal lifetime of a
  postgresql connection, defaults to 30m
steve-chavez pushed a commit that referenced this issue Apr 12, 2023
- db-pool-acquisition-timeout is no longer optional, defaults to 10s
- new option db-pool-max-lifetime limits the maximal lifetime of a
  postgresql connection, defaults to 30m
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

2 participants