Skip to content

Conversation

aerosol
Copy link
Member

@aerosol aerosol commented Aug 18, 2025

This hopefully prevents the situation in which an unusable (e.g. closed) connection is returned to the pool, causing the next client in queue fail to execute its query.

The expectation is, by propagating the DBConnection.ConnectionError we'll make it caught by DBConnection and the broken connection will be removed from the pool.

@ruslandoga curious on your thoughts on this

This hopefully prevents the situation in which
an unusable (e.g. closed) connection is returned to the pool,
causing the next client in queue fail to execute its query.

The expectation is, by propagating the `DBConnection.ConnectionError`
we'll make it caught by DBConnection and the broken connection will
be removed from the pool.
@ruslandoga
Copy link
Contributor

ruslandoga commented Aug 18, 2025

👋

I will need to double check but I think the problem is still there.

Previously:

  • client A gets a connection and sends a request
  • ClickHouse responds with connection: close
  • Mint marks conn as closed and closes the connection
  • the closed connection is returned to the pool
  • client B gets a closed Mint conn
  • client B attempts to reconnect
  • client B fails to reconnect and sends a request over the "old" conn (closed by A)
  • Mint returns :closed error
  • client B request fails
  • the closed connection is not returned to the pool

Now:

  • client A gets a connection and sends a request
  • ClickHouse responds with connection: close
  • Mint marks conn as closed and closes the connection
  • the closed connection is returned to the pool
  • client B gets a closed Mint conn
  • client B attempts to reconnect
  • client B fails to reconnect and we raise a new exception
  • client B request fails
  • the failed connection is not returned to the pool

AFAIK with DBConnection it's not possible to return "ok but then disconnect" so I don't know of a way to stop client A from returning a connection closed after a successful query. If we check if HTTP.open?(conn) after a response, and then return disconnect then the flow would still have an error:

  • client A gets a connection and sends a request
  • ClickHouse responds with connection: close
  • Mint marks conn as closed and closes the connection
  • we return {:disconnect, error, conn}
  • client A fails (even though the response might have been OK)
  • the failed connection is not returned to the pool

Maybe we could move retries to Ch.query? Similar to how Req does it. We can retry transient errors (including lost connection) up to max_retries with backoff and remove the current reconnection-only retries logic from DBConnection. The downside is that we could get unlucky and spend all attempts on closed connections in the pool (e.g. poolsize=10, max_retries=3, when all connections got closed).

Or maybe DBConnection could be extended with "ok but then disconnect" or revisit elixir-ecto/db_connection#316 or we can replace DBConnection with a custom gen_statem connection implementation, similar to how Xandra did it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants