fix(gateway): stop Mattermost/Matrix reconnect loop on permanent auth failures #3695

Open
binhnt92 wants to merge 1 commit into NousResearch:main from
binhnt92:fix/mattermost-matrix-auth-retry-hardening

@binhnt92
Contributor

Mattermost's _ws_loop and Matrix's _sync_loop both catch all exceptions with a broad except Exception and retry with backoff forever. When the auth token is invalid or revoked (HTTP 401/403, M_UNKNOWN_TOKEN), these loops spin indefinitely, burning resources and flooding logs with warnings for retries that can never succeed.

This is the same pattern that PR #3390 fixed for Telegram's message_thread_id misclassification: distinguishing permanent errors from transient ones instead of retrying everything.

Changes Made

Mattermost (gateway/platforms/mattermost.py):

  • Detect aiohttp.WSServerHandshakeError with status 401/403
  • Detect string-based auth indicators (unauthorized, 401, 403)
  • Stop reconnect loop immediately on permanent auth failure
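
The Mattermost checks listed above could be sketched roughly as follows. This is an illustration, not the code in this PR: the helper name `is_permanent_ws_auth_error` and the constants are hypothetical. The status check relies on `aiohttp.WSServerHandshakeError` exposing the HTTP status as `.status`, and is duck-typed so the sketch runs without aiohttp installed.

```python
# Hypothetical constants; the PR may organise these differently.
AUTH_STATUS_CODES = frozenset({401, 403})
AUTH_MARKERS = ("unauthorized", "401", "403")


def is_permanent_ws_auth_error(exc: BaseException) -> bool:
    """Return True when exc looks like an auth failure that retrying cannot fix."""
    # Handshake errors carry the HTTP status code; getattr keeps this
    # check independent of the concrete exception class.
    if getattr(exc, "status", None) in AUTH_STATUS_CODES:
        return True
    # Fallback: look for auth indicators in the exception message.
    text = str(exc).lower()
    return any(marker in text for marker in AUTH_MARKERS)
```

Note that plain substring matching on "401" or "403" can misfire on unrelated messages that happen to contain those digits, so the structured status check is the more reliable of the two.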

Matrix (gateway/platforms/matrix.py):

  • Detect M_UNKNOWN_TOKEN and M_FORBIDDEN in nio.SyncError responses
  • Detect string-based auth indicators in exception messages
  • Stop sync loop immediately on permanent auth failure
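
A comparable sketch for the Matrix side, again with hypothetical names (`is_permanent_matrix_auth_error`, `FakeSyncError`). It assumes the sync error response exposes the Matrix errcode in a `status_code` attribute; the stand-in class below only mimics that shape so the example runs without matrix-nio.

```python
from dataclasses import dataclass

PERMANENT_ERRCODES = frozenset({"M_UNKNOWN_TOKEN", "M_FORBIDDEN"})
AUTH_MARKERS = ("m_unknown_token", "m_forbidden", "unauthorized", "401", "403")


@dataclass
class FakeSyncError:
    """Hypothetical stand-in for an error response returned by a sync call."""
    message: str
    status_code: str = ""  # assumed to hold the Matrix errcode


def is_permanent_matrix_auth_error(obj: object) -> bool:
    """Return True for errcodes or messages that indicate a dead token."""
    # Structured check first: the errcode carried on the response.
    if getattr(obj, "status_code", None) in PERMANENT_ERRCODES:
        return True
    # Fallback for plain exceptions: scan the message text.
    text = str(obj).lower()
    return any(marker in text for marker in AUTH_MARKERS)
```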

Transient errors (network timeouts, connection resets) still retry with backoff as before.
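
The resulting control flow, where permanent failures stop the loop and transient ones back off and retry, might look like the sketch below. The function `run_with_backoff`, its parameters, and the delay values are assumptions for illustration; the actual loops in mattermost.py and matrix.py may be structured differently.

```python
import asyncio


async def run_with_backoff(connect_once, is_permanent, *,
                           base_delay=1.0, max_delay=60.0,
                           sleep=asyncio.sleep):
    """Call connect_once until it returns or fails permanently.

    Returns the permanent exception that stopped the loop, or None when
    connect_once returns normally (treated here as a deliberate shutdown).
    """
    delay = base_delay
    while True:
        try:
            await connect_once()
            return None
        except asyncio.CancelledError:
            # Never swallow task cancellation.
            raise
        except Exception as exc:
            if is_permanent(exc):
                # Invalid or revoked credentials: retrying cannot help.
                return exc
            # Transient failure: wait, then retry with exponential backoff.
            await sleep(delay)
            delay = min(delay * 2, max_delay)
```

Passing `sleep` as a parameter is a testing convenience: a test can record the requested delays instead of actually waiting.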

How to Test

python3 -m pytest tests/gateway/test_ws_auth_retry.py -v

6 tests covering: 401 stops Mattermost, 403 stops Mattermost, transient retries Mattermost, M_UNKNOWN_TOKEN stops Matrix, 401 stops Matrix, transient retries Matrix.

Checklist

  • Tests added (6 tests)
  • Full test suite run — no regressions
  • Tested on Linux (Ubuntu 22.04)

…ures

Mattermost's _ws_loop and Matrix's _sync_loop both catch all exceptions
with a broad except Exception and retry with backoff forever. When the
auth token is invalid or revoked (401/403, M_UNKNOWN_TOKEN), these loops
spin indefinitely instead of stopping — wasting resources and flooding
logs with retry warnings.

Detect permanent auth errors (HTTP 401/403, WSServerHandshakeError,
M_UNKNOWN_TOKEN, M_FORBIDDEN) and stop the reconnect/sync loop
immediately instead of retrying.