feat(triggers): DLQ classification + UI Replay + instrumentation (PR-3)#80
Open
mt-alarcon wants to merge 4 commits into
Conversation
…ps 1+2)

Step 1 — Migration (models.py + app.py):
- TriggerExecution gains 3 nullable columns: idempotency_key, error_category, last_replay_at
- to_dict() exposes the 3 new fields
- Idempotent auto-migrate at startup: ALTER TABLE + IF NOT EXISTS in each block
- Partial unique index uq_trigger_idem (trigger_id, idempotency_key) WHERE NOT NULL
- Basic index ix_trigger_executions_idem_key for lookups by key
- SQLite 3.51 confirmed — native partial index support; EXPLAIN QUERY PLAN confirms the index is used

Step 2 — Silent dedup (triggers.py):
- webhook_receiver extracts idem_key from idempotency_key / messageId / data.messageId
- If the key already exists: log idempotent_replay + silent 200 OK (pattern F6)
- Race condition: IntegrityError on db.commit() → rollback + silent 200 OK
- Legacy sources (GitHub, Stripe, Linear): no key → idem_key=None → normal flow unchanged
- Cleanup: current_app moved to a top-of-file import; inline imports removed

Tests passed: idempotent migration up/down, partial-index uniqueness, unconstrained NULLs, key extraction (6 cases), race condition via IntegrityError, EXPLAIN QUERY PLAN.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…PR-2)

Adds resilient retry logic to the Evolution Go API client. Transient HTTP errors (5xx, URLError, socket.timeout) now retry up to 3 times with exponential backoff + jitter; HTTP 4xx errors raise immediately (no retry, deterministic client errors).

## Changes

**`.claude/skills/int-evolution-go/scripts/evolution_go_client.py`:**
- New helper `_retry_http_call_client(do_call, max_attempts=3, base_delay=2.0, max_delay=8.0)` — generic retry wrapper with exponential backoff + jitter. Logs structured JSON events (`api_request_retry` / `api_request_failed`).
- Refactor `api_request` to wrap the actual HTTP call in `_do_call` and delegate to the retry helper. Removes inline try/except/sys.exit so library callers can handle errors; the CLI `main()` catches and exits as before.
- Refactor `main()` to wrap handler invocation in try/except for `urllib.error.HTTPError` / `URLError` / `socket.timeout` — preserves the exact CLI exit-1-on-failure behavior while letting library users propagate.

**`tests/whatsapp/test_retry_backoff.py` (new):** 4 synthetic tests covering:
1. HTTP 500 x3 → retries then raises (TestApiRequestRetry)
2. HTTP 400 → raises immediately, no retry
3. URLError x3 → retries then raises
4. Success on third attempt → returns result, call_count=3

All 4 tests pass against the modified `api_request`.

## Worst-case latency

3 attempts with all 5xx: sleep ≤ (2·2^0 + 0.5) + (2·2^1 + 0.5) = 7s total (each delay capped at max_delay=8s). Acceptable for an API call retry.

## Series
- PR-1 (evolution-foundation#78 — merged or pending): idempotency_key migration + silent dedup
- **PR-2 (this):** exponential backoff + jitter
- PR-3 (next): DLQ classification + UI Replay + instrumentation
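The backoff pattern described above can be sketched as follows. `TransientError` and `PermanentError` are stand-ins for the real 5xx/URLError/timeout vs. 4xx distinction, and the function name is illustrative, not the client's actual `_retry_http_call_client`:

```python
import random
import time

class TransientError(Exception):
    """Stand-in for HTTP 5xx / URLError / socket.timeout."""

class PermanentError(Exception):
    """Stand-in for deterministic HTTP 4xx client errors."""

def retry_call(do_call, max_attempts=3, base_delay=2.0, max_delay=8.0, sleep=time.sleep):
    """Retry transient failures with exponential backoff + jitter.

    PermanentError is deliberately NOT caught: deterministic client
    errors propagate immediately, with no retry.
    """
    last_exc = None
    for attempt in range(max_attempts):
        try:
            return do_call()
        except TransientError as e:
            last_exc = e
            if attempt < max_attempts - 1:
                # 2s, 4s, ... capped at max_delay, plus up to 0.5s jitter
                delay = min(base_delay * (2 ** attempt), max_delay)
                sleep(delay + random.uniform(0, 0.5))
    raise last_exc

# Demo: succeed on the third attempt (sleep stubbed out for the example)
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("HTTP 500")
    return "ok"

print(retry_call(flaky, sleep=lambda _: None))  # ok
```

Injecting `sleep` as a parameter is what makes the four synthetic tests in `test_retry_backoff.py`-style suites fast: they can assert on attempt counts without real delays.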
…teps 4+5+6)

Step 4: _classify_error helper in triggers.py; _execute_trigger populates error_category (transient | permanent) and uses status failed_retryable for timeouts and HTTP 5xx, status failed for permanent errors.

Step 5: POST /api/triggers/executions/<id>/replay with a 60s rate limit, confirmation modal in the frontend (preview: recipient/command/timestamp), Replay button shown only when status=failed_retryable, original execution set to status replayed once the replay has started.

Step 6: GET /api/triggers/stats returns 8 metrics (total, by_status, dlq_size, idempotent_replays, wpp_command_count, retries_observed, circuit_breaker_watermark_hit). Badge on the /triggers UI with a yellow alert when watermark_hit=true (>50 WPP/day). WARNING log when it flips to True. Structured evt= logs in webhook_receiver, _execute_trigger, and replay.

Synthetic tests: 24/24 pass (unit: _classify_error 13 scenarios, watermark 5 scenarios, rate-limit 4 scenarios; integration: auth guard + stats shape against the real dashboard.db).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
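The Step 4 classification can be illustrated with a substring-based sketch. The real helper lives in triggers.py and its exact marker lists are not shown in this PR text, so the markers below are assumptions:

```python
# Illustrative markers only; the real _classify_error inspects stderr and
# exception types, not just message substrings.
TRANSIENT_MARKERS = ("timeout", "timed out", "connection reset", "http 5",
                     "temporarily unavailable")

def classify_error(err_msg: str) -> str:
    """Map an error message to 'transient' or 'permanent' (Step 4 sketch)."""
    msg = (err_msg or "").lower()
    if any(marker in msg for marker in TRANSIENT_MARKERS):
        return "transient"   # stored as status=failed_retryable
    return "permanent"       # ValueError/KeyError/auth errors etc. → status=failed

print(classify_error("Read timed out after 30s"))      # transient
print(classify_error("HTTP 503 Service Unavailable"))  # transient
print(classify_error("KeyError: 'recipient'"))         # permanent
```

Defaulting to `permanent` is the conservative choice: an unrecognized error never silently re-enters the retry path.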
Reviewer's Guide

Implements DLQ-style error classification for trigger executions, adds a replay endpoint and UI for retryable failures, and exposes a stats endpoint plus UI badge for trigger observability, along with necessary schema/migration updates and retry-client tests.

Sequence diagram for the trigger execution replay flow with DLQ classification:

```mermaid
sequenceDiagram
    actor User
    participant Frontend as TriggersPage
    participant Backend as replay_execution
    participant DB as TriggerExecution_DB
    participant Worker as _execute_trigger
    User->>Frontend: Click Replay (failed_retryable row)
    Frontend->>Backend: POST /api/triggers/executions/exec_id/replay
    Backend->>DB: SELECT TriggerExecution BY id
    DB-->>Backend: execution (status=failed_retryable)
    Backend->>DB: UPDATE execution.status = replayed
    Backend->>DB: INSERT TriggerExecution (status=pending, idempotency_key)
    DB-->>Backend: new_execution_id
    Backend-->>Frontend: 200 { new_execution_id }
    Backend->>Worker: start thread _execute_trigger(trigger_id, new_execution_id, original_event_data)
    Worker->>DB: SELECT TriggerExecution BY new_execution_id
    Worker->>Worker: run subprocess / action
    Worker->>Worker: _classify_error(stderr, exc)
    alt success
        Worker->>DB: UPDATE status = completed, error_category = null
    else transient failure
        Worker->>DB: UPDATE status = failed_retryable, error_category = transient
    else permanent failure
        Worker->>DB: UPDATE status = failed, error_category = permanent
    end
```
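The replay transition in the diagram can be condensed into a few lines, using a dict-based stand-in for TriggerExecution rows (field names assumed; the real Flask endpoint also handles auth and rate limiting):

```python
def replay(executions, exec_id, next_id):
    """Replay one execution per the diagram: only failed_retryable rows
    are eligible; the original is marked replayed and a fresh pending
    execution is queued with the same idempotency_key."""
    ex = executions.get(exec_id)
    if ex is None:
        return {"error": "not_found"}
    if ex["status"] != "failed_retryable":
        return {"error": "not_replayable"}  # completed/replayed rows are final
    ex["status"] = "replayed"               # mark the original as consumed
    executions[next_id] = {                 # queue a fresh pending execution
        "trigger_id": ex["trigger_id"],
        "idempotency_key": ex["idempotency_key"],
        "status": "pending",
    }
    return {"new_execution_id": next_id}

rows = {1: {"trigger_id": 10, "idempotency_key": "msg-1", "status": "failed_retryable"}}
print(replay(rows, 1, 2))   # {'new_execution_id': 2}
print(rows[1]["status"])    # replayed
print(replay(rows, 1, 3))   # {'error': 'not_replayable'}
```

Flipping the original to `replayed` before inserting the new row is what makes a second Replay click fail fast instead of fanning out duplicate executions.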
Sequence diagram for triggers stats badge loading:

```mermaid
sequenceDiagram
    actor User
    participant Frontend as TriggersPage
    participant Backend as trigger_stats
    participant DB as TriggerExecution_DB
    User->>Frontend: Open /triggers
    Frontend->>Backend: GET /api/triggers/stats?days=1
    Backend->>DB: aggregate SELECTs on trigger_executions
    DB-->>Backend: by_status, dlq_size, counts
    Backend-->>Frontend: 200 { dlq_size, idempotent_replays, wpp_command_count, circuit_breaker_watermark_hit, ... }
    Frontend->>Frontend: update stats badge and watermark banner
```
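The aggregation behind the stats call can be sketched with stdlib sqlite3, with the schema trimmed to the status column (the real queries also filter by date window and compute the WPP counters):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trigger_executions (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO trigger_executions (status) VALUES (?)",
    [("completed",), ("completed",), ("failed_retryable",), ("failed",)],
)

# One GROUP BY gives the per-status breakdown; the derived fields fall out.
by_status = dict(conn.execute(
    "SELECT status, COUNT(*) FROM trigger_executions GROUP BY status"
))
stats = {
    "total": sum(by_status.values()),
    "by_status": by_status,
    # DLQ = executions parked as retryable failures, awaiting replay
    "dlq_size": by_status.get("failed_retryable", 0),
}
print(stats)
```

Deriving `dlq_size` from the same `by_status` result keeps the endpoint to a handful of scans instead of one query per metric.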
Hey - I've found 2 issues, and left some high level feedback:
- In `_classify_error` and the `TriggerExecution.error_category` comments you mention/expect categories like `unknown` (and `validation`), but the implementation only ever returns `transient` or `permanent`; consider either adding the missing categories in code paths or tightening the comments/typing so consumers don't rely on categories that can't occur.
- In `trigger_stats`, the WPP `IN` clause is built via string interpolation (`id_list`) and embeds the `since_clause` directly into the SQL string; even though these values are currently derived from integers and a constrained `days` param, it would be more robust and maintainable to switch to parameterized queries or SQLAlchemy `in_` expressions to avoid subtle SQL injection or quoting bugs if this logic evolves.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `_classify_error` and the `TriggerExecution.error_category` comments you mention/expect categories like `unknown` (and `validation`), but the implementation only ever returns `transient` or `permanent`; consider either adding the missing categories in code paths or tightening the comments/typing so consumers don’t rely on categories that can’t occur.
- In `trigger_stats`, the WPP `IN` clause is built via string interpolation (`id_list`) and embeds the `since_clause` directly into the SQL string; even though these values are currently derived from integers and a constrained `days` param, it would be more robust and maintainable to switch to parameterized queries or SQLAlchemy `in_` expressions to avoid subtle SQL injection or quoting bugs if this logic evolves.
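One way to apply the second suggestion using only stdlib sqlite3 placeholders (the PR itself later switches to SQLAlchemy's expanding `bindparam`; this sketch just illustrates the principle that only the placeholder list is built dynamically, never the values):

```python
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trigger_executions (trigger_id INTEGER, created_at TEXT)")
conn.execute("INSERT INTO trigger_executions VALUES (1, ?)",
             (datetime.now(timezone.utc).isoformat(),))
conn.execute("INSERT INTO trigger_executions VALUES (9, ?)",
             (datetime.now(timezone.utc).isoformat(),))

wpp_ids = [1, 2, 3]
cutoff = (datetime.now(timezone.utc) - timedelta(days=1)).isoformat()

# Only the "?" list is interpolated; every value travels as a bind
# parameter, so no id or timestamp ever lands inside the SQL string.
placeholders = ",".join("?" for _ in wpp_ids)
sql = (f"SELECT COUNT(*) FROM trigger_executions "
       f"WHERE trigger_id IN ({placeholders}) AND created_at >= ?")
count = conn.execute(sql, [*wpp_ids, cutoff]).fetchone()[0]
print(count)  # 1 (trigger_id 9 is excluded by the IN clause)
```

The SQLAlchemy equivalent, `bindparam("wpp_ids", expanding=True)`, does the same placeholder expansion at execution time.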
## Individual Comments
### Comment 1
<location path="dashboard/backend/routes/triggers.py" line_range="387-353" />
<code_context>
+ """
+ last_exc = None
+ for attempt in range(max_attempts):
+ try:
+ return do_call()
+ except urllib.error.HTTPError as e:
</code_context>
<issue_to_address>
**issue (bug_risk):** Catching all IntegrityError on commit can hide non-idempotency-related DB issues
The `except IntegrityError:` in `webhook_receiver` treats all integrity violations as the `(trigger_id, idempotency_key)` uniqueness race and responds 200. If any other constraint (e.g., NOT NULL, FK, new unique index) fails, it would be misclassified as an idempotent replay and silently masked. Please narrow the handler to only the idempotency constraint (e.g., by checking constraint name/message or isolating that insert in its own transaction) and let other `IntegrityError`s propagate.
</issue_to_address>
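A sketch of the narrowed handler this comment asks for, using sqlite3. Note that SQLite's IntegrityError message names the violated columns (so matching on `idempotency_key` works there), while other drivers may report the constraint name instead; the index/table names mirror the migration but are assumptions here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE trigger_executions (
    trigger_id INTEGER NOT NULL, idempotency_key TEXT)""")
conn.execute("""CREATE UNIQUE INDEX uq_trigger_idem
    ON trigger_executions (trigger_id, idempotency_key)
    WHERE idempotency_key IS NOT NULL""")

def insert_execution(trigger_id, idem_key):
    """Absorb only the idempotency-uniqueness race; re-raise everything else."""
    try:
        with conn:
            conn.execute("INSERT INTO trigger_executions VALUES (?, ?)",
                         (trigger_id, idem_key))
        return "created"
    except sqlite3.IntegrityError as e:
        is_idem_race = idem_key is not None and (
            "uq_trigger_idem" in str(e) or "idempotency_key" in str(e))
        if is_idem_race:
            return "idempotent_replay"   # benign duplicate: answer 200
        raise                            # NOT NULL / FK / other: surface it

print(insert_execution(1, "k1"))   # created
print(insert_execution(1, "k1"))   # idempotent_replay
```

With this shape, a NOT NULL or foreign-key violation propagates to the caller instead of being masked as a silent 200.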
### Comment 2
<location path="dashboard/frontend/src/pages/Triggers.tsx" line_range="262-271" />
<code_context>
+ })
+ }
+
+ const confirmReplay = async () => {
+ if (!replayPreview) return
+ setReplaying(true)
+ try {
+ const result = await api.post(`/triggers/executions/${replayPreview.execId}/replay`)
+ toast.success(`Replay iniciado — nova execução #${result.new_execution_id}`)
+ setReplayPreview(null)
+ // Refresh executions list
+ if (execModal) {
+ const data = await api.get(`/triggers/${execModal.triggerId}/executions`)
+ setExecutions(data.executions || [])
+ }
+ } catch (e: unknown) {
+ const msg = e instanceof Error ? e.message : String(e)
+ if (msg.includes('rate_limited') || msg.includes('429')) {
+ toast.error('Rate limit: aguarde 60s antes de fazer replay novamente')
+ } else {
</code_context>
<issue_to_address>
**issue (bug_risk):** Replay error handling relies on substring matches in Error.message, which may be brittle
In `confirmReplay`, rate limiting is inferred by checking `Error.message` for `rate_limited` or `429`. Depending on how `api` throws (axios, fetch, custom wrapper), the HTTP status or backend `error` may not appear in `.message`, so rate-limited responses could fall through to the generic error toast. Prefer checking structured fields from the client (e.g. `e.response.status` or `e.response.data.error`) instead of parsing the message string.
</issue_to_address>
mt-alarcon pushed a commit to mt-alarcon/evo-nexus that referenced this pull request on May 13, 2026:
…iggers

Addresses the Sourcery review on PR evolution-foundation#80 (Comment 1) and extends the same protection to the replay endpoint (not flagged by Sourcery, but it carries the same latent bug — a generic `except IntegrityError`).

Two spots fixed in dashboard/backend/routes/triggers.py:
1. `webhook_receiver` (~line 389): same bug as in PR evolution-foundation#78. It caught any IntegrityError and answered with a silent 200 OK. Now it only absorbs the error when the driver message mentions `uq_trigger_idem` or `idempotency_key` and `idem_key` is set. Otherwise it logs the full error and re-raises.
2. `/triggers/executions/<id>/replay` endpoint (~line 476): creates a new execution reusing `ex.idempotency_key`. It had the same generic catch, masking other IntegrityErrors as "idempotent_skip". The same protection was applied, using `ex.idempotency_key` in the check.

No dedicated test for this path on this branch — syntax validated with py_compile.
mt-alarcon pushed a commit to mt-alarcon/evo-nexus that referenced this pull request on May 13, 2026:
…urado

Polish from the Sourcery review on PR evolution-foundation#80 (code-quality suggestions, 3 fronts):

1. `_classify_error` — tighten typing
- Defines the alias `ErrorCategory = Literal["transient", "permanent"]`
- Updates the return type to reflect what the function actually returns
- Updates the comment on `TriggerExecution.error_category` (models.py), which listed 4 categories (validation/unknown) never emitted by the classifier — consumers will no longer depend on values that don't exist in the code.

2. `trigger_stats` — parameterize SQL
- The interpolated `datetime('now', '-{days} days')` becomes a bound `:cutoff` (cutoff computed in Python with timedelta).
- `WHERE trigger_id IN ({id_list})` becomes `IN :wpp_ids` with `bindparam("wpp_ids", expanding=True)` — SQLAlchemy expands it to `(:id_1, :id_2, ...)` and binds each value.
- Today's inputs are "safe" (ints + a constrained days param), but parameterizing is more robust against regressions and more readable.

3. `confirmReplay` (Triggers.tsx) — structured error handling
- Adds `ApiError` in `lib/api.ts` carrying `status` (HTTP number) and `code` (string from the backend JSON, e.g. `rate_limited`).
- `buildError` now returns `ApiError` instead of a plain `Error`.
- `Triggers.tsx::confirmReplay` checks `e instanceof ApiError && (e.status === 429 || e.code === 'rate_limited')` instead of `msg.includes('429')`. More resilient to message-format changes (e.g. translation, prefixes).
- `.message` kept in the same format (`"<status> <text>: <detail>"`) so the 3 other call sites that use `msg.includes(...)` (Backups.tsx, StepBrainConnect.tsx, RestoreSelectRepo.tsx) don't break — migrating those is left for a dedicated PR.

No functional change on a healthy runtime; structure/robustness only.
mt-alarcon pushed a commit to mt-alarcon/evo-nexus that referenced this pull request on May 13, 2026:
… index + logs

Consolidation of the fixes requested by Sourcery in the PR evolution-foundation#78 review:

1. `except IntegrityError` restricted to the idempotency violation (dashboard/backend/routes/triggers.py)
Previously, any IntegrityError in `webhook_receiver` became a silent 200 OK — treated as an "idempotent replay". Other problems (NOT NULL, broken FK, a unique constraint added later) would be masked as success and the client would never know the event wasn't processed. Now the catch only absorbs the error if:
a) `idem_key` is set (without an idempotency key there is no way it's a uq_trigger_idem violation)
b) the driver's error message mentions `uq_trigger_idem` or `idempotency_key` (the specific constraint)
For all other IntegrityErrors, it logs with full context (trigger_id, key, original message) and re-raises — the webhook returns 500 and the error shows up in logs instead of becoming a silent 200.

2. Remove redundant index (models.py)
`TriggerExecution.idempotency_key` was creating 3 indexes on the same column:
- `ix_trigger_executions_idempotency_key` (auto-generated via index=True on the model)
- `ix_trigger_executions_idem_key` (raw SQL at startup)
- `uq_trigger_idem` (partial unique, raw SQL)
We keep only the last two — explicitly created by the startup migration, with versioned names in the rollback plan. `index=True` removed from the model definition.

3. Log CREATE INDEX failures instead of silencing them (app.py)
Replaces `except Exception: pass` with structured logging in both `CREATE INDEX` blocks (basic and partial unique). Previously, any failure (e.g. a SQLite version without partial-index support) was swallowed silently — the operator would only find out at the first race. Now it logs `sqlite_lib`, `sqlite_runtime`, and the original exception.

Syntax validated with `python3 -m py_compile`. No dedicated test for this path on this branch (WPP tests only land in PRs evolution-foundation#79/evolution-foundation#80).
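Point 3 (log instead of swallow) can be sketched like this, with the index and table names assumed from the commit above:

```python
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("startup_migration")

def create_partial_index(conn):
    """Startup-migration sketch: log index-creation failures with
    context instead of swallowing them with a bare `except: pass`."""
    try:
        conn.execute("""CREATE UNIQUE INDEX IF NOT EXISTS uq_trigger_idem
            ON trigger_executions (trigger_id, idempotency_key)
            WHERE idempotency_key IS NOT NULL""")
        return True
    except sqlite3.OperationalError as e:
        # e.g. an old SQLite without partial-index support (< 3.8.0),
        # or a missing table. The operator sees it at startup, not at
        # the first idempotency race.
        log.error("create_index_failed index=uq_trigger_idem sqlite_runtime=%s err=%s",
                  sqlite3.sqlite_version, e)
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trigger_executions (trigger_id INTEGER, idempotency_key TEXT)")
print(create_partial_index(conn))  # True
```

Returning a boolean (and logging on failure) lets the caller decide whether a missing index is fatal, instead of discovering it later as mysterious duplicate rows.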
… SQL, ApiError

Consolidation of the fixes requested by Sourcery in the PR evolution-foundation#80 review:

1. `except IntegrityError` restricted to the idempotency violation (dashboard/backend/routes/triggers.py — 2 spots)
Same bug as PR evolution-foundation#78, extended to this PR. Two spots in triggers.py:
a) `webhook_receiver` (~line 389): same fix as PR evolution-foundation#78 — only absorbs the error if the driver message mentions `uq_trigger_idem` or `idempotency_key` AND `idem_key` is set; otherwise logs with full context and re-raises.
b) `/triggers/executions/<id>/replay` endpoint (~line 476): creates a new execution reusing `ex.idempotency_key`. It had the same generic catch, masking other IntegrityErrors as "idempotent_skip". The same protection was applied, using `ex.idempotency_key` in the check. (Sourcery didn't flag this second spot, but it's the same bug — fixed for symmetry.)

2. `_classify_error` — tighten typing
- Defines the alias `ErrorCategory = Literal["transient", "permanent"]`.
- Updates the return type to reflect what the function actually returns.
- Updates the comment on `TriggerExecution.error_category` (models.py), which listed 4 categories (validation/unknown) never emitted by the classifier — consumers will no longer depend on values that don't exist.

3. `trigger_stats` — parameterize SQL
- The interpolated `datetime('now', '-{days} days')` becomes a bound `:cutoff` (cutoff computed in Python with `timedelta`).
- `WHERE trigger_id IN ({id_list})` becomes `IN :wpp_ids` with `bindparam("wpp_ids", expanding=True)` — SQLAlchemy expands it to `(:id_1, :id_2, ...)` and binds each value.
- Today's inputs are "safe" (ints + a constrained days param), but parameterizing is more robust against regressions and more readable.

4. `confirmReplay` (Triggers.tsx) — structured error handling
- Adds `ApiError` in `lib/api.ts` carrying `status` (HTTP number) and `code` (string from the backend JSON, e.g. `rate_limited`).
- `buildError` now returns `ApiError` instead of a plain `Error`.
- `Triggers.tsx::confirmReplay` checks `e instanceof ApiError && (e.status === 429 || e.code === 'rate_limited')` instead of `msg.includes('429')`. More resilient to message-format changes (e.g. translation, prefixes).
- `.message` kept in the same format (`"<status> <text>: <detail>"`) so the 3 other call sites that use `msg.includes(...)` (Backups.tsx, StepBrainConnect.tsx, RestoreSelectRepo.tsx) don't break — migrating those is left for a dedicated PR.

Python syntax validated with `python3 -m py_compile`. Frontend typecheck left to the PR's CI (deps not installed locally).
da958c0 to 935d4f9 (compare)
Context
Third and final of a 3-PR series implementing the WhatsApp retry pattern. Builds on #78 (PR-1: idempotency_key + silent dedup) and #79 (PR-2: backoff + jitter on api_request) by adding:
/stats) for observability

Diff includes PR-1 + PR-2 + PR-3 because they stack. Reviewer can focus on the PR-3-specific changes (the 3 files in this PR's commit):
- `dashboard/backend/routes/triggers.py` (+263)
- `dashboard/frontend/src/pages/Triggers.tsx` (+173)
- `dashboard/tests/test_wpp_retry_pr3.py` (+203)

Changes
Step 4 — `_classify_error` + `failed_retryable` classification

New `_classify_error(err_msg, exc)` returns one of:
- `transient` — timeouts, connection errors, HTTP 5xx, retry markers in stdout
- `permanent` — `ValueError`, `KeyError`, missing-key/auth/forbidden errors
- `unknown` — everything else

`_execute_trigger` now records the classification in `result_summary["error_category"]` so the UI can filter retryable failures.

Step 5 — `POST /api/triggers/executions/<id>/replay`

Endpoint replays a failed execution:
- returns `not_replayable` if the execution succeeded or was already replayed
- rate-limited to one replay per `id` per 60s (in-memory cache; multi-instance would need Redis)

Step 6 — `GET /api/triggers/stats`

Returns a JSON snapshot for observability:
- `dlq_size > 10` → `"dlq_growing"` warning

UI shows a small stats banner above the executions table when the watermark trips.
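The Step 5 rate limit can be sketched as a process-local map. The helper name `allow_replay` is an assumption; the PR only describes the behavior (one replay per execution id per 60 s, and, as noted above, a multi-instance deployment would need Redis instead):

```python
import time

_last_replay = {}  # exec_id -> monotonic timestamp of last accepted replay

def allow_replay(exec_id, window=60.0, now=time.monotonic):
    """Return True if a replay of exec_id is allowed in this window."""
    t = now()
    last = _last_replay.get(exec_id)
    if last is not None and t - last < window:
        return False        # caller should answer HTTP 429 / rate_limited
    _last_replay[exec_id] = t
    return True

print(allow_replay(7))  # True
print(allow_replay(7))  # False (second call inside the 60 s window)
```

Injecting `now` keeps the limiter testable without sleeping, the same trick the PR's rate-limit scenarios would rely on.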
Tests

`dashboard/tests/test_wpp_retry_pr3.py` (203 lines): `_classify_error` for timeouts, ValueError, KeyError, HTTP 5xx markers, etc. Tests use a fresh `dashboard.db` per test and run end-to-end against the Flask app (no mocking of the DB layer).

Verification
- `ast.parse`
- `tsc` (existing project config) accepts `Triggers.tsx` changes

Series
Summary by Sourcery
Introduce DLQ-style classification for trigger execution failures, add UI-driven replay of retryable executions, and expose operational stats for WhatsApp-trigger workloads.
New Features:
Enhancements:
Tests: