Skip to content

fix: harden error recovery in validators, chatbot retry, and atomic_translate#118

Merged
MaleicAcid merged 1 commit into
zh-plus:masterfrom
iautolab:fix/error-recovery-hardening
Apr 28, 2026
Merged

fix: harden error recovery in validators, chatbot retry, and atomic_translate#118
MaleicAcid merged 1 commit into
zh-plus:masterfrom
iautolab:fix/error-recovery-hardening

Conversation

@MaleicAcid
Copy link
Copy Markdown
Collaborator

Fix

This PR fixes four error recovery bugs and increases the tokenizer safety margin.

1. Validators crash on None response content

When the LLM API returns an empty response, get_content() returns None. All four validators (ChunkedTranslateValidator, AtomicTranslateValidator, ProofreaderValidator, ContextReviewerValidateValidator) pass this directly to re.search() or detect_language_of(), which raises TypeError: expected string or bytes-like object. This was observed on CI.

Added if not generated_content: return False guard to all four validators.

2. ClaudeBot mutates caller's message list

ClaudeBot._create_chat() calls messages.pop(0) to extract the system message, which modifies the caller's list in place. If the retry loop re-enters or if the same message list is reused, the system message is already gone. GeminiBot already avoids this with deepcopy(messages).

Added messages = list(messages) shallow copy before the pop.

3. _create_chat silently returns unvalidated response

When output_checker fails on every retry attempt, the retry loop exits normally. Since the API call itself succeeded, response is not None, so the if not response guard does not trigger. The method returns a response that never passed format validation, and the caller has no way to know.

Added a validated flag that is set to True only when output_checker passes. After the loop, if not validated, a warning is logged. This makes the failure visible in logs while preserving the existing best-effort behavior (returning the last response rather than raising).

4. atomic_translate uses assert for error handling

assert is stripped in python -O mode. As the last fallback in the translation pipeline, atomic_translate should always report failures reliably.

Replaced assert len(translated) == len(texts) with if ... raise ChatBotException(...).

5. Tokenizer safety margin increased from 5% to 10%

_compute_max_tokens() and the three token budget calculations in ContextReviewerAgent use a safety margin to account for the difference between tiktoken's token estimates and the actual model's tokenizer. CI logs showed divergences of 3-3.5% on non-GPT models (Gemini via OpenRouter), which is close enough to 5% to cause LengthExceedException on large chunks. Increased to 10% to provide sufficient headroom.

@MaleicAcid MaleicAcid merged commit 281bdca into zh-plus:master Apr 28, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant