Merged
8 changes: 5 additions & 3 deletions commands/CLAUDE.md
@@ -15,8 +15,9 @@

- **Pydantic I/O**: All commands use `CommandInput`/`CommandOutput` subclasses for type safety and serialization.
- **Error handling**: Permanent errors return failure output; `RuntimeError` exceptions auto-retry via surreal-commands.
+- **Retry configuration**: Aggressive retry settings (15 attempts, 1-120s backoff, DEBUG log level) are a temporary workaround for SurrealDB v2.x transaction conflicts with SEARCH indexes. These can be reduced after migrating to SurrealDB v3.
@cubic-dev-ai cubic-dev-ai bot Jan 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Documentation inconsistency: This line documents '15 attempts' but the Key Components section (line 7) still says 'max 5×' for process_source_command. Consider updating line 7 to reflect the new retry count for consistency.


- **Model dumping**: Recursive `full_model_dump()` utility converts Pydantic models → dicts for DB/API responses.
-- **Logging**: Uses `loguru.logger` throughout; logs execution start/end and key metrics (processing time, counts).
+- **Logging**: Uses `loguru.logger` throughout; logs execution start/end and key metrics (processing time, counts). Retry attempts use `retry_log_level: "debug"` to prevent log noise during concurrent chunk processing.
- **Time tracking**: All commands measure `start_time` → `processing_time` for monitoring.
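
To illustrate the error-handling and time-tracking bullets above, here is a minimal sketch of the pattern: permanent errors become a failure output, `RuntimeError` is re-raised so a retry layer can re-run the command, and elapsed time is recorded either way. `SketchOutput` and `run_command` are hypothetical names; the dataclass stands in for a Pydantic `CommandOutput` subclass and is not the project's actual model.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SketchOutput:
    """Hypothetical stand-in for a CommandOutput subclass."""
    success: bool
    processing_time: float
    error_message: Optional[str] = None


def run_command(work: Callable[[], None]) -> SketchOutput:
    """Run `work`, classifying failures as retriable or permanent."""
    start_time = time.perf_counter()
    try:
        work()
        return SketchOutput(True, time.perf_counter() - start_time)
    except RuntimeError:
        # Treated as transient (e.g. a transaction conflict):
        # re-raise so an outer retry mechanism can try again.
        raise
    except Exception as e:
        # Anything else is treated as permanent: report failure, no retry.
        return SketchOutput(False, time.perf_counter() - start_time, str(e))
```

The ordering of the `except` clauses matters here: `RuntimeError` must be handled before the generic `Exception` branch, otherwise conflicts would be swallowed and reported as permanent failures.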

## Dependencies
@@ -26,10 +27,11 @@

## Quirks & Edge Cases

-- **source_commands**: `ensure_record_id()` wraps command IDs for DB storage; transaction conflicts trigger exponential backoff retry (1-30s). Non-`RuntimeError` exceptions are permanent.
-- **embedding_commands**: Queries DB directly for item state; chunk index must match source's chunk list. Model availability checked at command start.
+- **source_commands**: `ensure_record_id()` wraps command IDs for DB storage; transaction conflicts trigger exponential backoff retry (1-120s, up to 15 attempts). Non-`RuntimeError` exceptions are permanent. Retry logs at DEBUG level via `retry_log_level` config.
+- **embedding_commands**: Queries DB directly for item state; chunk index must match source's chunk list. Model availability checked at command start. Aggressive retry settings (15 attempts, 120s max wait, DEBUG logging) handle deep queues from large documents without log spam.
- **podcast_commands**: Profiles loaded from SurrealDB by name (must exist); briefing can be extended with suffix. Episode records created mid-execution.
- **Example commands**: Accept optional `delay_seconds` for testing async behavior; not for production.
+- **Retry logging**: Uses `retry_log_level: "debug"` in decorator config + manual `logger.debug()` in exception handlers for double protection against retry log noise.
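
For a rough sense of what a 1-120s exponential backoff with jitter over 15 attempts looks like, the sketch below computes one possible schedule. It is a generic illustration only: `backoff_ceiling` and `backoff_wait` are hypothetical helpers, and the way surreal-commands actually implements its `exponential_jitter` strategy may differ.

```python
import random
from typing import Optional


def backoff_ceiling(attempt: int, wait_min: float = 1.0, wait_max: float = 120.0) -> float:
    """Upper bound for the wait before the given 1-based retry attempt."""
    # Doubles each attempt, starting at wait_min and capped at wait_max.
    return min(wait_max, wait_min * (2 ** (attempt - 1)))


def backoff_wait(
    attempt: int,
    wait_min: float = 1.0,
    wait_max: float = 120.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Pick a jittered wait between wait_min and the exponential ceiling."""
    rng = rng or random.Random()
    return rng.uniform(wait_min, backoff_ceiling(attempt, wait_min, wait_max))


if __name__ == "__main__":
    for attempt in range(1, 16):
        print(attempt, backoff_ceiling(attempt), round(backoff_wait(attempt), 2))
```

Under these assumptions the ceiling reaches the 120s cap on the eighth attempt, so the remaining attempts all draw from the full 1-120s range, which spreads concurrent retries apart rather than having them wake together.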

## Code Example

19 changes: 12 additions & 7 deletions commands/embedding_commands.py
@@ -190,11 +190,12 @@ async def embed_single_item_command(
     "embed_chunk",
     app="open_notebook",
     retry={
-        "max_attempts": 5,
+        "max_attempts": 15,  # Increased from 5 to handle deep queues (workaround for SurrealDB v2 transaction conflicts)
         "wait_strategy": "exponential_jitter",
         "wait_min": 1,
-        "wait_max": 30,
+        "wait_max": 120,  # Increased from 30s to 120s to allow queue to drain
         "retry_on": [RuntimeError, ConnectionError, TimeoutError],
+        "retry_log_level": "debug",  # Use debug level to avoid log noise with hundreds of chunks
     },
 )
 async def embed_chunk_command(
@@ -206,14 +207,18 @@ async def embed_chunk_command(
     This command is designed to be submitted as a background job for each chunk
     of a source document, allowing natural concurrency control through the worker pool.
 
-    Retry Strategy:
-    - Retries up to 5 times for transient failures:
+    Retry Strategy (SurrealDB v2 workaround):
+    - Retries up to 15 times for transient failures (increased from 5):
       * RuntimeError: SurrealDB transaction conflicts ("read or write conflict")
       * ConnectionError: Network failures when calling embedding provider
       * TimeoutError: Request timeouts to embedding provider
-    - Uses exponential-jitter backoff (1-30s) to prevent thundering herd during concurrent operations
+    - Uses exponential-jitter backoff (1-120s, increased from 30s max)
+    - Higher retry limits allow deep queues (200+ chunks) to drain during concurrent processing
     - Does NOT retry permanent failures (ValueError, authentication errors, invalid input)
 
+    Note: These aggressive retry settings are a temporary workaround for SurrealDB v2.x
+    transaction conflict issues. Can be reduced once migrated to SurrealDB v3.
+
     Exception Handling:
     - RuntimeError, ConnectionError, TimeoutError: Re-raised to trigger retry mechanism
     - ValueError and other exceptions: Caught and returned as permanent failures (no retry)
@@ -263,13 +268,13 @@ async def embed_chunk_command(
 
     except RuntimeError:
         # Re-raise RuntimeError to allow retry mechanism to handle DB transaction conflicts
-        logger.warning(
+        logger.debug(
             f"Transaction conflict for chunk {input_data.chunk_index} - will be retried by retry mechanism"
         )
         raise
     except (ConnectionError, TimeoutError) as e:
         # Re-raise network/timeout errors to allow retry mechanism to handle transient provider failures
-        logger.warning(
+        logger.debug(
             f"Network/timeout error for chunk {input_data.chunk_index} ({type(e).__name__}: {e}) - will be retried by retry mechanism"
         )
         raise
7 changes: 4 additions & 3 deletions commands/source_commands.py
@@ -48,11 +48,12 @@ class SourceProcessingOutput(CommandOutput):
     "process_source",
     app="open_notebook",
     retry={
-        "max_attempts": 5,
+        "max_attempts": 15,  # Increased from 5 to handle deep queues (workaround for SurrealDB v2 transaction conflicts)
         "wait_strategy": "exponential_jitter",
         "wait_min": 1,
-        "wait_max": 30,
+        "wait_max": 120,  # Increased from 30s to 120s to allow queue to drain
         "retry_on": [RuntimeError],
+        "retry_log_level": "debug",  # Use debug level to avoid log noise during transaction conflicts
     },
 )
 async def process_source_command(
@@ -136,7 +137,7 @@ async def process_source_command(
 
     except RuntimeError as e:
         # Transaction conflicts should be retried by surreal-commands
-        logger.warning(f"Transaction conflict, will retry: {e}")
+        logger.debug(f"Transaction conflict, will retry: {e}")
         raise
 
     except Exception as e:
4 changes: 2 additions & 2 deletions open_notebook/database/CLAUDE.md
@@ -74,7 +74,7 @@ Both leverage connection context manager for lifecycle management and automatic
- **Async-first design**: All operations async via AsyncSurreal; sync wrapper provided for legacy code
- **Connection per operation**: Each repo_* function opens/closes connection (no pooling); designed for serverless/stateless API
- **Auto-timestamping**: repo_create() and repo_update() auto-set `created`/`updated` fields
-- **Error resilience**: RuntimeError for transaction conflicts (retriable); catches and re-raises other exceptions
+- **Error resilience**: RuntimeError for transaction conflicts (retriable, logged at DEBUG level); catches and re-raises other exceptions
- **RecordID polymorphism**: Functions accept string or RecordID; coerced to consistent type
- **Graceful degradation**: Migration queries catch exceptions and treat table-not-found as version 0

@@ -91,7 +91,7 @@ Both leverage connection context manager for lifecycle management and automatic
- **Record ID format inconsistency**: repo_update() accepts both `table:id` format and full RecordID; path handling can be subtle
- **ISO date parsing**: repo_update() parses `created` field from string to datetime if present; assumes ISO format
- **Timestamp overwrite risk**: repo_create() always sets new timestamps; can't preserve original created time on reimport
-- **Transaction conflict handling**: RuntimeError from transaction conflicts logged without stack trace (prevents log spam)
+- **Transaction conflict handling**: RuntimeError from transaction conflicts logged at DEBUG level without stack trace (prevents log spam during concurrent operations)
- **Graceful null returns**: get_all_versions() returns [] on table missing; allows migration system to bootstrap cleanly
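
The transaction-conflict bullet above can be sketched as follows: a retriable `RuntimeError` is logged as a one-line DEBUG message and re-raised, while any other exception is logged with its stack trace. This is only an illustration; `run_query` is a hypothetical helper, and the standard-library `logging` module is used here in place of `loguru` so the snippet has no third-party dependencies.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("repo_sketch")


def run_query(execute: Callable[[], Any]) -> Any:
    """Run a query callable, logging failures by how actionable they are."""
    try:
        return execute()
    except RuntimeError as e:
        # Retriable conflict: message only, at DEBUG, no stack trace.
        logger.debug(str(e))
        raise
    except Exception as e:
        # Unexpected failure: ERROR level with the full stack trace.
        logger.exception(e)
        raise
```

With the logger left at a typical INFO or WARNING threshold, conflict retries produce no output at all, while genuine failures still surface with full context.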

## How to Extend
4 changes: 2 additions & 2 deletions open_notebook/database/repository.py
@@ -74,8 +74,8 @@ async def repo_query(
                 raise RuntimeError(result)
             return result
         except RuntimeError as e:
-            # RuntimeError is raised for retriable transaction conflicts - log without stack trace
-            logger.error(str(e))
+            # RuntimeError is raised for retriable transaction conflicts - log at debug to avoid noise
+            logger.debug(str(e))
             raise
         except Exception as e:
             logger.exception(e)
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -38,7 +38,7 @@ dependencies = [
     "esperanto>=2.13",
     "surrealdb>=1.0.4",
     "podcast-creator>=0.7.0",
-    "surreal-commands>=1.2.0",
+    "surreal-commands>=1.3.0",
]

[tool.setuptools]