Add standalone MCP docs server mounting#2565
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: ASSERTIVE Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces extensive documentation and implementation plans across several modules, including the new Audio Studio workspace, a comprehensive Data Flow Atlas, the Calendar module with CalDAV integration, and the Research Workspace migration protocol. It also updates the Chatbook API documentation to support the v1.1 format specification and adds setup guides for OmniVoice TTS. The review feedback correctly identifies a syntax error in the CalDAV smoke test documentation where JSON placeholders are unquoted, which would cause parsing errors.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
95a632d to
4ed8b3c
Compare
PR Summary by QodoAdd standalone MCP docs mount and tldw_server host adapter wiring
AI Description
Diagram
High-Level Assessment
Files changed (62)
|
Code Review by Qodo
Context used✅ Compliance rules (platform):
74 rules 1.
|
There was a problem hiding this comment.
Actionable comments posted: 18
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/mcp-unified/src/mcp_unified/docs/acquisition/service.py (1)
1-141: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winNo logging around fetch/denial outcomes.
ingest_urlhas several security-relevant failure/denial paths (policy denial, robots gate, fetch failure) but nothing is logged via Loguru anywhere in this module. Alogger.info/logger.warningon non-"fetched"statuses would aid auditing of blocked/failed acquisition attempts without needing to inspect return values downstream.As per coding guidelines, "Use Loguru for logging throughout the codebase (
from loguru import logger)".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/mcp-unified/src/mcp_unified/docs/acquisition/service.py` around lines 1 - 141, `DocsAcquisitionService.ingest_url` returns several non-"fetched" outcomes but does not log any of them, which makes blocked or failed acquisitions hard to audit. Add Loguru logging in this module by importing `logger`, and emit a `logger.info` or `logger.warning` in the `ingest_url` early-return paths for capability disabled and any `fetched.status != "fetched"` cases, including the URL, status/reason, and final URL/redirect context; keep the normal success path unchanged.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/mcp-unified/src/mcp_unified/docs/acquisition/extract.py`:
- Around line 23-73: The rich extraction path in extract_fetched_document is
running for non-HTML payloads, which can mislabel decoded binary or unexpected
content as HTML. Update extract_fetched_document so _extract_with_trafilatura
and _extract_with_beautifulsoup are only attempted when media_type is in
_HTML_CONTENT_TYPES, and for anything else return the plain text fallback with
document_type="text" and extraction_method="text". This should also eliminate
the unreachable static_html or _parse_static_html(...) fallback at the end by
making the HTML-only branch in extract_fetched_document and _parse_static_html
the only path that returns HTML documents.
In `@apps/mcp-unified/src/mcp_unified/docs/acquisition/fetcher.py`:
- Around line 327-384: The response reader currently treats all bodies as raw
bytes, so `_read_response` and `_read_body_chunks` need to detect
`Transfer-Encoding: chunked` and dechunk before returning content. Update the
body-reading path to branch on the parsed headers from `_parse_headers`, keeping
normal `Content-Length`/unframed handling intact while decoding chunked framing
safely with `max_body_bytes` enforcement. Use the existing `_read_response` and
`_read_body_chunks` flow as the integration point, and add a regression test in
`test_docs_acquisition_fetcher.py` that covers a `transfer-encoding: chunked`
response and verifies the extracted body is dechunked correctly.
In `@apps/mcp-unified/src/mcp_unified/docs/acquisition/resolver.py`:
- Around line 15-28: StdlibResolver.resolve currently creates ResolvedAddress
instances without setting is_private, so the field always stays at its default.
Update StdlibResolver.resolve to determine privacy for each resolved IP and pass
that value into ResolvedAddress when appending results, using the existing
resolve flow and the ResolvedAddress field names to keep future consumers from
getting false negatives. Keep the deduping and family filtering logic unchanged.
In `@apps/mcp-unified/src/mcp_unified/docs/acquisition/service.py`:
- Around line 32-50: `DocsAcquisitionService.ingest_url()` is calling the
blocking `self.fetcher.fetch(url)` directly on the async tool path from
`DocsModule.execute_tool()`, which can stall the event loop. Refactor the
ingestion flow so the fetch work is offloaded to a thread/executor or converted
to an async fetch API, and update the `ingest_url` call site so URL ingestion no
longer performs synchronous DNS/socket I/O inline. Use the existing `ingest_url`
and `fetcher.fetch` symbols to locate the blocking section.
- Around line 32-96: Preserve existing keywords and collection memberships
during URL re-ingest in ingest_url. The current call to store.upsert_document
treats empty keywords/collection_names as replacements, so omitted values clear
existing associations. Change ingest_url and the upsert path to distinguish “not
provided” from “clear” by using sentinel defaults like None or by fetching the
current document state before updating. Update DocsCatalogStore/get_document or
add a document-level accessor if needed so upsert_document can merge existing
memberships instead of overwriting them.
In `@apps/mcp-unified/src/mcp_unified/docs/importers/base.py`:
- Around line 30-45: In chunks_from_text, add a guard for the overlap/max_chars
relationship so start always advances and the while loop cannot hang. Validate
the inputs at the top of the function (or adjust the step logic) so overlap is
strictly smaller than max_chars, and reference the chunks_from_text helper in
base.py when implementing the check.
In `@apps/mcp-unified/src/mcp_unified/docs/importers/local.py`:
- Around line 78-93: The directory import flow in _iter_import_files and
import_path is too eager and can fail the whole batch on the first unsupported
file while leaving earlier store.upsert_document writes committed. Update
LocalDocsImporter to filter candidates by SUPPORTED_SUFFIXES before parsing,
skip unsupported files (including hidden/binary/extension-less ones), and make
the batch import resilient so one bad file does not abort the entire directory
import. Keep the fix centered around _iter_import_files, _parse_file, and
import_path so the directory walk only yields supported documents and partial
writes are avoided or safely handled.
In `@apps/mcp-unified/src/mcp_unified/docs/importers/markdown.py`:
- Around line 8-29: The markdown importer in parse_markdown is treating lines
inside fenced code blocks as headings, which can add bogus ParsedSection entries
and override the document title. Update parse_markdown in the markdown importer
to track when it is inside a fenced block (for both ``` and ~~~) and skip
HEADING_RE matching while fenced, then resume heading detection only after the
fence closes. Keep the fix localized to parse_markdown and the section/title
handling logic so real headings still populate ParsedDocument correctly.
In `@apps/mcp-unified/src/mcp_unified/docs/retrieval/context.py`:
- Around line 13-70: The retrieval packer in build() is only sampling a fixed
oversample window, so it can stop short of max_chunks when early results are too
concentrated by document. Update the search windowing logic in context.py
(around build, SearchRequest, and the chunks/seen_documents loop) to expand or
retry with a larger limit until the max_documents diversity quota or max_chunks
budget is actually satisfied. Also fix omitted to reflect the true remaining
matches rather than just len(search["results"]) - len(chunks), ideally by having
Retrieval.search expose a total-hit count or equivalent metadata and using that
when assembling the return payload.
In `@apps/mcp-unified/src/mcp_unified/docs/settings.py`:
- Around line 43-54: The numeric coercers in _coerce_positive_int and
_coerce_positive_float currently let int(value) and float(value) raise generic
ValueError messages for invalid input, which drops the field context. Update
both helpers to catch conversion failures and re-raise ValueError including
field_name in the message, consistent with the other coercers in settings.py,
while keeping the existing positive/finite validation logic intact.
In `@apps/mcp-unified/src/mcp_unified/docs/standalone.py`:
- Around line 44-49: The default db_path in create_standalone_docs_mount is
CWD-relative, which makes standalone behavior depend on where the process is
launched. Update the default db_path value in the values dict to use a stable
absolute or package-relative location instead of "Databases/mcp_docs.db", and
keep the existing override behavior intact for callers that pass db_path
explicitly.
In `@apps/mcp-unified/src/mcp_unified/docs/store/schema.sql`:
- Around line 38-49: The explicit scope indexes for docs_collections,
docs_keywords, and docs_aliases duplicate the UNIQUE constraint’s implicit
auto-index on the same column set. In schema.sql, remove
docs_collections_scope_idx and the matching scope indexes for docs_keywords and
docs_aliases, leaving the UNIQUE(owner_scope, profile_scope, name) constraint to
provide the needed index behavior.
- Around line 97-117: The docs_chunks_fts virtual table is only kept in sync by
_replace_document_rows, so it will not automatically follow docs_documents
deletions through the ON DELETE CASCADE path. Update the schema around
docs_chunks_fts to add an explicit trigger-based cleanup note or trigger plan,
and reference DocsCatalogStore/_replace_document_rows so future delete_document
or direct delete paths also remove matching FTS rows.
In `@apps/mcp-unified/src/mcp_unified/docs/store/sqlite.py`:
- Around line 663-686: The helper methods `_schema_version`, `_fts_available`,
and `_count` are swallowing SQLite/lookup failures and returning fallback values
without any trace, so add Loguru logging before each fallback. Update these
methods in `sqlite.py` to catch the same exceptions, log the exception with
clear context about which query/table failed, then return `0` or `False` as
appropriate. Keep the existing public behavior of `status()`, but ensure the
underlying error is visible for debugging by using the shared logger
consistently.
- Around line 108-195: The upsert logic in upsert_document is vulnerable to a
race because it does a SELECT followed by INSERT, which can conflict under
concurrent writers on the unique owner_scope/profile_scope/canonical_uri key.
Replace that flow with an atomic UPSERT using docs_documents and its conflict
target so the row is inserted or updated in one statement and the existing id is
returned/stable for _replace_document_rows, _replace_document_keywords, and
_replace_collection_memberships. If the database/API requires it, use a retry or
equivalent returning-id pattern, but keep the operation atomic around the unique
key.
In `@tldw_Server_API/app/core/MCP_unified/modules/implementations/docs_module.py`:
- Around line 40-54: Add validation in
DocsMCPToolProvider.execute_tool/validate_tool_arguments for the remaining docs
tool names so missing required inputs are rejected with ValueError before
provider execution. Extend validate_tool_arguments to cover docs.resolve,
docs.list, docs.get, docs.collections.create/update/set_membership,
docs.keywords.apply, resolve-library-id, and get-library-docs by checking each
expected field used later by execute(). Keep the existing sanitization and
error-wrapping flow in execute_tool so all invalid/missing arguments surface
through the same ValueError path instead of uncaught KeyError.
In `@tldw_Server_API/tests/MCP_unified/docs/test_docs_module_shim.py`:
- Around line 106-113: The test in DocsModule is checking source text instead of
runtime behavior, so replace the inspect.getsource assertions with a
behavior-based test that exercises DocsModule.on_initialize and/or
DocsModule.execute_tool and verifies docs_settings_from_module_config and
docs_scope_from_context are actually called with the expected arguments using
monkeypatch/spies. Keep the focus on the host adapter boundary in DocsModule and
avoid asserting on literal strings or implementation details.
In `@tldw_Server_API/tests/MCP_unified/docs/test_docs_schema_store.py`:
- Around line 165-206: Add a concurrency regression test for
DocsCatalogStore.upsert_document that exercises the TOCTOU race in the same
scope by issuing two upserts with the same canonical_uri from separate threads
or connections against the same DocsCatalogStore instance. Reuse the existing
test setup in test_default_scope_upsert_replaces_document_without_duplicates,
but drive concurrent writes and assert the outcome is either a single surviving
document with the updated content or the current IntegrityError until the
sqlite.py path is fixed.
---
Outside diff comments:
In `@apps/mcp-unified/src/mcp_unified/docs/acquisition/service.py`:
- Around line 1-141: `DocsAcquisitionService.ingest_url` returns several
non-"fetched" outcomes but does not log any of them, which makes blocked or
failed acquisitions hard to audit. Add Loguru logging in this module by
importing `logger`, and emit a `logger.info` or `logger.warning` in the
`ingest_url` early-return paths for capability disabled and any `fetched.status
!= "fetched"` cases, including the URL, status/reason, and final URL/redirect
context; keep the normal success path unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1f06e500-2ad7-4cc7-9bf9-30557a27e219
⛔ Files ignored due to path filters (5)
Docs/superpowers/plans/2026-06-30-standalone-mcp-docs-corpus-stage1-plan.mdis excluded by!docs/**Docs/superpowers/plans/2026-06-30-standalone-mcp-docs-url-acquisition-implementation-plan.mdis excluded by!docs/**Docs/superpowers/plans/2026-07-01-standalone-mcp-docs-stage3-server-mounting-plan.mdis excluded by!docs/**Docs/superpowers/specs/2026-06-30-standalone-mcp-docs-catalog-design.mdis excluded by!docs/**Docs/superpowers/specs/2026-06-30-standalone-mcp-docs-url-acquisition-design.mdis excluded by!docs/**
📒 Files selected for processing (57)
apps/mcp-unified/pyproject.tomlapps/mcp-unified/src/mcp_unified/docs/__init__.pyapps/mcp-unified/src/mcp_unified/docs/acquisition/__init__.pyapps/mcp-unified/src/mcp_unified/docs/acquisition/extract.pyapps/mcp-unified/src/mcp_unified/docs/acquisition/fetcher.pyapps/mcp-unified/src/mcp_unified/docs/acquisition/models.pyapps/mcp-unified/src/mcp_unified/docs/acquisition/policy.pyapps/mcp-unified/src/mcp_unified/docs/acquisition/resolver.pyapps/mcp-unified/src/mcp_unified/docs/acquisition/service.pyapps/mcp-unified/src/mcp_unified/docs/errors.pyapps/mcp-unified/src/mcp_unified/docs/importers/__init__.pyapps/mcp-unified/src/mcp_unified/docs/importers/base.pyapps/mcp-unified/src/mcp_unified/docs/importers/html.pyapps/mcp-unified/src/mcp_unified/docs/importers/local.pyapps/mcp-unified/src/mcp_unified/docs/importers/markdown.pyapps/mcp-unified/src/mcp_unified/docs/mcp_module.pyapps/mcp-unified/src/mcp_unified/docs/models.pyapps/mcp-unified/src/mcp_unified/docs/retrieval/__init__.pyapps/mcp-unified/src/mcp_unified/docs/retrieval/aliases.pyapps/mcp-unified/src/mcp_unified/docs/retrieval/context.pyapps/mcp-unified/src/mcp_unified/docs/retrieval/search.pyapps/mcp-unified/src/mcp_unified/docs/settings.pyapps/mcp-unified/src/mcp_unified/docs/standalone.pyapps/mcp-unified/src/mcp_unified/docs/store/__init__.pyapps/mcp-unified/src/mcp_unified/docs/store/schema.sqlapps/mcp-unified/src/mcp_unified/docs/store/sqlite.pybacklog/completed/task-12085 - Rebase-standalone-MCP-docs-PR-on-dev.mdbacklog/tasks/task-12071 - Design-standalone-MCP-document-corpus-and-Context7-compatible-RAG-tools.mdbacklog/tasks/task-12073 - Plan-standalone-MCP-docs-corpus-Stage-1-implementation.mdbacklog/tasks/task-12074 - Implement-standalone-MCP-docs-corpus-Stage-1.mdbacklog/tasks/task-12076 - Design-standalone-MCP-docs-Stage-2-URL-acquisition.mdbacklog/tasks/task-12077 - Plan-standalone-MCP-docs-Stage-2-URL-acquisition-implementation.mdbacklog/tasks/task-12078 - Implement-standalone-MCP-docs-Stage-2-URL-acquisition.mdbacklog/tasks/task-12079 - Plan-standalone-MCP-docs-Stage-3-server-mounting.mdbacklog/tasks/task-12080 - Implement-standalone-MCP-docs-Stage-3-server-mounting.mdpyproject.tomltldw_Server_API/Config_Files/mcp_modules.yamltldw_Server_API/app/core/MCP_unified/adapters/__init__.pytldw_Server_API/app/core/MCP_unified/adapters/docs/__init__.pytldw_Server_API/app/core/MCP_unified/adapters/docs/config.pytldw_Server_API/app/core/MCP_unified/modules/implementations/docs_module.pytldw_Server_API/app/core/MCP_unified/tests/test_dynamic_module_catalog.pytldw_Server_API/app/core/MCP_unified/tests/test_runtime_package_boundary.pytldw_Server_API/app/core/MCP_unified/tests/test_write_tools_validators.pytldw_Server_API/tests/MCP_unified/docs/test_docs_acquisition_extract.pytldw_Server_API/tests/MCP_unified/docs/test_docs_acquisition_fetcher.pytldw_Server_API/tests/MCP_unified/docs/test_docs_acquisition_policy.pytldw_Server_API/tests/MCP_unified/docs/test_docs_acquisition_service.pytldw_Server_API/tests/MCP_unified/docs/test_docs_host_adapter.pytldw_Server_API/tests/MCP_unified/docs/test_docs_import_boundaries.pytldw_Server_API/tests/MCP_unified/docs/test_docs_importers.pytldw_Server_API/tests/MCP_unified/docs/test_docs_mcp_provider.pytldw_Server_API/tests/MCP_unified/docs/test_docs_module_shim.pytldw_Server_API/tests/MCP_unified/docs/test_docs_retrieval_context.pytldw_Server_API/tests/MCP_unified/docs/test_docs_schema_store.pytldw_Server_API/tests/MCP_unified/docs/test_docs_settings.pytldw_Server_API/tests/MCP_unified/docs/test_docs_standalone_mount.py
There was a problem hiding this comment.
27 issues found across 62 files
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
4fbee43 to
0b0838c
Compare
There was a problem hiding this comment.
20 issues found across 64 files
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
a7b3a74 to
1b892d6
Compare
1b892d6 to
de99974
Compare
Summary
locked_down,local_first, andonline_capableprofiles.DocsModulesettings/scope creation through it.Test Plan
/Users/macbook-dev/Documents/GitHub/tldw_server2/.venv/bin/python -m pytest tldw_Server_API/tests/MCP_unified/docs -q --tb=short/Users/macbook-dev/Documents/GitHub/tldw_server2/.venv/bin/python -m pytest tldw_Server_API/tests/MCP_unified tldw_Server_API/app/core/MCP_unified/tests/test_dynamic_module_catalog.py -k "docs or write_tools or validator or dynamic_module_catalog" -q --tb=shortTrue False loaded_optional= []/Users/macbook-dev/Documents/GitHub/tldw_server2/.venv/bin/python -m black --check ...errors: [],results: []Merge Gate
This PR is AI-authored and intentionally draft. A human requester must add the required
Change summaryexplaining what changed and why these implementation choices were made before merge.