fix(import-markdown): P0 missing return + register test in CI#479
fix(import-markdown): P0 missing return + register test in CI#479jlin53882 wants to merge 12 commits intoCortexReach:masterfrom
Conversation
Add `memory-pro import-markdown` command to migrate existing Markdown memories (MEMORY.md, memory/YYYY-MM-DD.md) into the plugin LanceDB store for semantic recall. This addresses Issue CortexReach#344 by providing a migration path from the Markdown layer to the plugin memory layer.
…fig options + tests ## 實作改善(相對於原本的 PR CortexReach#426) ### 新增 CLI 選項 - --dedup:啟用 scope-aware exact match 去重(避免重複匯入) - --min-text-length <n>:設定最短文字長度門檻(預設 5) - --importance <n>:設定匯入記憶的 importance 值(預設 0.7) ### Bug 修復 - UTF-8 BOM 處理:讀檔後主動移除 \ufeFF prefix - CRLF 正規化:改用 split(/\r?\n/) 同時支援 CRLF 和 LF - Bullet 格式擴展:從只支援 '- ' 擴展到支援 '- '、'* '、'+ ' 三種 ### 新增測試 - test/import-markdown/import-markdown.test.mjs:完整單元測試 - BOM handling - CRLF normalization - Extended bullet formats (dash/star/plus) - minTextLength 參數 - importance 參數 - Dedup logic(scope-aware exact match) - Dry-run mode - Continue on error ### 分析文件 - test/import-markdown/ANALYSIS.md:完整分析報告 - 效益分析(真實檔案 655 筆記錄實測) - 3 個程式碼缺口分析 - 建議的 5 個新 config 欄位 - 功能條列式說明 - test/import-markdown/recall-benchmark.py:實際 LanceDB 查詢對比腳本 - 實測結果:7/8 個關鍵字在 Markdown 有但 LanceDB 找不到 - 證明 import-markdown 的實際價值 ## 實測效果(真實記憶檔案) - James 的 workspace:MEMORY.md(20 筆)+ 30 個 daily notes(633 筆)= 653 筆記錄 - 無 dedup:每次執行浪費 50%(重複匯入) - 有 dedup:第二次執行 100% skip,節省 644 次 embedder API 呼叫 - 關鍵字對比:7/8 個測試關鍵字在 Markdown 有、LanceDB 無 ## 建議新增的 Config(共 5 項,預設值 = 現在行為,向下相容) - importMarkdown.dedup: boolean = false - importMarkdown.defaultScope: string = global - importMarkdown.minTextLength: number = 5 - importMarkdown.importanceDefault: number = 0.7 - importMarkdown.workspaceFilter: string[] = [] Closes: PR CortexReach#426 (CortexReach/memory-lancedb-pro)
P1 fixes:
- embedQuery -> embedPassage (lines 1001, 1171): imported memory content
is passage/document, not a query. Using embedQuery with asymmetric
providers (e.g. Jina) causes query-query comparison at recall time,
degrading retrieval quality.
- metadata: JSON.stringify the importedFrom object (line 1178):
MemoryEntry.metadata is typed as string in store.ts; passing a plain
object silently fails or produces unparseable data.
Minor fixes:
- workspaceEntries type: string[] -> Dirent[] (matches readdir withFileTypes)
- Hoist await import('node:fs/promises') out of loops: single import at
handler level replaces repeated per-iteration dynamic imports
Ref: CortexReach/pull/426
The const fsPromises declaration was inside the try block, making it scoped to that block only. Subsequent fsPromises.stat() calls in MEMORY.md and memory/ processing code were failing with 'fsPromises is not defined'. Move declaration to handler scope.
Scans the flat \workspace/memory/\ directory (directly under workspace root, not inside any workspace subdirectory) and imports entries with scope='memory'. This supports the actual OpenClaw structure where memory files live directly in workspace/memory/.
Before scanning, read openclaw.json agents list to find the agent whose workspace path matches the current workspaceDir. Use that agent's id as workspaceScope for flat memory/ entries instead of defaulting to 'memory'. Falls back to 'shared' when no matching agent is found (e.g. shared workspace with no dedicated agent).
Must fix: - Flat memory scan: move before the mdFiles.length===0 early return so it is always reachable (not just when nested workspaces are empty) - Tests: runImportMarkdown now uses embedPassage (not embedQuery) and JSON.stringify(metadata) to match production. Added embedPassage mock. - Tests: setupWorkspace now creates files at workspace/<name>/ to match the actual path structure runImportMarkdown expects Worth considering: - Flat memory scan now skips when workspaceGlob is set, avoiding accidental root flat memory import when user specifies --workspace - Removed dev artifacts: ANALYSIS.md and recall-benchmark.py contained personal absolute paths and are not suitable for repo commit
Before: --dry-run skipped dedup check entirely, so --dry-run --dedup would overcount imports (items counted as imported even if dedup would skip them). After: dedup check runs regardless of dry-run mode. In dry-run, items that would be skipped by dedup are counted as skipped, not imported. Restores the dry-run console log message.
Must fix: - Source scopes discovered but discarded: scanner now falls back to per-file discovered scope instead of collapsing all workspaces into "global". Prevents cross-workspace leakage and incorrect dedup across workspaces. - Scanner only descended one level: now also scans workspace/agents/<id>/ for nested agent workspaces (e.g. workspace/agents/theia/MEMORY.md). Minor fixes: - NaN guardrails: --min-text-length and --importance now use clampInt and Number.isFinite to prevent invalid values from silently passing. - Tests reimplement import logic: runImportMarkdown is now exported from cli.ts and tests call the production handler directly instead of a standalone copy. Prevents logic drift between tests and production. Refs: PR CortexReach#426 review feedback
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 458bbff537
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| * | ||
| * Run: node --experimental-vm-modules node_modules/.bin/jest test/import-markdown.test.mjs | ||
| */ | ||
| import { jest } from "@jest/globals"; |
There was a problem hiding this comment.
Remove Jest-only import from node --test suite
The new test is wired into package.json via node --test, but this file imports @jest/globals and uses Jest APIs. This repository does not declare Jest in devDependencies, so the test run fails immediately with ERR_MODULE_NOT_FOUND before any assertions execute (I verified with node --test test/import-markdown/import-markdown.test.mjs). Registering this file in CI therefore breaks npm test until the test is rewritten to node:test APIs or Jest is added and invoked consistently.
Useful? React with 👍 / 👎.
|
|
||
| beforeAll(async () => { | ||
| // We test the core logic directly instead of via CLI to avoid complex setup | ||
| const mod = await import("../cli.ts"); |
There was a problem hiding this comment.
Correct CLI module path used by import-markdown test
The test loads the production CLI with import("../cli.ts"), but from test/import-markdown/ that resolves to test/cli.ts (which does not exist), not the repository root cli.ts. Once the runner issue above is fixed, this path will throw ERR_MODULE_NOT_FOUND and prevent the suite from exercising runImportMarkdown at all.
Useful? React with 👍 / 👎.
AliceLJY
left a comment
There was a problem hiding this comment.
这版还不能合:你把 test/import-markdown/import-markdown.test.mjs 接进了 npm test,但该文件当前直接 import { jest } from "@jest/globals",仓库里并没有这个依赖。我本地按 PR 描述执行了 node --test test/import-markdown/import-markdown.test.mjs,会直接报 ERR_MODULE_NOT_FOUND: Cannot find package "@jest/globals",所以现在这条 PR 会把主测试入口直接搞红。先把测试改成当前仓库实际使用的 runner / 依赖体系,或者补齐依赖并说明为什么要引入 Jest。
|
PR #482 是同一分支 rebase 到当前 master 的版本,已无冲突。请确认后关闭本 PR,或评论说明保留哪一个。 |
|
Closing — superseded by PR #482 per author's request. |
Summary
Two P0 fixes for the import-markdown CLI command:
Fix 1: Missing return statement (cli.ts)
Added the missing
eturn { imported, skipped, foundFiles } after the non-dry-run console.log block. Without this return, callers would always receive undefined instead of the import result object.
Fix 2: Register import-markdown test in CI (package.json)
Added
ode --test test/import-markdown/import-markdown.test.mjs to the est script so it runs in CI.
Changes
Testing
ode --test test/import-markdown/import-markdown.test.mjs — note: test file has pre-existing @jest/globals dependency issue (unrelated to this PR)