fix: preserve real prefix when middle ID segment is a reserved word#513
fix: preserve real prefix when middle ID segment is a reserved word#513JOhnsonKC201 wants to merge 4 commits into
Conversation
stripToValidPrefix collapsed any ID whose second segment was a valid prefix, treating e.g. "endpoint:service:x" as a double-prefix and returning "service:x". This dropped the real outer prefix, corrupting the node type and breaking edge references and idempotency. Only collapse a true same-prefix duplicate (e.g. "file:file:...") by requiring the inner segment to equal the outer prefix. A different reserved word in the middle is a legitimate path segment and is kept. Adds regression tests covering the middle-reserved-word case and idempotency.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bfe10eed8d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const rest = remaining.slice(colonIdx + 1); | ||
| const innerColonIdx = rest.indexOf(":"); | ||
| if (innerColonIdx > 0 && VALID_PREFIXES.has(rest.slice(0, innerColonIdx))) { | ||
| if (innerColonIdx > 0 && rest.slice(0, innerColonIdx) === segment) { |
There was a problem hiding this comment.
Use expected type when collapsing prefixed IDs
When an LLM emits a project-prefixed ID and the project name is also a reserved prefix (for example a file node with service:file:src/foo.ts, or any bad outer valid prefix before the expected prefix), this condition no longer recurses because the inner prefix differs from the outer one. normalizeNodeId then returns service:file:src/foo.ts for a node whose type is file, so the graph no longer uses the canonical type:path ID and edges that reference the canonical file:src/foo.ts form are dropped as dangling. The duplicate-prefix decision needs the expected node prefix to distinguish this case from legitimate middle path segments like endpoint:service:x.
Useful? React with 👍 / 👎.
The previous fix collapsed a duplicate prefix only when the inner segment matched the outer one (`inner === segment`). That dropped a spurious project-name prefix only when it was an exact duplicate, so an ID like `service:file:src/foo.ts` for a `file` node — where the project name happens to collide with the reserved word `service` — kept the outer `service:` prefix instead of resolving to the canonical `file:src/foo.ts`. Edges referencing `file:src/foo.ts` then dangled. Thread the node's expected prefix into stripToValidPrefix and collapse when the inner segment is either the same reserved word (true duplicate) or the expected prefix (spurious reserved-word project prefix). A different reserved word that is not the expected prefix (`endpoint:service:x`) is still preserved as a real path segment. Adds two regression tests covering the strip + idempotency cases.
|
Confirmed the codex P2 and pushed a follow-up. The earlier Fix (6d54111): @codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6d54111a8b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const innerSegment = innerColonIdx > 0 ? rest.slice(0, innerColonIdx) : ""; | ||
| if ( | ||
| innerColonIdx > 0 && | ||
| (innerSegment === segment || innerSegment === expectedPrefix) |
There was a problem hiding this comment.
Handle reserved-word project prefixes in edge fallback
When normalizeBatchOutput falls back to normalizing an edge endpoint that is not present in idMap, it infers the type from the malformed endpoint itself. For a project named like a reserved prefix, e.g. an edge endpoint service:file:src/foo.ts pointing to an existing canonical node file:src/foo.ts, inferTypeFromId supplies service, so this check does not strip the outer service segment and the edge remains dangling and is dropped. This regresses the cross-variant edge repair path for the same reserved-word project-prefix case the node normalization now handles.
Useful? React with 👍 / 👎.
The node-level fix did not cover normalizeBatchOutput's edge-repair fallback. When an edge endpoint isn't found in idMap, the fallback infers its type from the endpoint's own prefix via inferTypeFromId. For a reserved-word project prefix (e.g. edge `service:file:src/foo.ts` pointing at node `file:src/foo.ts`), that yields type `service`, so normalizeNodeId keeps the spurious `service:` segment, the endpoint never matches the canonical node, and the edge is dropped as dangling. Add resolveEdgeEndpoint: try the prefix-inferred type first (unchanged common case), then each subsequent leading reserved-word segment as a candidate type, returning the first that resolves to an existing node — and the original id when none do. Add a regression test asserting the edge is repaired rather than dropped. 758/758 core tests pass.
|
Addressed the edge-path P2 from the last review. The node-level fix didn't cover Fix (0a0ffd2): added @codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0a0ffd233c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } | ||
|
|
||
| for (const type of candidateTypes) { | ||
| const normalized = normalizeNodeId(id, { type }); |
There was a problem hiding this comment.
Normalize from the candidate prefix segment
When an edge endpoint has more than one reserved prefix before the real node prefix, e.g. service:endpoint:file:src/foo.ts pointing at an existing file:src/foo.ts, candidateTypes includes file but this loop still calls normalizeNodeId on the full original ID. For the file candidate, stripToValidPrefix sees service followed by endpoint and preserves service:endpoint:..., so none of the later candidates can resolve and the edge is dropped as dangling; the previous fallback would collapse this chain. The repair needs to normalize from the candidate segment (or otherwise skip earlier reserved segments) rather than always from id.
Useful? React with 👍 / 👎.
resolveEdgeEndpoint tried every candidate type against the full edge endpoint id. For a chain of reserved prefixes before the real one (e.g. `service:endpoint:file:src/foo.ts` pointing at node `file:src/foo.ts`), stripToValidPrefix can't collapse the run for the `file` candidate — `service` then `endpoint` are preserved — so no candidate resolves and the edge is dropped as dangling. Pair each candidate type with the id suffix that begins at its prefix segment, and normalize from that suffix instead of the full id. The inferred-type/full-id attempt is kept first so the common and single reserved-word prefix cases are unchanged. Add a regression test for the multi-prefix chain. 759/759 core tests pass.
|
Addressed the third codex P2 ("Normalize from the candidate prefix segment"). The edge-repair fallback normalized every candidate type against the full endpoint id, so a chain of reserved prefixes before the real one — e.g.
|
Problem
stripToValidPrefixinanalyzer/normalize-graph.tscollapses any node ID whose second segment happens to be a valid prefix, treating it as a double-prefix duplicate. This corrupts IDs where a reserved word legitimately appears as a middle path segment.For example,
endpoint:service:getUseris parsed as:endpoint(valid prefix) ✓service(also a valid prefix) → wrongly assumed to be a duplicate prefix…so the real
endpointprefix is dropped and the function returns{ prefix: "service", path: "getUser" }, yieldingservice:getUser. This:Fix
Only collapse a true same-prefix duplicate (e.g.
file:file:src/foo.ts) by requiring the inner segment to equal the outer prefix:A different reserved word in the middle is a legitimate path segment and is preserved. The genuine
file:file:...double-prefix case still collapses as before.Tests
Added two regression tests to
normalize-graph.test.ts:endpoint:service:getUseris preserved unchanged (was previously corrupted toservice:getUser).pnpm --filter @understand-anything/core test— 755 passing (including the existingfile:file:double-prefix test, which still passes).