Skip to content

fix: preserve real prefix when middle ID segment is a reserved word#513

Open
JOhnsonKC201 wants to merge 4 commits into
Egonex-AI:mainfrom
JOhnsonKC201:fix/strip-valid-prefix-middle-reserved-word
Open

fix: preserve real prefix when middle ID segment is a reserved word#513
JOhnsonKC201 wants to merge 4 commits into
Egonex-AI:mainfrom
JOhnsonKC201:fix/strip-valid-prefix-middle-reserved-word

Conversation

@JOhnsonKC201

Copy link
Copy Markdown

Problem

stripToValidPrefix in analyzer/normalize-graph.ts collapses any node ID whose second segment happens to be a valid prefix, treating it as a double-prefix duplicate. This corrupts IDs where a reserved word legitimately appears as a middle path segment.

For example, endpoint:service:getUser is parsed as:

  • outer segment endpoint (valid prefix) ✓
  • next segment service (also a valid prefix) → wrongly assumed to be a duplicate prefix

…so the real endpoint prefix is dropped and the function returns { prefix: "service", path: "getUser" }, yielding service:getUser. This:

  • changes the node type (endpoint → service),
  • breaks edge references that point at the original ID, and
  • violates idempotency — normalizing an already-normalized ID mutates it.

Fix

Only collapse a true same-prefix duplicate (e.g. file:file:src/foo.ts) by requiring the inner segment to equal the outer prefix:

// before
if (innerColonIdx > 0 && VALID_PREFIXES.has(rest.slice(0, innerColonIdx))) {

// after
if (innerColonIdx > 0 && rest.slice(0, innerColonIdx) === segment) {

A different reserved word in the middle is a legitimate path segment and is preserved. The genuine file:file:... double-prefix case still collapses as before.

Tests

Added two regression tests to normalize-graph.test.ts:

  • endpoint:service:getUser is preserved unchanged (was previously corrupted to service:getUser).
  • normalization is idempotent for IDs with a reserved-word middle segment.

pnpm --filter @understand-anything/core test755 passing (including the existing file:file: double-prefix test, which still passes).

stripToValidPrefix collapsed any ID whose second segment was a valid
prefix, treating e.g. "endpoint:service:x" as a double-prefix and
returning "service:x". This dropped the real outer prefix, corrupting
the node type and breaking edge references and idempotency.

Only collapse a true same-prefix duplicate (e.g. "file:file:...") by
requiring the inner segment to equal the outer prefix. A different
reserved word in the middle is a legitimate path segment and is kept.

Adds regression tests covering the middle-reserved-word case and
idempotency.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bfe10eed8d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

const rest = remaining.slice(colonIdx + 1);
const innerColonIdx = rest.indexOf(":");
if (innerColonIdx > 0 && VALID_PREFIXES.has(rest.slice(0, innerColonIdx))) {
if (innerColonIdx > 0 && rest.slice(0, innerColonIdx) === segment) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use expected type when collapsing prefixed IDs

When an LLM emits a project-prefixed ID and the project name is also a reserved prefix (for example a file node with service:file:src/foo.ts, or any bad outer valid prefix before the expected prefix), this condition no longer recurses because the inner prefix differs from the outer one. normalizeNodeId then returns service:file:src/foo.ts for a node whose type is file, so the graph no longer uses the canonical type:path ID and edges that reference the canonical file:src/foo.ts form are dropped as dangling. The duplicate-prefix decision needs the expected node prefix to distinguish this case from legitimate middle path segments like endpoint:service:x.

Useful? React with 👍 / 👎.

The previous fix collapsed a duplicate prefix only when the inner
segment matched the outer one (`inner === segment`). That dropped a
spurious project-name prefix only when it was an exact duplicate, so an
ID like `service:file:src/foo.ts` for a `file` node — where the project
name happens to collide with the reserved word `service` — kept the
outer `service:` prefix instead of resolving to the canonical
`file:src/foo.ts`. Edges referencing `file:src/foo.ts` then dangled.

Thread the node's expected prefix into stripToValidPrefix and collapse
when the inner segment is either the same reserved word (true duplicate)
or the expected prefix (spurious reserved-word project prefix). A
different reserved word that is not the expected prefix
(`endpoint:service:x`) is still preserved as a real path segment.

Adds two regression tests covering the strip + idempotency cases.
@JOhnsonKC201

Copy link
Copy Markdown
Author

Confirmed the codex P2 and pushed a follow-up. The earlier inner === segment check kept a spurious reserved-word project prefix — e.g. service:file:src/foo.ts for a file node stayed as-is instead of resolving to the canonical file:src/foo.ts, which would dangle edges referencing the canonical ID.

Fix (6d54111): stripToValidPrefix now takes the node's expected prefix and collapses the outer segment when the inner one is either the same reserved word (true duplicate) or the expected prefix (spurious reserved-word project prefix). A different reserved word that isn't the expected prefix (endpoint:service:x) is still preserved as a real path segment. Added two regression tests (strip + idempotency); 757/757 core tests pass.

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6d54111a8b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

const innerSegment = innerColonIdx > 0 ? rest.slice(0, innerColonIdx) : "";
if (
innerColonIdx > 0 &&
(innerSegment === segment || innerSegment === expectedPrefix)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle reserved-word project prefixes in edge fallback

When normalizeBatchOutput falls back to normalizing an edge endpoint that is not present in idMap, it infers the type from the malformed endpoint itself. For a project named like a reserved prefix, e.g. an edge endpoint service:file:src/foo.ts pointing to an existing canonical node file:src/foo.ts, inferTypeFromId supplies service, so this check does not strip the outer service segment and the edge remains dangling and is dropped. This regresses the cross-variant edge repair path for the same reserved-word project-prefix case the node normalization now handles.

Useful? React with 👍 / 👎.

The node-level fix did not cover normalizeBatchOutput's edge-repair
fallback. When an edge endpoint isn't found in idMap, the fallback infers
its type from the endpoint's own prefix via inferTypeFromId. For a
reserved-word project prefix (e.g. edge `service:file:src/foo.ts`
pointing at node `file:src/foo.ts`), that yields type `service`, so
normalizeNodeId keeps the spurious `service:` segment, the endpoint never
matches the canonical node, and the edge is dropped as dangling.

Add resolveEdgeEndpoint: try the prefix-inferred type first (unchanged
common case), then each subsequent leading reserved-word segment as a
candidate type, returning the first that resolves to an existing node —
and the original id when none do. Add a regression test asserting the
edge is repaired rather than dropped.

758/758 core tests pass.
@JOhnsonKC201

Copy link
Copy Markdown
Author

Addressed the edge-path P2 from the last review. The node-level fix didn't cover normalizeBatchOutput's edge-repair fallback: for an endpoint like service:file:src/foo.ts pointing at canonical node file:src/foo.ts, inferTypeFromId read the spurious service as the type, so the strip never fired and the edge dropped as dangling.

Fix (0a0ffd2): added resolveEdgeEndpoint — tries the prefix-inferred type first (common case unchanged), then each subsequent leading reserved-word segment as a candidate type, returning the first that resolves to an existing node (and the original id when none do). Added a regression test asserting the edge is repaired rather than dropped; 758/758 core tests pass.

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0a0ffd233c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

}

for (const type of candidateTypes) {
const normalized = normalizeNodeId(id, { type });

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Normalize from the candidate prefix segment

When an edge endpoint has more than one reserved prefix before the real node prefix, e.g. service:endpoint:file:src/foo.ts pointing at an existing file:src/foo.ts, candidateTypes includes file but this loop still calls normalizeNodeId on the full original ID. For the file candidate, stripToValidPrefix sees service followed by endpoint and preserves service:endpoint:..., so none of the later candidates can resolve and the edge is dropped as dangling; the previous fallback would collapse this chain. The repair needs to normalize from the candidate segment (or otherwise skip earlier reserved segments) rather than always from id.

Useful? React with 👍 / 👎.

resolveEdgeEndpoint tried every candidate type against the full edge
endpoint id. For a chain of reserved prefixes before the real one (e.g.
`service:endpoint:file:src/foo.ts` pointing at node `file:src/foo.ts`),
stripToValidPrefix can't collapse the run for the `file` candidate —
`service` then `endpoint` are preserved — so no candidate resolves and
the edge is dropped as dangling.

Pair each candidate type with the id suffix that begins at its prefix
segment, and normalize from that suffix instead of the full id. The
inferred-type/full-id attempt is kept first so the common and single
reserved-word prefix cases are unchanged. Add a regression test for the
multi-prefix chain.

759/759 core tests pass.
@JOhnsonKC201

Copy link
Copy Markdown
Author

Addressed the third codex P2 ("Normalize from the candidate prefix segment"). The edge-repair fallback normalized every candidate type against the full endpoint id, so a chain of reserved prefixes before the real one — e.g. service:endpoint:file:src/foo.ts pointing at file:src/foo.ts — couldn't be collapsed for the file candidate and the edge was dropped as dangling.

resolveEdgeEndpoint now pairs each candidate type with the id suffix that begins at its prefix segment and normalizes from that suffix. The inferred-type/full-id attempt is still tried first, so the common case and the earlier single reserved-word prefix case are unchanged. Added a regression test for the multi-prefix chain; 759/759 core tests pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant