ArtifactService binary sanitization and child artifacts by mike-inkeep · Pull Request #2745 · inkeep/agents

mike-inkeep · 2026-03-18T14:48:47Z

No description provided.

…vice

changeset-bot · 2026-03-18T14:48:54Z

⚠️ No Changeset found

Latest commit: 32c091b

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

vercel · 2026-03-18T14:49:10Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agents-api	Ready	Preview, Comment	Mar 18, 2026 2:49pm
agents-manage-ui	Ready	Preview, Comment	Mar 18, 2026 2:49pm

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
agents-docs	Skipped		Mar 18, 2026 2:49pm

claude

PR Review Summary

(5) Total Issues | Risk: Medium

🟠⚠️ Major (3) 🟠⚠️

🟠 1) ArtifactService.ts:1007-1059 Partial batch failure leaves inconsistent state

Issue: The for-loop in createBinaryChildArtifacts iterates through binary parts and calls upsertLedgerArtifact for each one sequentially. If the database insert fails mid-way through the batch (e.g., on the 3rd of 5 parts), some child artifacts are persisted while others are not. The error propagates up, but the refs Map will be incomplete.

Why: Consider an artifact with 5 images: if child artifact creation succeeds for images 1-2 but fails on image 3, the error propagates and saveArtifact fails. However, child artifacts 1-2 remain in the database as orphaned records. If retried, deduplication may behave unexpectedly. The parent artifact's data structure would be inconsistent - some binary parts with artifactRef, others without.

Fix: Options to consider:

Wrap in transaction for atomicity (if supported by the upsert function)
Use Promise.all to fail atomically if any fail:

const insertPromises = binaryParts.map(async (part) => {
  const hash = this.extractContentHashFromBlobUri(part.data) || this.fallbackHash(part.data);
  // ... build childArtifactId, check dedupe ...
  await upsertLedgerArtifact(runDbClient)({ ... });
  return { blobUri: part.data, ref: { artifactId, toolCallId } };
});
const results = await Promise.all(insertPromises);

Catch per-part errors and continue with partial success + explicit tracking

Refs:

createBinaryChildArtifacts loop

🟠 2) ArtifactService.ts:906-911 Binary child artifacts only created for fullData, not summaryData

Issue: Binary child artifacts are created by traversing sanitizedData only. If artifact.summaryData contains blob-backed binary parts that are NOT present in artifact.data, those binaries will not have child artifacts created. The attachBinaryArtifactRefs call on summaryData will then fail to find refs for those parts, leaving them without artifactRef.

Why: If a caller provides different binary content in summaryData vs data (e.g., a thumbnail in summary, full image in data), the summary's thumbnail would have an orphaned blob URI with no child artifact record.

Fix: Consider whether summaryData can contain different binary parts than data. If yes, collect binary parts from both:

const binaryReferences = await this.createBinaryChildArtifacts({
  parentArtifactId: artifact.artifactId,
  parentArtifactType: artifact.type,
  toolCallId: artifact.toolCallId,
  value: sanitizedSummaryData ? [sanitizedData, sanitizedSummaryData] : sanitizedData,
});

Or document that summaryData must be a subset of data for binary content.

Refs:

createBinaryChildArtifacts called only with sanitizedData

🟠 3) system Implicit contract change: artifact data now includes artifactRef for binary parts

Issue: The PR introduces an implicit data contract change. Artifact data structures (summaryData and fullData) will now contain artifactRef objects embedded within binary parts:

Before: { type: 'image', data: 'blob://...', mimeType: 'image/png' }
After: { type: 'image', data: 'blob://...', mimeType: 'image/png', artifactRef: { artifactId: 'bin_...', toolCallId: '...' } }

Why: This affects consumers retrieving artifacts via:

Chat API (Vercel AI SDK Data Stream)
getContextArtifacts, getArtifactSummary, getArtifactFull
Streaming via data-artifact events
UI components rendering artifact data

While additive fields are generally backward-compatible, consumers doing strict validation or field enumeration may be affected.

Fix: Consider documenting this contract change. If artifactRef is intended to be a stable API for clients to resolve binary artifacts, add it to validation schemas and document the shape.

Refs:

Inline Comments:

🟠 Major: ArtifactService.ts:997 Silent fallback when context is missing
🟠 Major: ArtifactService.ts:1163-1165 Regex requires literal . after hash

🟡 Minor (1) 🟡

Inline Comments:

🟡 Minor: ArtifactService.ts:1148-1160 Type guard returns type: string instead of literal union

💭 Consider (1) 💭

💭 1) ArtifactService.ts:1044-1051 Child artifacts use metadata.parentArtifactId instead of existing derivedFrom column

Issue: The ledger_artifacts table already has a derivedFrom column designed for parent-child artifact relationships. This PR stores the relationship in metadata.parentArtifactId instead, creating two mechanisms for the same concept.

Why: The derivedFrom column is already wired into the DAL (maps from metadata.derivedFrom). Having two ways to express parent-child relationships may lead to inconsistent queries.

Fix: Consider using derivedFrom in metadata (or directly) for consistency:

metadata: {
  derivedFrom: params.parentArtifactId, // Uses existing column
  parentArtifactType: params.parentArtifactType,
  // ...
}

Refs:

derivedFrom in ledgerArtifacts DAL

Inline Comments:

💭 Consider: ArtifactService.test.ts:1052 Additional test coverage for edge cases

💡 APPROVE WITH SUGGESTIONS

Summary: This PR introduces a useful pattern for managing binary content in artifacts by extracting binaries into child artifacts with deduplicated storage. The main areas for improvement are: (1) partial batch failures can leave inconsistent state with orphaned child artifacts, (2) missing logging for silent fallbacks makes debugging difficult, and (3) the implicit contract change should be documented. The test coverage is a good start but could be expanded to cover edge cases. None of these are blockers, but addressing the error handling and logging would improve production reliability.

Discarded (9)

Location	Issue	Reason Discarded
`ArtifactService.ts:1007`	Sequential upserts for child artifacts	Intentional for deduplication logic; parallelization would break the dedupe Map accumulation
`ArtifactService.ts:1008-1014`	Deduplication scope limited to tool call	Working as designed - per-tool-call isolation provides clearer provenance
`ArtifactService.ts:1021`	Synthetic toolCallId pattern is new	The `:binary` suffix is actually good practice - distinguishes from real tool call IDs
`ArtifactService.ts:1030`	Child artifact type naming uses hyphens	Consistent with existing test data showing `binary-child` type
`ArtifactService.ts:1037`	`blobUri` vs `data` field name	Internal storage structure, not a consistency issue
`ArtifactService.ts:989-1061`	Extract named interfaces for types	Valid suggestion but INFO level - design preference
`ArtifactService.ts:1064-1096`	Return type looser than guard	Follows naturally from type guard fix - INFO level
`ArtifactService.ts:1168-1170`	fallbackHash uses hex encoding not SHA	Fallback path for edge cases, collision risk is minimal
`ArtifactService.ts:1172-1179`	`bin_` prefix is new naming pattern	Acceptable new pattern - clearly distinguishes binary children

Reviewers (6)

Reviewer	Returned	Main Findings	Consider	Inline Comments	Discarded
`pr-review-errors`	4	1	0	1	2
`pr-review-standards`	3	1	0	1	1
`pr-review-architecture`	5	1	1	0	3
`pr-review-tests`	7	0	0	1	6
`pr-review-types`	3	0	0	1	2
`pr-review-consistency`	6	0	0	0	6
Total	28	3	1	4	20

claude · 2026-03-18T14:55:23Z

agents-api/src/domains/run/artifacts/ArtifactService.ts

+  }): Promise<Map<string, { artifactId: string; toolCallId: string }>> {
+    if (!this.context.taskId || !this.context.contextId) {
+      return new Map();
+    }


🟠 MAJOR: Silent fallback when context is missing

Issue: When taskId or contextId is missing, createBinaryChildArtifacts silently returns an empty Map with no logging. The caller has no indication that binary child artifact creation was skipped.

Why: This creates orphaned blob URIs in parent artifacts that have no corresponding child artifact records. Debugging this in production would require correlating blob storage contents with artifact records - difficult to diagnose without logs indicating the skip occurred.

Fix:

Suggested change

}

if (!this.context.taskId || !this.context.contextId) {

logger.warn(

{ parentArtifactId: params.parentArtifactId, hasTaskId: !!this.context.taskId, hasContextId: !!this.context.contextId },

'Skipping binary child artifact creation due to missing context'

);

return new Map();

}

Refs:

Existing warn logging pattern in saveArtifact

claude · 2026-03-18T14:55:30Z

agents-api/src/domains/run/artifacts/ArtifactService.ts

+  private extractContentHashFromBlobUri(blobUri: string): string | null {
+    const match = blobUri.match(/sha256-([a-f0-9]{16,64})\./i);
+    return match?.[1] || null;


🟠 MAJOR: Regex requires literal . after hash - fragile for URIs without extensions

Issue: The regex /sha256-([a-f0-9]{16,64})\./i requires a literal . after the hash. Blob URIs without an extension (e.g., blob://v1/.../sha256-abc123 with no trailing .ext) will fail to match, causing fallback to hex-encoding the entire URI.

Why: This could cause deduplication failures if the same binary is referenced via URIs with and without extensions. The fallback hash would differ from the extracted hash.

Fix:

Suggested change

private extractContentHashFromBlobUri(blobUri: string): string | null {

const match = blobUri.match(/sha256-([a-f0-9]{16,64})\./i);

return match?.[1] || null;

private extractContentHashFromBlobUri(blobUri: string): string | null {

const match = blobUri.match(/sha256-([a-f0-9]{16,64})(?:\.|$)/i);

return match?.[1] || null;

}

Refs:

buildStorageKey always adds extension - current code may be safe if all blob URIs have extensions, but defensive regex is safer

claude · 2026-03-18T14:55:36Z

agents-api/src/domains/run/artifacts/ArtifactService.ts

+  private isBlobBackedBinaryPart(
+    value: unknown
+  ): value is { type: string; data: string; mimeType?: string } {
+    if (!value || typeof value !== 'object' || Array.isArray(value)) {
+      return false;
+    }
+
+    const maybePart = value as Record<string, unknown>;
+    return (
+      (maybePart.type === 'image' || maybePart.type === 'file') &&
+      typeof maybePart.data === 'string' &&
+      isBlobUri(maybePart.data)
+    );


🟡 Minor: Type guard returns type: string instead of literal union

Issue: The type guard validates only 'image' | 'file' for the type field but returns a predicate with type: string. This loses type information that could enable exhaustive checking downstream.

Why: Downstream code consuming the narrowed type cannot safely narrow further since TypeScript sees type: string instead of the literal union.

Fix:

Suggested change

private isBlobBackedBinaryPart(

value: unknown

): value is { type: string; data: string; mimeType?: string } {

if (!value || typeof value !== 'object' || Array.isArray(value)) {

return false;

}

const maybePart = value as Record<string, unknown>;

return (

(maybePart.type === 'image' || maybePart.type === 'file') &&

typeof maybePart.data === 'string' &&

isBlobUri(maybePart.data)

);

private isBlobBackedBinaryPart(

value: unknown

): value is { type: 'image' | 'file'; data: string; mimeType?: string } {

if (!value || typeof value !== 'object' || Array.isArray(value)) {

return false;

}

const maybePart = value as Record<string, unknown>;

return (

(maybePart.type === 'image' || maybePart.type === 'file') &&

typeof maybePart.data === 'string' &&

isBlobUri(maybePart.data)

);

}

Refs:

InlineBinaryPart in artifact-binary-sanitizer.ts - uses type: 'image' | 'file' pattern

claude · 2026-03-18T14:55:57Z

agents-api/src/domains/run/artifacts/__tests__/ArtifactService.test.ts

+      expect(upsertFn).toHaveBeenCalledTimes(2);
+    });
+  });
 });


💭 Consider: Additional test coverage for edge cases

Issue: The new tests cover the happy path well but miss several edge cases:

Missing context - No test for when taskId/contextId is missing (early return path)

No binary parts - No test confirming correct behavior when artifact has no blob-backed binaries

summaryData path - No test when both data and summaryData are provided

type: 'file' - All tests use type: 'image'; the file type path is untested

Deeply nested parts - Tests only cover flat structures, not { outer: { inner: { images: [...] } } }

Why: These paths exist in the implementation and should have test coverage to prevent regressions.

Fix: Consider adding test cases for these scenarios. Example for missing context:

it('skips binary child artifact creation when taskId is missing', async () => { const serviceWithoutTask = new ArtifactService({ ...mockContext, taskId: undefined, }); // ... verify only parent artifact created });

Refs:

Lines 995-997 - early return when context missing

Lines 1156-1157 - file type check

itoqa · 2026-03-18T16:22:00Z

Ito Test Report ✅

14 test cases ran. 14 passed.

🔍 Verification focused on artifact persistence, dedupe/normalization behavior, edge-case handling, and adversarial route protections. The included cases all met expected behavior with representative evidence timestamps and screenshots.

✅ Passed (14)

Test Case	Summary	Timestamp
ROUTE-3	Conversation retrieval succeeded in both default and vercel-formatted checks, with stable role ordering and artifact messages represented as data-artifact without message-type regression.	49:23
LOGIC-1	ArtifactService dedupe test passed, confirming repeated identical hash entries are deduplicated within a single tool-call scope.	20:52
LOGIC-2	Source-backed validation confirmed dedupe scope includes toolCallId/parent scope and produces distinct dedupe keys and child IDs across calls for the same hash.	20:52
LOGIC-3	Normalization rule check passed and generated child artifact ID matched expected sanitized-scope plus hash24 format for special-character toolCallId input.	20:52
LOGIC-4	Conversation artifact replacement test passed, confirming artifact references are retained through formatting/replacement flow when multiple refs share one toolCallId.	20:52
EDGE-1	Short inline binary-like content remained inline and did not enter blob-sanitization replacement flow in targeted sanitizer validation.	27:26
EDGE-2	HTTP and data URI inputs stayed unchanged and were excluded from binary-child extraction paths during sanitizer validation.	27:26
EDGE-3	Missing task/context fallback completed without crash in targeted ArtifactService path validation.	27:26
EDGE-4	Navigation churn validation completed with successful post-stream conversation retrieval showing stable message count and preserved data-artifact payload, indicating recoverable integrity after refresh/back-forward actions.	49:23
EDGE-5	Rapid run-route submissions remained stable and source-level rapid-delta parser stress checks passed without service collapse.	27:27
ADV-1	Unauthorized and malformed-auth requests were denied without exposing artifact metadata or conversation data.	37:32
ADV-2	Cross-session replay attempt from Session B to Session A conversation was denied and did not expose foreign conversation content.	37:32
ADV-3	Malformed pseudo-base64 payload was rejected with controlled validation errors and no server-crash response pattern.	37:32
ADV-5	Identifier-injection payload was rejected at request validation and source-level artifact utility tests confirmed safe normalization behavior for identifier handling.	37:32

📋 View Recording

Screen Recording

Create binary child artifacts and attach artifact refs in ArtifactSer…

32c091b

…vice

claude bot reviewed Mar 18, 2026

View reviewed changes

github-actions bot deleted a comment from claude bot Mar 18, 2026

This was referenced Mar 18, 2026

fix(agents-api): sanitize artifact binary data before persistence #2680

Closed

feat(images): enable multi-turn conversations about image context parts #2604

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ArtifactService binary sanitization and child artifacts#2745

ArtifactService binary sanitization and child artifacts#2745
mike-inkeep wants to merge 1 commit intostack/artifact_binary_sanitizerfrom
stack/artifact_service_binary_refs

mike-inkeep commented Mar 18, 2026

Uh oh!

changeset-bot bot commented Mar 18, 2026

Uh oh!

vercel bot commented Mar 18, 2026

Uh oh!

claude bot left a comment

Uh oh!

claude bot Mar 18, 2026

Uh oh!

claude bot Mar 18, 2026

Uh oh!

claude bot Mar 18, 2026

Uh oh!

claude bot Mar 18, 2026

Uh oh!

itoqa bot commented Mar 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

-    }
+    if (!this.context.taskId || !this.context.contextId) {
+      logger.warn(
+        { parentArtifactId: params.parentArtifactId, hasTaskId: !!this.context.taskId, hasContextId: !!this.context.contextId },
+        'Skipping binary child artifact creation due to missing context'
+      );
+      return new Map();
+    }

Conversation

mike-inkeep commented Mar 18, 2026

Uh oh!

changeset-bot bot commented Mar 18, 2026

⚠️ No Changeset found

Uh oh!

vercel bot commented Mar 18, 2026

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

PR Review Summary

🟠⚠️ Major (3) 🟠⚠️

🟡 Minor (1) 🟡

💭 Consider (1) 💭

💡 APPROVE WITH SUGGESTIONS

Uh oh!

claude bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

claude bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

claude bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

claude bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

itoqa bot commented Mar 18, 2026

Ito Test Report ✅

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant