230 changes: 225 additions & 5 deletions agents-api/src/domains/run/artifacts/ArtifactService.ts
@@ -10,6 +10,8 @@ import jmespath from 'jmespath';
import runDbClient from '../../../data/db/runDbClient';
import { getLogger } from '../../../logger';
import { toolSessionManager } from '../agents/services/ToolSessionManager';
import { isBlobUri } from '../services/blob-storage';
import { sanitizeArtifactBinaryData } from '../services/blob-storage/artifact-binary-sanitizer';
import { agentSessionManager } from '../session/AgentSession';
import {
type ExtendedJsonSchema,
@@ -886,11 +888,37 @@ export class ArtifactService {
metadata?: Record<string, any>;
toolCallId?: string;
}): Promise<void> {
// Use provided summaryData if available, otherwise default to artifact.data
let summaryData = artifact.summaryData || artifact.data;
let fullData = artifact.data;
const { tenantId, projectId } = this.context.executionContext;

const sanitizedData = (await sanitizeArtifactBinaryData(artifact.data, {
tenantId,
projectId,
artifactId: artifact.artifactId,
})) as Record<string, any>;
const sanitizedSummaryData = artifact.summaryData
? ((await sanitizeArtifactBinaryData(artifact.summaryData, {
tenantId,
projectId,
artifactId: artifact.artifactId,
})) as Record<string, any>)
: undefined;

const binaryReferences = await this.createBinaryChildArtifacts({
parentArtifactId: artifact.artifactId,
parentArtifactType: artifact.type,
toolCallId: artifact.toolCallId,
value: sanitizedData,
});

let fullData = this.attachBinaryArtifactRefs(sanitizedData, binaryReferences) as Record<
string,
any
>;
let summaryData = this.attachBinaryArtifactRefs(
sanitizedSummaryData || fullData,
binaryReferences
) as Record<string, any>;

if (this.context.artifactComponents) {
const artifactComponent = this.context.artifactComponents.find(
(ac) => ac.name === artifact.type
@@ -901,8 +929,8 @@
const previewSchema = extractPreviewFields(schema);
const fullSchema = extractFullFields(schema);

summaryData = this.filterBySchema(artifact.data, previewSchema);
fullData = this.filterBySchema(artifact.data, fullSchema);
summaryData = this.filterBySchema(summaryData, previewSchema);
fullData = this.filterBySchema(fullData, fullSchema);
} catch (error) {
logger.warn(
{
@@ -958,6 +986,198 @@
}
}

private async createBinaryChildArtifacts(params: {
parentArtifactId: string;
parentArtifactType: string;
toolCallId?: string;
value: unknown;
}): Promise<Map<string, { artifactId: string; toolCallId: string }>> {
if (!this.context.taskId || !this.context.contextId) {
return new Map();
}
🟠 MAJOR: Silent fallback when context is missing

Issue: When taskId or contextId is missing, createBinaryChildArtifacts silently returns an empty Map with no logging. The caller has no indication that binary child artifact creation was skipped.

Why: This creates orphaned blob URIs in parent artifacts that have no corresponding child artifact records. Debugging this in production would require correlating blob storage contents with artifact records, which is difficult without logs indicating that the skip occurred.

Fix:

Suggested change
    if (!this.context.taskId || !this.context.contextId) {
+     logger.warn(
+       {
+         parentArtifactId: params.parentArtifactId,
+         hasTaskId: !!this.context.taskId,
+         hasContextId: !!this.context.contextId,
+       },
+       'Skipping binary child artifact creation due to missing context'
+     );
      return new Map();
    }
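As a quick standalone illustration of the suggested guard (with `console.warn` standing in for the service's pino logger, and a hypothetical `createChildren` helper in place of `createBinaryChildArtifacts`):

```typescript
// Hypothetical stand-in for createBinaryChildArtifacts: warn and bail out
// when required context is missing, instead of returning silently.
function createChildren(ctx: { taskId?: string; contextId?: string }): Map<string, string> {
  if (!ctx.taskId || !ctx.contextId) {
    console.warn('Skipping binary child artifact creation due to missing context', {
      hasTaskId: !!ctx.taskId,
      hasContextId: !!ctx.contextId,
    });
    return new Map();
  }
  // The normal path would create child artifact records here.
  return new Map([['blob://v1/example', 'bin_example']]);
}

console.log(createChildren({}).size); // 0, with a warning logged
console.log(createChildren({ taskId: 't1', contextId: 'c1' }).size); // 1
```

The skip still happens, but an operator can now correlate the warning with the orphaned parent artifact.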

const binaryParts = this.collectBlobBackedBinaryParts(params.value);
if (binaryParts.length === 0) {
return new Map();
}

const refs = new Map<string, { artifactId: string; toolCallId: string }>();
const dedupeByHash = new Map<string, { artifactId: string; toolCallId: string }>();

for (const part of binaryParts) {
const hash = this.extractContentHashFromBlobUri(part.data) || this.fallbackHash(part.data);
const dedupeKey = `${params.toolCallId || params.parentArtifactId}:${hash}`;
const existing = dedupeByHash.get(dedupeKey);
if (existing) {
refs.set(part.data, existing);
continue;
}

const childArtifactId = this.buildBinaryChildArtifactId(
params.toolCallId,
params.parentArtifactId,
hash
);
const childToolCallId = params.toolCallId || `${params.parentArtifactId}:binary`;

await upsertLedgerArtifact(runDbClient)({
scopes: this.context.executionContext,
contextId: this.context.contextId,
taskId: this.context.taskId,
toolCallId: params.toolCallId || null,
artifact: {
artifactId: childArtifactId,
type: `${params.parentArtifactType}-binary-child`,
name: `${params.parentArtifactType} binary ${hash.slice(0, 12)}`,
description: 'Binary payload extracted from parent artifact',
parts: [
{
kind: 'data',
data: {
blobUri: part.data,
mimeType: part.mimeType,
contentHash: hash,
binaryType: part.type,
},
},
],
metadata: {
parentArtifactId: params.parentArtifactId,
parentArtifactType: params.parentArtifactType,
toolCallId: params.toolCallId,
contentHash: hash,
mimeType: part.mimeType,
visibility: 'internal',
},
createdAt: new Date().toISOString(),
},
});

const reference = { artifactId: childArtifactId, toolCallId: childToolCallId };
dedupeByHash.set(dedupeKey, reference);
refs.set(part.data, reference);
}

return refs;
}

private collectBlobBackedBinaryParts(
value: unknown
): Array<{ type: string; data: string; mimeType?: string }> {
const inStack = new WeakSet<object>();
const collected: Array<{ type: string; data: string; mimeType?: string }> = [];

const visit = (current: unknown) => {
if (this.isBlobBackedBinaryPart(current)) {
collected.push(current);
return;
}

if (Array.isArray(current)) {
if (inStack.has(current)) return;
inStack.add(current);
for (const item of current) visit(item);
inStack.delete(current);
return;
}

if (current && typeof current === 'object') {
if (inStack.has(current as object)) return;
inStack.add(current as object);
for (const next of Object.values(current as Record<string, unknown>)) {
visit(next);
}
inStack.delete(current as object);
}
};

visit(value);
return collected;
}

private attachBinaryArtifactRefs(
value: unknown,
refs: Map<string, { artifactId: string; toolCallId: string }>
): unknown {
if (refs.size === 0) {
return value;
}

const inStack = new WeakSet<object>();

const visit = (current: unknown): unknown => {
if (this.isBlobBackedBinaryPart(current)) {
const ref = refs.get(current.data);
if (!ref) {
return current;
}
return {
...current,
artifactRef: {
artifactId: ref.artifactId,
toolCallId: ref.toolCallId,
},
};
}

if (Array.isArray(current)) {
if (inStack.has(current)) return '[Circular Reference]';
inStack.add(current);
const next = current.map((item) => visit(item));
inStack.delete(current);
return next;
}

if (current && typeof current === 'object') {
if (inStack.has(current as object)) return '[Circular Reference]';
inStack.add(current as object);
const next: Record<string, unknown> = {};
for (const [key, value] of Object.entries(current as Record<string, unknown>)) {
next[key] = visit(value);
}
inStack.delete(current as object);
return next;
}

return current;
};

return visit(value);
}

private isBlobBackedBinaryPart(
value: unknown
): value is { type: string; data: string; mimeType?: string } {
if (!value || typeof value !== 'object' || Array.isArray(value)) {
return false;
}

const maybePart = value as Record<string, unknown>;
return (
(maybePart.type === 'image' || maybePart.type === 'file') &&
typeof maybePart.data === 'string' &&
isBlobUri(maybePart.data)
);
Comment on lines +1148 to +1160
🟡 Minor: Type guard returns type: string instead of literal union

Issue: The type guard validates only 'image' | 'file' for the type field but returns a predicate with type: string. This loses type information that could enable exhaustive checking downstream.

Why: Downstream code consuming the narrowed type cannot safely narrow further since TypeScript sees type: string instead of the literal union.

Fix:

Suggested change
    private isBlobBackedBinaryPart(
      value: unknown
-   ): value is { type: string; data: string; mimeType?: string } {
+   ): value is { type: 'image' | 'file'; data: string; mimeType?: string } {
      if (!value || typeof value !== 'object' || Array.isArray(value)) {
        return false;
      }

      const maybePart = value as Record<string, unknown>;
      return (
        (maybePart.type === 'image' || maybePart.type === 'file') &&
        typeof maybePart.data === 'string' &&
        isBlobUri(maybePart.data)
      );
    }
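To see what the narrower predicate buys, here is a small self-contained sketch: with the literal union, TypeScript can check a switch over `part.type` for exhaustiveness. The `BinaryPart`, `isBinaryPart`, and `describePart` names are illustrative, not from the PR, and the blob-URI check is omitted for brevity:

```typescript
type BinaryPart = { type: 'image' | 'file'; data: string; mimeType?: string };

function isBinaryPart(value: unknown): value is BinaryPart {
  if (!value || typeof value !== 'object' || Array.isArray(value)) return false;
  const maybe = value as Record<string, unknown>;
  return (maybe.type === 'image' || maybe.type === 'file') && typeof maybe.data === 'string';
}

// Exhaustive switch: if a new variant is added to the union, the compiler
// flags the missing case via the `never` assignment in the default branch.
function describePart(part: BinaryPart): string {
  switch (part.type) {
    case 'image':
      return `image payload (${part.mimeType ?? 'unknown type'})`;
    case 'file':
      return `file payload (${part.mimeType ?? 'unknown type'})`;
    default: {
      const unreachable: never = part.type;
      throw new Error(`unhandled part type: ${unreachable}`);
    }
  }
}

const candidate: unknown = { type: 'image', data: 'blob://v1/example', mimeType: 'image/png' };
if (isBinaryPart(candidate)) {
  console.log(describePart(candidate)); // "image payload (image/png)"
}
```

With `type: string` in the predicate, the `never` check would not compile and the switch could not be verified as exhaustive.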

}

private extractContentHashFromBlobUri(blobUri: string): string | null {
const match = blobUri.match(/sha256-([a-f0-9]{16,64})\./i);
return match?.[1] || null;
Comment on lines +1163 to +1165
🟠 MAJOR: Regex requires literal . after hash - fragile for URIs without extensions

Issue: The regex /sha256-([a-f0-9]{16,64})\./i requires a literal . after the hash. Blob URIs without an extension (e.g., blob://v1/.../sha256-abc123 with no trailing .ext) will fail to match, causing fallback to hex-encoding the entire URI.

Why: This could cause deduplication failures if the same binary is referenced via URIs with and without extensions. The fallback hash would differ from the extracted hash.

Fix:

Suggested change
    private extractContentHashFromBlobUri(blobUri: string): string | null {
-     const match = blobUri.match(/sha256-([a-f0-9]{16,64})\./i);
+     const match = blobUri.match(/sha256-([a-f0-9]{16,64})(?:\.|$)/i);
      return match?.[1] || null;
    }
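A quick standalone check of the two patterns against URIs with and without an extension (the URIs below are illustrative):

```typescript
const withDot = /sha256-([a-f0-9]{16,64})\./i;        // current pattern
const dotOrEnd = /sha256-([a-f0-9]{16,64})(?:\.|$)/i; // suggested pattern

const uriWithExt = 'blob://v1/t_demo/artifact-data/p_demo/a_parent/sha256-0123456789abcdef.png';
const uriNoExt = 'blob://v1/t_demo/artifact-data/p_demo/a_parent/sha256-0123456789abcdef';

console.log(withDot.exec(uriWithExt)?.[1]); // "0123456789abcdef"
console.log(withDot.exec(uriNoExt)?.[1]);   // undefined; caller falls back to hex-encoding the URI
console.log(dotOrEnd.exec(uriNoExt)?.[1]);  // "0123456789abcdef"
```

The `(?:\.|$)` alternation keeps the hash anchored so a longer hex run elsewhere in the path cannot be picked up, while still matching extensionless URIs.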

}

private fallbackHash(blobUri: string): string {
return Buffer.from(blobUri).toString('hex').slice(0, 24);
}

private buildBinaryChildArtifactId(
toolCallId: string | undefined,
parentArtifactId: string,
contentHash: string
): string {
const scope = (toolCallId || parentArtifactId).replace(/[^a-zA-Z0-9_-]/g, '_').slice(0, 64);
return `bin_${scope}_${contentHash.slice(0, 24)}`;
}

/**
* Clean up over-escaped strings that have been through multiple JSON serialization cycles
*/
@@ -14,6 +14,7 @@ const {
upsertLedgerArtifactMock,
toolSessionManagerMock,
agentSessionManagerMock,
sanitizeArtifactBinaryDataMock,
} = vi.hoisted(() => ({
listTaskIdsByContextIdMock: vi.fn(),
getTaskMock: vi.fn(),
@@ -33,6 +34,7 @@ const {
setArtifactCache: vi.fn(),
getArtifactCache: vi.fn(),
},
sanitizeArtifactBinaryDataMock: vi.fn(async (value: unknown) => value),
}));

// Mock @inkeep/agents-core WITHOUT importOriginal to avoid loading the heavy module
@@ -62,6 +64,11 @@ vi.mock('../../session/AgentSession', () => ({
agentSessionManager: agentSessionManagerMock,
}));

vi.mock('../../services/blob-storage/artifact-binary-sanitizer', () => ({
sanitizeArtifactBinaryData: sanitizeArtifactBinaryDataMock,
stripBinaryDataForObservability: vi.fn((value: unknown) => value),
}));

// Mock runDbClient to prevent it from loading @inkeep/agents-core (path from test file to src/data/db)
vi.mock('../../../../data/db/runDbClient', () => ({
default: 'mock-run-db-client',
@@ -975,4 +982,71 @@ describe('ArtifactService', () => {
expect(result).toBeNull();
});
});

describe('saveArtifact binary child artifacts', () => {
it('creates child artifacts and injects structured refs for blob-backed binary parts', async () => {
sanitizeArtifactBinaryDataMock.mockResolvedValueOnce({
sections: [
{
type: 'image',
data: 'blob://v1/t_test-tenant/artifact-data/p_test-project/a_parent/sha256-abc123.png',
mimeType: 'image/png',
},
],
});

upsertLedgerArtifactMock.mockReturnValue(vi.fn().mockResolvedValue({ created: true }));

await artifactService.saveArtifact({
artifactId: 'parent-artifact',
name: 'Parent',
description: 'Parent artifact',
type: 'BinaryParent',
toolCallId: 'tool-call-1',
data: { original: 'value' },
});

const upsertFn = upsertLedgerArtifactMock.mock.results[0]?.value;
expect(upsertFn).toHaveBeenCalledTimes(2);
const childInsert = upsertFn.mock.calls[0][0];
const parentInsert = upsertFn.mock.calls[1][0];

expect(childInsert.artifact.artifactId).toContain('bin_tool-call-1');
expect(parentInsert.artifact.parts[0].data.full.sections[0].artifactRef).toEqual({
artifactId: expect.any(String),
toolCallId: 'tool-call-1',
});
});

it('dedupes repeated hashes within the same tool call', async () => {
sanitizeArtifactBinaryDataMock.mockResolvedValueOnce({
files: [
{
type: 'image',
data: 'blob://v1/t_test-tenant/artifact-data/p_test-project/a_parent/sha256-samehash.png',
mimeType: 'image/png',
},
{
type: 'image',
data: 'blob://v1/t_test-tenant/artifact-data/p_test-project/a_parent/sha256-samehash.png',
mimeType: 'image/png',
},
],
});

upsertLedgerArtifactMock.mockReturnValue(vi.fn().mockResolvedValue({ created: true }));

await artifactService.saveArtifact({
artifactId: 'parent-artifact',
name: 'Parent',
description: 'Parent artifact',
type: 'BinaryParent',
toolCallId: 'tool-call-2',
data: { original: 'value' },
});

const upsertFn = upsertLedgerArtifactMock.mock.results[0]?.value;
expect(upsertFn).toHaveBeenCalledTimes(2);
});
});
});
💭 Consider: Additional test coverage for edge cases

Issue: The new tests cover the happy path well but miss several edge cases:

  1. Missing context - No test for when taskId/contextId is missing (early return path)
  2. No binary parts - No test confirming correct behavior when artifact has no blob-backed binaries
  3. summaryData path - No test when both data and summaryData are provided
  4. type: 'file' - All tests use type: 'image'; the file type path is untested
  5. Deeply nested parts - Tests only cover flat structures, not { outer: { inner: { images: [...] } } }

Why: These paths exist in the implementation and should have test coverage to prevent regressions.

Fix: Consider adding test cases for these scenarios. Example for missing context:

it('skips binary child artifact creation when taskId is missing', async () => {
  const serviceWithoutTask = new ArtifactService({
    ...mockContext,
    taskId: undefined,
  });
  // ... verify only parent artifact created
});
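For the deeply nested case, the traversal can be exercised without the service at all. This standalone sketch mirrors the collection logic from `collectBlobBackedBinaryParts` (part shape and the `blob://` check are simplified stand-ins for the real type guard and `isBlobUri`):

```typescript
type Part = { type: 'image' | 'file'; data: string };

function collectParts(value: unknown): Part[] {
  const inStack = new WeakSet<object>();
  const out: Part[] = [];
  const isPart = (v: unknown): v is Part => {
    if (!v || typeof v !== 'object' || Array.isArray(v)) return false;
    const p = v as Record<string, unknown>;
    return (p.type === 'image' || p.type === 'file') &&
      typeof p.data === 'string' && p.data.startsWith('blob://');
  };
  const visit = (cur: unknown): void => {
    if (isPart(cur)) { out.push(cur); return; }
    if (cur && typeof cur === 'object') {
      if (inStack.has(cur)) return; // cycle guard, mirrors the WeakSet in the PR
      inStack.add(cur);
      const children = Array.isArray(cur) ? cur : Object.values(cur);
      for (const child of children) visit(child);
      inStack.delete(cur);
    }
  };
  visit(value);
  return out;
}

// A deeply nested, self-referencing payload still terminates and finds the part:
const payload: Record<string, unknown> = {
  outer: { inner: { images: [{ type: 'file', data: 'blob://v1/demo/sha256-feedface00000000.pdf' }] } },
};
payload.self = payload;
console.log(collectParts(payload).length); // 1
```

A test built around this shape would cover both the nested-structure gap and the `type: 'file'` gap in one case.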
