1,566 changes: 1 addition & 1,565 deletions multimodal/pnpm-lock.yaml

Large diffs are not rendered by default.

175 changes: 173 additions & 2 deletions multimodal/tarko/agent-snapshot/README.md
@@ -1,12 +1,183 @@
# @tarko/agent-snapshot

A snapshot-based agent test framework for `@tarko/agent` based Agents",
A snapshot-based testing framework for `@tarko/agent` that captures and replays agent interactions for deterministic testing.

## Features

- **Snapshot Generation**: Record real agent interactions with LLMs and tools
- **Deterministic Replay**: Test agents using captured snapshots without external dependencies
- **Flexible Verification**: Configurable verification of LLM requests, event streams, and tool calls
- **Smart Normalization**: Automatic normalization of dynamic values (timestamps, IDs) for consistent comparisons

## Installation

```bash
npm install @tarko/agent-snapshot
```

## Usage
## Quick Start

### Basic Usage

```typescript
import { AgentSnapshot } from '@tarko/agent-snapshot';
import { Agent } from '@tarko/agent';

// Create your agent
const agent = new Agent({
  // agent configuration
});

// Create snapshot tester
const snapshot = new AgentSnapshot(agent, {
  snapshotPath: './test-snapshots/my-test'
});

// Generate snapshot (runs with real LLM)
const input = "What's the weather like?";
await snapshot.generate(input);

// Test against snapshot (uses mocked responses)
const result = await snapshot.test(input);
console.log('Test passed!', result.response);
```

### Advanced Configuration

```typescript
const snapshot = new AgentSnapshot(agent, {
  snapshotPath: './snapshots/complex-test',
  normalizerConfig: {
    fieldsToNormalize: [
      { pattern: /requestId/, replacement: '<<REQUEST_ID>>' },
      { pattern: 'timestamp', replacement: '<<TIMESTAMP>>' }
    ],
    fieldsToIgnore: ['debugInfo']
  },
  verification: {
    verifyLLMRequests: true,
    verifyEventStreams: true,
    verifyToolCalls: false // Skip tool call verification
  }
});
```
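
To illustrate what normalization with `fieldsToNormalize` and `fieldsToIgnore` might do to a recorded payload, here is a minimal standalone sketch. It is not the library's implementation: the `FieldRule` type, the `normalize` function, and the matching rules (exact match for string patterns, `RegExp.test` for regular expressions, applied to field names) are all assumptions made for illustration.

```typescript
// Hypothetical rule shape, modeled on the `fieldsToNormalize` entries above.
interface FieldRule {
  pattern: string | RegExp;
  replacement: string;
}

// Assumption: string patterns match a field name exactly, RegExps use test().
function keyMatches(key: string, pattern: string | RegExp): boolean {
  return typeof pattern === 'string' ? key === pattern : pattern.test(key);
}

// Recursively copy `value`, replacing matched fields and dropping ignored ones.
function normalize(value: unknown, rules: FieldRule[], ignore: string[] = []): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => normalize(item, rules, ignore));
  }
  if (value !== null && typeof value === 'object') {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      if (ignore.indexOf(key) !== -1) continue; // ignored fields are dropped
      let replacement: string | undefined;
      for (const rule of rules) {
        if (keyMatches(key, rule.pattern)) {
          replacement = rule.replacement;
          break;
        }
      }
      out[key] = replacement !== undefined ? replacement : normalize(source[key], rules, ignore);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}
```

Two runs that differ only in `requestId`, `timestamp`, or `debugInfo` would then produce identical normalized objects, which is the property a snapshot comparison needs.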

### Batch Testing with AgentSnapshotRunner

```typescript
import { AgentSnapshotRunner } from '@tarko/agent-snapshot';

const runner = new AgentSnapshotRunner([
  {
    name: 'weather-query',
    path: './test-cases/weather.ts',
    snapshotPath: './snapshots/weather'
  },
  {
    name: 'complex-reasoning',
    path: './test-cases/reasoning.ts',
    snapshotPath: './snapshots/reasoning'
  }
]);

// Generate all snapshots
await runner.generateAll();

// Test all snapshots
const results = await runner.testAll();
```

## API Reference

### AgentSnapshot

Main class for snapshot-based testing.

#### Constructor

```typescript
new AgentSnapshot(agent: Agent, options: AgentSnapshotOptions)
```

#### Methods

- `generate(input)` - Generate snapshot with real LLM calls
- `test(input, config?)` - Test against existing snapshot
- `updateNormalizerConfig(config)` - Update normalization settings

### AgentSnapshotOptions

```typescript
interface AgentSnapshotOptions {
  snapshotPath: string;            // Directory for snapshots
  snapshotName?: string;           // Custom snapshot name
  normalizerConfig?: AgentNormalizerConfig;
  verification?: {
    verifyLLMRequests?: boolean;   // Default: true
    verifyEventStreams?: boolean;  // Default: true
    verifyToolCalls?: boolean;     // Default: true
  };
}
```
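
The documented defaults mean that any verification flag left unset behaves as `true`. A minimal sketch of that resolution, assuming a hypothetical helper named `resolveVerification` (not part of the package's API):

```typescript
// Mirrors the `verification` block of AgentSnapshotOptions above.
interface VerificationOptions {
  verifyLLMRequests?: boolean;
  verifyEventStreams?: boolean;
  verifyToolCalls?: boolean;
}

// Hypothetical helper: fill every unset flag with its documented default (true).
function resolveVerification(options: VerificationOptions = {}): Required<VerificationOptions> {
  return {
    verifyLLMRequests: options.verifyLLMRequests ?? true,
    verifyEventStreams: options.verifyEventStreams ?? true,
    verifyToolCalls: options.verifyToolCalls ?? true
  };
}
```

Under this reading, passing `{ verifyToolCalls: false }` disables only tool call verification and leaves the other two checks enabled.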

## Best Practices

1. **Organize snapshots by feature**: Use descriptive paths like `./snapshots/feature/scenario`
2. **Version control snapshots**: Include snapshot files in your repository
3. **Update snapshots intentionally**: Use update mode only when agent behavior legitimately changes
4. **Configure normalization**: Normalize dynamic values that shouldn't affect test outcomes
5. **Selective verification**: Disable verification for non-deterministic components when needed

## Snapshot Structure

```
snapshots/
└── my-test/
    ├── event-stream.jsonl          # Final event stream state
    ├── loop-1/
    │   ├── llm-request.jsonl       # LLM request for loop 1
    │   ├── llm-response.jsonl      # LLM response for loop 1
    │   ├── event-stream.jsonl      # Event stream at loop 1
    │   └── tool-calls.jsonl        # Tool calls in loop 1
    └── loop-2/
        └── ...
```
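
When inspecting snapshots by hand, it can help to compute the file paths for a given loop from the layout above. The helper below is a hypothetical convenience for that purpose, not something the package exports. It assumes `/` as the path separator and the file names shown in the tree.

```typescript
// Hypothetical helper: build the per-loop file paths shown in the tree above.
function loopSnapshotFiles(snapshotPath: string, loop: number) {
  // Strip trailing slashes so './snapshots/my-test/' and './snapshots/my-test' behave the same.
  const dir = `${snapshotPath.replace(/\/+$/, '')}/loop-${loop}`;
  return {
    llmRequest: `${dir}/llm-request.jsonl`,
    llmResponse: `${dir}/llm-response.jsonl`,
    eventStream: `${dir}/event-stream.jsonl`,
    toolCalls: `${dir}/tool-calls.jsonl`
  };
}
```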

## Integration with Testing Frameworks

### Vitest Example

```typescript
import { describe, it, expect } from 'vitest';
import { Agent } from '@tarko/agent';
import { AgentSnapshot } from '@tarko/agent-snapshot';

// Create the agent under test
const agent = new Agent({
  // agent configuration
});

describe('Agent Tests', () => {
  it('should handle weather queries', async () => {
    const snapshot = new AgentSnapshot(agent, {
      snapshotPath: './snapshots/weather-test'
    });

    const result = await snapshot.test('What is the weather in Tokyo?');
    expect(result.meta.loopCount).toBe(2);
    expect(result.response).toMatchSnapshot();
  });
});
```

## Troubleshooting

### Common Issues

1. **Snapshot mismatch**: Check `.actual.jsonl` files for differences
2. **Dynamic values**: Configure normalizer to handle timestamps, IDs, etc.
3. **Tool call variations**: Consider disabling tool call verification for non-deterministic tools
4. **Path issues**: Ensure snapshot directories exist and are writable
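
For the snapshot mismatch case, a line-by-line comparison of the expected and actual JSONL contents usually pinpoints the first diverging record. The sketch below is a generic standalone helper, not part of `@tarko/agent-snapshot`. The names `LineDiff` and `diffJsonl` are hypothetical, and it takes file contents as strings so that reading the snapshot and `.actual.jsonl` files is left to the caller.

```typescript
// One differing record: 1-based line number plus both versions of the line.
interface LineDiff {
  line: number;
  expected: string | undefined; // undefined when the expected file is shorter
  actual: string | undefined;   // undefined when the actual file is shorter
}

// Compare two JSONL documents record by record, skipping blank lines.
function diffJsonl(expected: string, actual: string): LineDiff[] {
  const split = (text: string): string[] =>
    text.split(/\r?\n/).filter((l) => l.trim() !== '');
  const e = split(expected);
  const a = split(actual);
  const diffs: LineDiff[] = [];
  for (let i = 0; i < Math.max(e.length, a.length); i++) {
    if (e[i] !== a[i]) {
      diffs.push({ line: i + 1, expected: e[i], actual: a[i] });
    }
  }
  return diffs;
}
```

This compares raw text, so key ordering and whitespace differences count as mismatches. Normalizing or re-serializing each record before comparing would be a reasonable refinement.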

### Debug Mode

Enable detailed logging:

```typescript
import { logger } from '@tarko/agent-snapshot/utils';
logger.setLevel('debug');
```
1 change: 0 additions & 1 deletion multimodal/tarko/agent-snapshot/package.json
@@ -29,7 +29,6 @@
"@tarko/agent-interface": "workspace:*",
"@agent-infra/logger": "0.0.2-beta.2",
"fast-json-stable-stringify": "^2.1.0",
"snapshot-diff": "0.10.0",
"chalk": "^4.1.2"
},
"devDependencies": {