diff --git a/docs/session-api-design.md b/docs/session-api-design.md
new file mode 100644
index 0000000..24c96f4
--- /dev/null
+++ b/docs/session-api-design.md
@@ -0,0 +1,456 @@
+# Session API Design
+
+## Overview
+
+Agent Boss currently manages agent sessions through tmux and Paude containers, with messaging via HTTP API calls and tmux commands. This design proposes a unified Session API that abstracts over different session providers while maintaining the existing coordination and messaging capabilities.
+
+## Current State Analysis
+
+### Existing Implementations
+
+| Implementation | Session Type | Lifecycle | Messaging | Status |
+|---|---|---|---|---|
+| **tmux** | Local terminal sessions | `tmux new-session` / `tmux kill-session` | `tmux send-keys` | `tmux list-sessions` |
+| **Paude** | Containerized sessions | `podman run` / `podman rm` | HTTP + container exec | `podman ps` |
+| **Ambient API** | Cloud-hosted sessions | API calls to platform | HTTP API | API status endpoints |
+
+### Current Capabilities
+
+- ✅ **Agent registration**: Agents POST status to `/spaces/{space}/agent/{name}`
+- ✅ **Broadcast messaging**: POST to `/spaces/{space}/broadcast` reaches all agents
+- ✅ **Tmux integration**: Dashboard shows tmux session status via SSE
+- ✅ **Paude deployment**: `scripts/boss.sh` manages container lifecycle
+- ❌ **Unified session management**: No abstraction over different providers
+- ❌ **Session API**: No REST endpoints for session operations
+- ❌ **Provider pluggability**: Hard-coded implementation switching
+
+## Design Goals
+
+1. **Provider Abstraction**: Support tmux, Paude, Ambient, and future providers through a common interface
+2. **Backward Compatibility**: Existing agent coordination and messaging continue to work unchanged
+3. **REST API**: Standard HTTP operations for session management
+4. **Provider Configuration**: Declarative configuration for different session providers
+5. **Lifecycle Management**: Consistent session create/start/stop/delete across providers
+6. **Status Normalization**: Common status representation across different session types
+7. **Message Routing**: Unified messaging that routes to the appropriate provider
+
+## Session Abstraction
+
+### Core Session Model
+
+```go
+type Session struct {
+	ID        string                 `json:"id"`
+	Space     string                 `json:"space"`
+	AgentName string                 `json:"agent_name"`
+	Provider  string                 `json:"provider"`
+	Status    SessionStatus          `json:"status"`
+	Config    SessionConfig          `json:"config"`
+	CreatedAt time.Time              `json:"created_at"`
+	UpdatedAt time.Time              `json:"updated_at"`
+	Metadata  map[string]interface{} `json:"metadata,omitempty"`
+}
+
+type SessionStatus string
+
+const (
+	SessionStatusPending SessionStatus = "pending" // Created but not started
+	SessionStatusRunning SessionStatus = "running" // Active session
+	SessionStatusStopped SessionStatus = "stopped" // Stopped but not deleted
+	SessionStatusError   SessionStatus = "error"   // Failed to start/run
+	SessionStatusDeleted SessionStatus = "deleted" // Cleaned up
+)
+
+type SessionConfig struct {
+	Image       string            `json:"image,omitempty"`       // For container providers
+	Environment map[string]string `json:"environment,omitempty"` // Environment variables
+	WorkingDir  string            `json:"working_dir,omitempty"` // Working directory
+	Resources   ResourceLimits    `json:"resources,omitempty"`   // Resource constraints
+	Network     NetworkConfig     `json:"network,omitempty"`     // Network configuration
+}
+
+type ResourceLimits struct {
+	CPUs   float64 `json:"cpus,omitempty"`
+	Memory string  `json:"memory,omitempty"`
+}
+
+type NetworkConfig struct {
+	AllowedDomains []string `json:"allowed_domains,omitempty"`
+	Mode           string   `json:"mode,omitempty"` // "host", "bridge", "none"
+}
+```
+
+### Provider Interface
+
+```go
+type SessionProvider interface {
+	// Lifecycle operations
+	Create(ctx context.Context, session *Session) error
+	Start(ctx context.Context, sessionID string) error
+	Stop(ctx context.Context, sessionID string) error
+	Delete(ctx context.Context, sessionID string) error
+
+	// Information operations
+	Get(ctx context.Context, sessionID string) (*Session, error)
+	List(ctx context.Context, space string) ([]*Session, error)
+
+	// Communication operations
+	SendMessage(ctx context.Context, sessionID string, message string) error
+
+	// Health and monitoring
+	IsHealthy(ctx context.Context, sessionID string) (bool, error)
+	GetLogs(ctx context.Context, sessionID string, lines int) ([]string, error)
+
+	// Provider metadata
+	Name() string
+	SupportedFeatures() []string
+}
+```
+
+## REST API Design
+
+### Session Management Endpoints
+
+```
+GET    /api/v1/sessions                      # List all sessions across spaces
+GET    /api/v1/sessions?space={space}        # List sessions in specific space
+GET    /api/v1/sessions?provider={provider}  # List sessions by provider
+POST   /api/v1/sessions                      # Create new session
+GET    /api/v1/sessions/{id}                 # Get session details
+PUT    /api/v1/sessions/{id}                 # Update session configuration
+DELETE /api/v1/sessions/{id}                 # Delete session
+
+POST   /api/v1/sessions/{id}/start           # Start session
+POST   /api/v1/sessions/{id}/stop            # Stop session
+POST   /api/v1/sessions/{id}/restart         # Restart session
+POST   /api/v1/sessions/{id}/message         # Send message to session
+GET    /api/v1/sessions/{id}/logs            # Get session logs
+GET    /api/v1/sessions/{id}/health          # Check session health
+
+# Space-scoped session operations (backward compatibility)
+GET    /spaces/{space}/sessions                 # List sessions in space
+POST   /spaces/{space}/sessions                 # Create session in space
+GET    /spaces/{space}/sessions/{name}          # Get session by agent name
+DELETE /spaces/{space}/sessions/{name}          # Delete session by agent name
+POST   /spaces/{space}/sessions/{name}/message  # Send message to agent session
+```
+
+### Request/Response Examples
+
+#### Create Session
+
+```http
+POST /api/v1/sessions
+Content-Type: application/json
+
+{
+  "space": "sdk-backend-replacement",
+  "agent_name": "API",
+  "provider": "paude",
+  "config": {
+    "image": "localhost/paude-claude:latest",
+    "environment": {
+      "AGENT_NAME": "API",
+      "WORKSPACE_NAME": "sdk-backend-replacement",
+      "BOSS_URL": "http://localhost:8899",
+      "CLAUDE_ALLOW_DANGEROUS_TOOLS": "1"
+    },
+    "resources": {
+      "cpus": 1.0,
+      "memory": "2g"
+    },
+    "network": {
+      "mode": "host",
+      "allowed_domains": ["*.googleapis.com", "localhost"]
+    }
+  }
+}
+```
+
+#### Response
+
+```json
+{
+  "id": "sess_api_paude_1234567890",
+  "space": "sdk-backend-replacement",
+  "agent_name": "API",
+  "provider": "paude",
+  "status": "pending",
+  "config": { "..." },
+  "created_at": "2026-03-06T10:30:00Z",
+  "updated_at": "2026-03-06T10:30:00Z",
+  "metadata": {
+    "container_id": "abc123def456",
+    "container_name": "agent-api"
+  }
+}
+```
+
+#### Send Message
+
+```http
+POST /api/v1/sessions/sess_api_paude_1234567890/message
+Content-Type: application/json
+
+{
+  "type": "broadcast",
+  "content": "All agents: please run tests and report status",
+  "priority": "normal"
+}
+```
+
+## Provider Implementations
+
+### 1. Tmux Provider
+
+```yaml
+# session-providers.yaml
+providers:
+  tmux:
+    type: tmux
+    config:
+      session_prefix: "agent-boss"
+      shell: "/bin/bash"
+      working_dir: "/workspace"
+      environment:
+        TERM: "screen-256color"
+    features:
+      - "local_execution"
+      - "terminal_access"
+      - "session_persistence"
+```
+
+**Implementation Details:**
+- Session ID: `tmux_{agent_name}_{timestamp}`
+- Create: `tmux new-session -d -s {session_name}`
+- Start: Session starts on creation (tmux model)
+- Stop: `tmux send-keys -t {session_name} C-c`
+- Delete: `tmux kill-session -t {session_name}`
+- Message: `tmux send-keys -t {session_name} "{message}" Enter`
+- Status: Parse `tmux list-sessions` output
+
+### 2. Paude Provider
+
+```yaml
+providers:
+  paude:
+    type: paude
+    config:
+      base_image: "localhost/paude-claude:latest"
+      network_mode: "host"
+      privileged: true
+      volume_mounts:
+        - "~/projects:/workspace:Z"
+    features:
+      - "container_isolation"
+      - "network_filtering"
+      - "resource_limits"
+      - "auto_restart"
+```
+
+**Implementation Details:**
+- Session ID: `paude_{agent_name}_{container_id}`
+- Create: `podman run -d --name {container_name} {image}`
+- Start: Container starts on creation (Paude model)
+- Stop: `podman stop {container_name}`
+- Delete: `podman rm {container_name}`
+- Message: HTTP POST to coordination client inside container
+- Status: Parse `podman inspect` output
+
+### 3. Ambient Provider
+
+```yaml
+providers:
+  ambient:
+    type: ambient
+    config:
+      api_base_url: "https://api.ambient-code.com"
+      cluster_id: "prod-us-east-1"
+      project_id: "agent-boss-sessions"
+    features:
+      - "cloud_hosted"
+      - "scalable"
+      - "managed_infrastructure"
+      - "audit_logging"
+```
+
+**Implementation Details:**
+- Session ID: UUID from Ambient API
+- Create: `POST /v1/projects/{project}/sessions`
+- Start: `POST /v1/sessions/{id}/start`
+- Stop: `POST /v1/sessions/{id}/stop`
+- Delete: `DELETE /v1/sessions/{id}`
+- Message: `POST /v1/sessions/{id}/execute`
+- Status: `GET /v1/sessions/{id}/status`
+
+## Integration with Existing Agent Boss
+
+### Backward Compatibility
+
+Current agent coordination continues unchanged:
+- Agents still POST to `/spaces/{space}/agent/{name}`
+- Broadcast still works via `/spaces/{space}/broadcast`
+- Dashboard SSE continues to work
+- Existing tmux monitoring is preserved
+
+### Enhanced Capabilities
+
+The new Session API adds:
+- **Provider choice**: Create sessions on tmux, Paude, or Ambient
+- **Unified management**: Single API for all session types
+- **Resource control**: Set CPU/memory limits across providers
+- **Better monitoring**: Standardized health checks and logging
+- **Configuration management**: Declarative session configuration
+
+### Migration Strategy
+
+```go
+// Phase 1: Add session API alongside existing tmux code
+// Phase 2: Migrate dashboard to use session API
+// Phase 3: Deprecate direct tmux calls in favor of session API
+// Phase 4: Add additional providers (Ambient, etc.)
+```
+
+## Configuration Model
+
+### Provider Configuration
+
+```yaml
+# ~/.claude/agent-boss-config.yaml
+session:
+  default_provider: "tmux"
+  providers:
+    tmux:
+      enabled: true
+      config:
+        shell: "/bin/bash"
+        session_prefix: "boss"
+
+    paude:
+      enabled: true
+      config:
+        base_image: "localhost/paude-claude:latest"
+        network_mode: "host"
+        privileged: true
+
+    ambient:
+      enabled: false
+      config:
+        api_key_file: "~/.ambient/api-key"
+        project_id: "my-project"
+        cluster: "us-east-1"
+
+# Space-specific overrides
+spaces:
+  production:
+    default_provider: "ambient"
+    providers:
+      ambient:
+        config:
+          cluster: "prod-us-east-1"
+
+  development:
+    default_provider: "tmux"
+```
+
+### Agent Session Templates
+
+```yaml
+# Agent-specific session templates
+agents:
+  API:
+    provider: "paude"
+    config:
+      resources:
+        cpus: 2.0
+        memory: "4g"
+      environment:
+        CLAUDE_MODEL: "claude-sonnet-4"
+
+  Frontend:
+    provider: "tmux"
+    config:
+      working_dir: "/workspace/frontend"
+      environment:
+        NODE_ENV: "development"
+```
+
+## Implementation Plan
+
+### Phase 1: Core Session Abstraction (Week 1)
+- [ ] Define Session model and Provider interface
+- [ ] Implement TmuxProvider with existing functionality
+- [ ] Add basic session REST endpoints
+- [ ] Maintain backward compatibility
+
+### Phase 2: Paude Provider Integration (Week 2)
+- [ ] Implement PaudeProvider
+- [ ] Migrate existing Paude scripts to use session API
+- [ ] Add provider configuration system
+- [ ] Update dashboard to show sessions from multiple providers
+
+### Phase 3: Enhanced Management (Week 3)
+- [ ] Add session templates and configuration
+- [ ] Implement session health monitoring
+- [ ] Add session logs endpoint
+- [ ] Provider-specific feature detection
+
+### Phase 4: Ambient Provider (Week 4)
+- [ ] Implement AmbientProvider
+- [ ] Add cloud session management
+- [ ] Advanced resource management
+- [ ] Production deployment capabilities
+
+## Benefits
+
+### For Users
+- **Unified Interface**: Same API works across local, container, and cloud sessions
+- **Provider Choice**: Pick the right session type for each use case
+- **Better Monitoring**: Standardized health checks and logging
+- **Easier Debugging**: Consistent session management regardless of provider
+
+### For Development
+- **Cleaner Architecture**: Clear separation between session management and agent coordination
+- **Easier Testing**: Mock providers for testing session logic
+- **Provider Pluggability**: Easy to add new session providers
+- **Configuration Management**: Declarative session configuration
+
+### For Operations
+- **Scalability**: Cloud providers for production workloads
+- **Resource Control**: Consistent resource limits across providers
+- **Monitoring**: Unified session status and health checks
+- **Deployment Flexibility**: Same coordination logic, different execution environments
+
+## Security Considerations
+
+### Provider Isolation
+- Each provider handles its own security model
+- Paude: Network filtering + container isolation
+- Tmux: Host-level permissions
+- Ambient: Cloud IAM + audit logging
+
+### API Security
+- The Session API requires the same authentication as existing Agent Boss endpoints
+- Session operations are logged for audit
+- Provider-specific credentials are managed separately
+
+### Message Security
+- Messages are routed through provider-specific channels
+- No sensitive data is stored in session metadata
+- Each provider handles its own message encryption/security
+
+## Future Extensions
+
+### Additional Providers
+- **Kubernetes**: Sessions as pods in K8s cluster
+- **Lambda**: Serverless session execution
+- **Docker**: Direct Docker container sessions
+- **SSH**: Remote session execution
+
+### Advanced Features
+- **Session Templates**: Predefined session configurations for different agent roles
+- **Auto-scaling**: Automatic session scaling based on load
+- **Session Migration**: Move sessions between providers
+- **Session Recording**: Record and replay session interactions
+
+This design provides a clean abstraction over session management while maintaining the existing agent coordination capabilities that make Agent Boss effective. The provider model allows for flexibility and future expansion while keeping the core coordination logic unified.
\ No newline at end of file