
Superseded by GENOME-ARCHITECTURE.md — kept as reference

Continuous Learning Runtime Architecture

Status: Design Draft
Builds On: TRAINING-SYSTEM-ARCHITECTURE.md, LORA-TRAINING-STRATEGY.md
Focus: Runtime loop bridging Python training → Rust inference


The Gap This Fills

Existing docs cover:

  • ✅ Dataset generation and management
  • ✅ Training orchestration (Python MLX/PEFT)
  • ✅ PersonaGenome entity model

Missing:

  • ❌ How trained adapters get into Rust/Candle inference
  • ❌ Runtime continuous learning loop
  • ❌ Hot-swap mechanism for live personas

Architecture Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                        CONTINUOUS LEARNING RUNTIME                          │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────────┐     ┌──────────────┐     ┌──────────────────────────────┐│
│  │ Data Sources │     │   Training   │     │     Rust Inference Server    ││
│  │              │     │   Pipeline   │     │        (Candle gRPC)         ││
│  │ • Mistakes   │     │              │     │                              ││
│  │ • Corrections│────▶│ • MLX (Mac)  │────▶│ • LoadAdapter RPC            ││
│  │ • Feedback   │     │ • Unsloth    │     │ • Hot-swap adapters          ││
│  │ • Imports    │     │ • Cloud APIs │     │ • Multi-adapter composition  ││
│  └──────────────┘     └──────────────┘     └──────────────────────────────┘│
│         │                    │                         │                    │
│         │                    ▼                         │                    │
│         │           .safetensors adapter               │                    │
│         │                    │                         │                    │
│         │                    ▼                         ▼                    │
│         │  ┌─────────────────────────────────────────────────────────────┐ │
│         │  │                    Adapter Registry                         │ │
│         │  │  .continuum/genome/adapters/                                │ │
│         │  │  ├── helper-ai/                                             │ │
│         │  │  │   ├── typescript-v1.2.safetensors                        │ │
│         │  │  │   ├── chat-style-v3.1.safetensors                        │ │
│         │  │  │   └── manifest.json                                      │ │
│         │  │  └── shared/                                                │ │
│         │  │      └── coding-standards-v1.0.safetensors                  │ │
│         │  └─────────────────────────────────────────────────────────────┘ │
│         │                                                                   │
│         └──────────────────── Feedback Loop ────────────────────────────────┘
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Component Details

1. Data Accumulator

Collects training signal from live interactions:

interface TrainingSignal {
  type: 'mistake' | 'correction' | 'positive' | 'import';
  personaId: string;
  domain: string;           // 'typescript', 'chat', 'wine', etc.
  input: string;            // What the AI saw
  wrongOutput?: string;     // What it said (if mistake)
  correctOutput: string;    // What it should have said
  context?: string;         // Surrounding conversation
  source: 'human' | 'ai' | 'automated';
  timestamp: number;
  weight: number;           // 1.0 = gold (human), 0.5 = silver (ai peer)
}

Collection points:

  • Human corrections in chat ("Actually, you should...")
  • Thumbs down reactions
  • Failed tool calls
  • Peer AI corrections
  • Imported datasets
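As a concrete illustration, a human correction captured in chat might serialize like this (the object literal and its values are illustrative, not taken from a real buffer):

```typescript
// Hypothetical correction from chat, shaped to the TrainingSignal
// interface above. A human correction gets gold weight (1.0).
const signal = {
  type: 'correction' as const,
  personaId: 'helper-ai',
  domain: 'typescript',
  input: 'How do I type a React ref for a div?',
  wrongOutput: 'Use useRef(null) with no type argument.',
  correctOutput: 'Use useRef<HTMLDivElement | null>(null).',
  source: 'human' as const,
  timestamp: Date.now(),
  weight: 1.0, // gold: human-provided
};
```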

Storage:

.continuum/genome/training-buffer/
├── helper-ai/
│   ├── pending/
│   │   ├── 2025-01-06-001.jsonl  # Today's signals
│   │   └── 2025-01-06-002.jsonl
│   └── queued/
│       └── batch-2025-01-06.jsonl  # Ready for training
└── manifest.json
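The pending → queued handoff above can be modeled in miniature. A real implementation would move .jsonl files on disk; here the buffer is in-memory, and the batch naming just mirrors the example layout:

```typescript
// Sketch of the training buffer's pending → queued flush.
// Signals accumulate in `pending`; flush() rolls them into a
// named batch that is ready for a training job.
class TrainingBuffer {
  private pending: object[] = [];
  private queued = new Map<string, object[]>();

  add(signal: object): void {
    this.pending.push(signal);
  }

  // Roll everything pending into one batch and clear the pending set.
  flush(date: string): string {
    const batchId = `batch-${date}.jsonl`;
    this.queued.set(batchId, this.pending);
    this.pending = [];
    return batchId;
  }

  pendingCount(): number {
    return this.pending.length;
  }

  batchSize(batchId: string): number {
    return this.queued.get(batchId)?.length ?? 0;
  }
}
```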

2. Training Trigger

Decides when to train:

interface TrainingTrigger {
  type: 'threshold' | 'scheduled' | 'manual' | 'quality-drop';

  // Threshold: train after N examples
  thresholdConfig?: {
    minExamples: number;      // e.g., 500
    maxWaitHours: number;     // e.g., 24 (train anyway if waiting too long)
  };

  // Scheduled: train at specific times
  scheduleConfig?: {
    cron: string;             // e.g., "0 3 * * *" (3am daily)
    onlyIfData: boolean;      // Skip if no new data
  };

  // Quality drop: train when performance degrades
  qualityConfig?: {
    metricName: string;       // e.g., 'user_satisfaction'
    threshold: number;        // e.g., 0.8
    windowHours: number;      // e.g., 24
  };
}

Default strategy:

const defaultTrigger: TrainingTrigger = {
  type: 'threshold',
  thresholdConfig: {
    minExamples: 500,        // Train after 500 corrections
    maxWaitHours: 168,       // Or weekly, whichever comes first
  }
};
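The default strategy amounts to a simple predicate. A sketch, where `pendingCount` and `oldestSignalAgeHours` are assumed to come from the training-buffer manifest (the function name is illustrative):

```typescript
// Threshold trigger: train once enough examples accumulate, or once
// the oldest pending example has waited past maxWaitHours.
function shouldTrainThreshold(
  pendingCount: number,
  oldestSignalAgeHours: number,
  config = { minExamples: 500, maxWaitHours: 168 },
): boolean {
  if (pendingCount === 0) return false;                 // nothing to learn from
  if (pendingCount >= config.minExamples) return true;  // enough signal
  return oldestSignalAgeHours >= config.maxWaitHours;   // waited too long
}
```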

3. Training Executor

Runs the actual training job:

interface TrainingJob {
  jobId: string;
  personaId: string;
  domain: string;

  // Input
  datasetPath: string;       // .jsonl file
  baseAdapter?: string;      // Previous version to continue from

  // Config
  provider: 'mlx' | 'unsloth' | 'fireworks';
  hyperparameters: {
    rank: number;            // LoRA rank (8 for style, 32 for knowledge)
    alpha: number;           // Typically 2x rank
    epochs: number;          // 1-3 for incremental
    learningRate: number;    // 2e-4 typical
  };
  minQuality: number;        // Validation gate for deployment (e.g., 0.8)

  // Output
  outputPath: string;        // Where to save .safetensors

  // Status
  status: 'pending' | 'running' | 'completed' | 'failed';
  progress?: number;         // 0-100
  metrics?: TrainingMetrics;
}

Execution flow:

# 1. Prepare dataset
./jtag training/prepare \
  --input .continuum/genome/training-buffer/helper-ai/queued/*.jsonl \
  --output /tmp/training-batch.jsonl

# 2. Run training (MLX on Mac)
python scripts/train-lora-mlx.py \
  --base-model "mlx-community/Llama-3.2-3B-Instruct-4bit" \
  --dataset /tmp/training-batch.jsonl \
  --output .continuum/genome/adapters/helper-ai/typescript-v1.3.safetensors \
  --continue-from .continuum/genome/adapters/helper-ai/typescript-v1.2.safetensors \
  --rank 32 --alpha 64 --epochs 1

# 3. Validate (quick quality check)
./jtag training/validate \
  --adapter .continuum/genome/adapters/helper-ai/typescript-v1.3.safetensors \
  --test-set /datasets/typescript-test.jsonl \
  --min-quality 0.8

# 4. Deploy to inference server (HOT SWAP)
./jtag inference/load-adapter \
  --persona helper-ai \
  --adapter typescript-v1.3 \
  --replace typescript-v1.2

4. Rust Inference Integration (The Bridge)

Current state: gRPC server has LoadAdapter RPC but only tracks metadata.

Required implementation:

// workers/inference-grpc/src/lora.rs

use candle_core::{DType, Device, Tensor};

// `LoRAWeights` and `extract_layer_name` are project-local helpers:
// a per-layer container for A/B matrices, and a tensor-name parser.

/// Load LoRA adapter weights from safetensors file
pub fn load_lora_weights(
    adapter_path: &str,
    device: &Device,
    dtype: DType,
) -> Result<LoRAWeights, Box<dyn std::error::Error>> {
    let weights = candle_core::safetensors::load(adapter_path, device)?;

    // LoRA weights are typically named:
    // - lora_A.{layer}.weight
    // - lora_B.{layer}.weight

    let mut lora_weights = LoRAWeights::new();

    for (name, tensor) in weights {
        if name.contains("lora_A") {
            let layer = extract_layer_name(&name);
            lora_weights.add_a(layer, tensor.to_dtype(dtype)?);
        } else if name.contains("lora_B") {
            let layer = extract_layer_name(&name);
            lora_weights.add_b(layer, tensor.to_dtype(dtype)?);
        }
    }

    Ok(lora_weights)
}

/// Apply LoRA to base model weights
/// Formula: W' = W + scale * (B @ A)
pub fn apply_lora(
    base_weight: &Tensor,
    lora_a: &Tensor,      // [rank, in_features]
    lora_b: &Tensor,      // [out_features, rank]
    scale: f64,
) -> Result<Tensor, candle_core::Error> {
    let delta = lora_b.matmul(lora_a)?;
    let scaled = (delta * scale)?;
    base_weight.add(&scaled)
}
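The shapes in apply_lora are worth making concrete. A tiny numeric sketch of W' = W + scale * (B @ A) on plain number arrays (not the Candle API): with rank r, A is [r, in] and B is [out, r], so the delta B @ A is [out, in], the same shape as the base weight it merges into.

```typescript
type Matrix = number[][];

// Naive matrix multiply: B [out, r] times A [r, in] → [out, in].
function matmul(b: Matrix, a: Matrix): Matrix {
  return b.map(row =>
    a[0].map((_, j) => row.reduce((sum, v, k) => sum + v * a[k][j], 0)),
  );
}

// W' = W + scale * (B @ A), element-wise merge into the base weight.
function applyLora(base: Matrix, a: Matrix, b: Matrix, scale: number): Matrix {
  const delta = matmul(b, a);
  return base.map((row, i) => row.map((w, j) => w + scale * delta[i][j]));
}

// Rank-1 update on a 2x2 identity weight:
const W = [[1, 0], [0, 1]];
const A = [[1, 2]];   // [rank=1, in=2]
const B = [[1], [3]]; // [out=2, rank=1]
const merged = applyLora(W, A, B, 0.5);
// delta = B @ A = [[1, 2], [3, 6]], scaled by 0.5 → [[0.5, 1], [1.5, 3]]
```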

Integration with existing Llama model:

// Modified forward pass with LoRA (sketch — merging not yet implemented)
impl ModelState {
    fn forward_with_lora(
        &self,
        input: &Tensor,
        pos: usize,
        adapters: &[LoadedLoRA],
    ) -> Result<Tensor, String> {
        // For each layer that has LoRA adapters:
        // 1. Get the base weight
        // 2. Apply each adapter: W' = W + sum(scale_i * B_i @ A_i)
        // 3. Use the modified weight for the forward pass
        //
        // Note: adapters can be pre-merged at load time for efficiency,
        // re-merging only when the adapter set changes.
        todo!("merge adapter deltas into base weights, then run the standard forward pass")
    }
}

5. Hot-Swap Protocol

Goal: Zero-downtime adapter updates while persona is running.

Timeline:
─────────────────────────────────────────────────────────────────────
t0: Persona running with typescript-v1.2
t1: Training completes → typescript-v1.3 ready
t2: LoadAdapter RPC called with typescript-v1.3
t3: Server loads new weights (1-2 seconds)
t4: Server atomically swaps active adapter
t5: Next inference uses v1.3
t6: Old v1.2 marked for unload (after grace period)
─────────────────────────────────────────────────────────────────────

gRPC protocol:

// Already defined in inference.proto
rpc LoadAdapter(LoadAdapterRequest) returns (LoadAdapterResponse);

message LoadAdapterRequest {
  string adapter_path = 1;   // Full path to .safetensors
  string adapter_id = 2;     // Unique ID (e.g., "typescript-v1.3")
  double scale = 3;          // LoRA scale factor
  string replace_id = 4;     // Optional: adapter to replace atomically
}
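The swap at t4 can be sketched as registry bookkeeping. In the real Rust server this would sit behind a lock; the TypeScript types here are illustrative stand-ins for that atomic pointer swap:

```typescript
interface LoadedAdapter {
  id: string;
  scale: number;
  // weights elided in this sketch
}

class AdapterRegistry {
  private active = new Map<string, LoadedAdapter>();
  private retired: LoadedAdapter[] = [];

  // The new adapter is fully loaded before this is called, so no
  // request ever observes a half-loaded adapter. The old adapter is
  // retired, not freed, until a grace period expires (t6 above).
  swap(next: LoadedAdapter, replaceId?: string): void {
    if (replaceId) {
      const old = this.active.get(replaceId);
      if (old) {
        this.retired.push(old);
        this.active.delete(replaceId);
      }
    }
    this.active.set(next.id, next);
  }

  activeIds(): string[] {
    return [...this.active.keys()];
  }

  retiredCount(): number {
    return this.retired.length;
  }
}
```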

Continuous Learning Daemon

TrainingDaemon orchestrates the full loop:

// daemons/training-daemon/server/TrainingDaemonServer.ts

export class TrainingDaemonServer extends BaseDaemonServer {
  private accumulators: Map<string, DataAccumulator> = new Map();
  private triggers: Map<string, TrainingTrigger> = new Map();

  async onStart(): Promise<void> {
    // Subscribe to training signals
    Events.subscribe('persona:mistake', this.onMistake.bind(this));
    Events.subscribe('persona:correction', this.onCorrection.bind(this));
    Events.subscribe('persona:feedback', this.onFeedback.bind(this));

    // Start trigger check loop
    this.startTriggerLoop();
  }

  private async onCorrection(signal: TrainingSignal): Promise<void> {
    const accumulator = this.getAccumulator(signal.personaId, signal.domain);
    await accumulator.add(signal);

    // Check if threshold reached
    if (await this.shouldTrain(signal.personaId, signal.domain)) {
      await this.queueTrainingJob(signal.personaId, signal.domain);
    }
  }

  private async executeTrainingJob(job: TrainingJob): Promise<void> {
    // 1. Prepare dataset
    const datasetPath = await this.prepareDataset(job);

    // 2. Run training (spawn Python process)
    const result = await this.runTraining(job, datasetPath);

    // 3. Validate quality
    const quality = await this.validateAdapter(result.adapterPath);

    if (quality >= job.minQuality) {
      // 4. Deploy to inference server
      await this.deployAdapter(job.personaId, result.adapterPath);

      // 5. Archive old version
      await this.archiveOldAdapter(job.personaId, job.domain);
    } else {
      // Quality regression - don't deploy
      console.warn(`Training produced lower quality (${quality}), keeping old adapter`);
    }
  }
}

File System Layout

.continuum/
├── genome/
│   ├── adapters/                    # Trained LoRA adapters
│   │   ├── helper-ai/
│   │   │   ├── typescript-v1.2.safetensors
│   │   │   ├── typescript-v1.3.safetensors  # Latest
│   │   │   ├── chat-style-v3.1.safetensors
│   │   │   └── manifest.json        # Active versions
│   │   └── shared/                  # Cross-persona adapters
│   │       └── coding-standards-v1.0.safetensors
│   │
│   ├── training-buffer/             # Pending training data
│   │   ├── helper-ai/
│   │   │   ├── pending/             # Accumulating
│   │   │   └── queued/              # Ready for training
│   │   └── manifest.json
│   │
│   ├── training-jobs/               # Job state
│   │   ├── active/
│   │   │   └── job-2025-01-06-001.json
│   │   └── completed/
│   │       └── job-2025-01-05-001.json
│   │
│   └── checkpoints/                 # Training checkpoints
│       └── helper-ai-typescript-v1.3/
│           ├── checkpoint-500.safetensors
│           └── checkpoint-1000.safetensors
│
└── personas/
    └── helper/
        └── genome.json              # Points to active adapters
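The manifest.json that tracks active versions might take a shape like the following; the field names are an assumption here, not a documented schema:

```typescript
// Hypothetical schema for adapters/<persona>/manifest.json:
// one active adapter file per domain, plus archived versions
// kept around for rollback.
interface AdapterManifest {
  persona: string;
  active: Record<string, string>; // domain → adapter file
  archived: string[];
}

const manifest: AdapterManifest = {
  persona: 'helper-ai',
  active: {
    typescript: 'typescript-v1.3.safetensors',
    'chat-style': 'chat-style-v3.1.safetensors',
  },
  archived: ['typescript-v1.2.safetensors'],
};
```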

Vision Model Support

In addition to text-only LLMs, the system supports Vision Language Models (VLMs) for UI understanding, screenshot analysis, and design critique.

Supported Vision Models

Model            Params   Training            Inference       Best For
Qwen2.5-VL 3B    3B       MLX-VLM (Mac)       Candle (Rust)   UI design, screenshots
Moondream 2      1.6B     MLX-VLM             Candle (Rust)   Fast image Q&A
LLaVA 1.5/1.6    7-13B    Unsloth (CUDA)      Candle (Rust)   Detailed analysis
SmolVLM          2B       HuggingFace         Transformers    Memory-efficient

Training on Apple Silicon (MLX-VLM)

# Install MLX-VLM
pip install mlx-vlm

# Fine-tune Qwen2.5-VL for UI design critique
python -m mlx_vlm.lora \
  --model Qwen/Qwen2.5-VL-3B-Instruct \
  --data ./datasets/ui-critique.jsonl \
  --output .continuum/genome/adapters/vision/ui-expert-v1.safetensors \
  --lora-rank 32 \
  --epochs 3

# Convert for Candle inference
python -m mlx_vlm.convert \
  --adapter .continuum/genome/adapters/vision/ui-expert-v1.safetensors \
  --output ./ui-expert.safetensors

Training on NVIDIA GPU (RTX 5090)

# Unsloth for maximum speed
pip install unsloth

# Fine-tune with QLoRA (fits in 32GB VRAM)
python scripts/train-vlm-unsloth.py \
  --model Qwen/Qwen2.5-VL-7B-Instruct \
  --dataset ./datasets/ui-critique.jsonl \
  --output .continuum/genome/adapters/vision/ui-expert-v1.safetensors \
  --lora-rank 64 \
  --epochs 3 \
  --vision-lora true  # Include vision encoder in LoRA

Vision Training Datasets

  1. UICrit - 983 UIs with expert design critiques, bounding boxes, quality ratings
  2. GUICourse - 70k instruction-action pairs from 13k website screenshots
  3. Custom Continuum - Screenshots of your own UI with corrections

Vision Adapter Storage

.continuum/genome/adapters/
├── helper-ai/
│   ├── typescript-v1.2.safetensors    # Text adapter
│   └── manifest.json
├── vision/                             # Vision adapters (shared)
│   ├── ui-expert-v1.safetensors       # UI design critique
│   ├── screenshot-reader-v1.safetensors
│   └── manifest.json
└── shared/
    └── coding-standards-v1.0.safetensors

Inference Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    VISION INFERENCE FLOW                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Screenshot ─────┐                                               │
│                  │                                               │
│                  ▼                                               │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │           Rust Inference Worker (Candle)                    ││
│  │                                                             ││
│  │  ┌─────────────┐    ┌─────────────┐    ┌───────────────┐   ││
│  │  │ ViT Encoder │───▶│  Projector  │───▶│ Text Decoder  │   ││
│  │  │ (Moondream) │    │             │    │ + LoRA Adapter│   ││
│  │  └─────────────┘    └─────────────┘    └───────────────┘   ││
│  │                                                             ││
│  └─────────────────────────────────────────────────────────────┘│
│                  │                                               │
│                  ▼                                               │
│  "This UI has poor visual hierarchy. The CTA button..."         │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Vision-Specific Commands

# Analyze UI screenshot
./jtag ai/vision/analyze \
  --image ./screenshot.png \
  --prompt "What usability issues do you see?"

# Generate HTML from mockup
./jtag ai/vision/generate-code \
  --image ./mockup.png \
  --format html

# UI critique with trained adapter
./jtag ai/vision/critique \
  --image ./ui-screenshot.png \
  --adapter ui-expert-v1

Phase 6: Vision Model Integration

  • Add Moondream to Candle inference worker
  • Implement image encoding in gRPC proto
  • Create ai/vision/* commands
  • Integrate MLX-VLM training scripts
  • Add vision adapter hot-swap support
  • Train UI critique adapter on UICrit dataset

Implementation Phases

Phase 1: LoRA Loading in Rust (Current Priority)

  • Implement load_lora_weights() in Candle
  • Add apply_lora() for weight merging
  • Wire up LoadAdapter RPC to actually load weights
  • Test with a pre-trained adapter

Phase 2: Data Accumulation

  • Create TrainingSignal entity
  • Implement DataAccumulator
  • Add correction detection in chat
  • Store signals in training-buffer

Phase 3: Training Automation

  • Create TrainingDaemon
  • Implement threshold-based triggers
  • Wrap MLX training script
  • Add quality validation

Phase 4: Hot-Swap Deployment

  • Implement atomic adapter swap
  • Add version management
  • Implement rollback mechanism
  • Add monitoring/alerting

Phase 5: Feedback Loop

  • Track adapter performance over time
  • Detect quality regressions
  • Auto-rollback on regression
  • Dashboard for training status

Commands

# Data management
./jtag training/buffer/status           # Show pending data counts
./jtag training/buffer/flush --persona=helper-ai  # Force queue data

# Training
./jtag training/start --persona=helper-ai --domain=typescript
./jtag training/status                  # Show active jobs
./jtag training/stop --job=job-001

# Adapters
./jtag adapter/list --persona=helper-ai
./jtag adapter/deploy --path=./adapter.safetensors --persona=helper-ai
./jtag adapter/rollback --persona=helper-ai --domain=typescript

# Continuous learning
./jtag training/continuous/start --persona=helper-ai
./jtag training/continuous/stop --persona=helper-ai
./jtag training/continuous/status

Success Metrics

  1. Training latency: < 10 minutes for incremental update (500 examples)
  2. Hot-swap latency: < 2 seconds for adapter swap
  3. Quality retention: New adapter >= 95% of old adapter quality
  4. Data efficiency: Measurable improvement from 500 corrections
  5. Uptime: Zero inference downtime during adapter updates
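Metric 3 reduces to a simple predicate, sketched here (the 0.95 ratio comes from the metric itself; the function name is illustrative):

```typescript
// Quality-retention gate: a new adapter deploys only if it scores
// at least `ratio` of the old adapter's quality.
function passesRetentionGate(newQ: number, oldQ: number, ratio = 0.95): boolean {
  return newQ >= ratio * oldQ;
}
```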

References

Internal Docs

  • TRAINING-SYSTEM-ARCHITECTURE.md
  • LORA-TRAINING-STRATEGY.md
  • GENOME-ARCHITECTURE.md

External Resources

Datasets:

  • UICrit - 983 UIs with expert design critiques
  • GUICourse - 70k GUI instruction-action pairs
