Superseded by GENOME-ARCHITECTURE.md — kept as reference
- **Status:** Design Draft
- **Builds On:** TRAINING-SYSTEM-ARCHITECTURE.md, LORA-TRAINING-STRATEGY.md
- **Focus:** Runtime loop bridging Python training → Rust inference
Existing docs cover:
- ✅ Dataset generation and management
- ✅ Training orchestration (Python MLX/PEFT)
- ✅ PersonaGenome entity model
Missing:
- ❌ How trained adapters get into Rust/Candle inference
- ❌ Runtime continuous learning loop
- ❌ Hot-swap mechanism for live personas
```
                     CONTINUOUS LEARNING RUNTIME

┌──────────────┐     ┌──────────────┐     ┌──────────────────────────────┐
│ Data Sources │     │ Training     │     │ Rust Inference Server        │
│              │     │ Pipeline     │     │ (Candle gRPC)                │
│ • Mistakes   │     │              │     │                              │
│ • Corrections│────▶│ • MLX (Mac)  │────▶│ • LoadAdapter RPC            │
│ • Feedback   │     │ • Unsloth    │     │ • Hot-swap adapters          │
│ • Imports    │     │ • Cloud APIs │     │ • Multi-adapter composition  │
└──────────────┘     └──────────────┘     └──────────────────────────────┘
        ▲                    │                           │
        │                    ▼                           │
        │         .safetensors adapter                   │
        │                    │                           │
        │                    ▼                           ▼
        │    ┌─────────────────────────────────────────────────────┐
        │    │                  Adapter Registry                   │
        │    │  .continuum/genome/adapters/                        │
        │    │  ├── helper-ai/                                     │
        │    │  │   ├── typescript-v1.2.safetensors                │
        │    │  │   ├── chat-style-v3.1.safetensors                │
        │    │  │   └── manifest.json                              │
        │    │  └── shared/                                        │
        │    │      └── coding-standards-v1.0.safetensors          │
        │    └─────────────────────────────────────────────────────┘
        │                             │
        └──────── Feedback Loop ──────┘
```
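End to end, one cycle of this loop reduces to: accumulate signals, check a trigger, train, validate, hot-swap. A minimal TypeScript sketch with injected stand-ins for each component (all names here are illustrative, not an existing API):

```typescript
// Illustrative shape of one cycle of the continuous learning loop.
// The component functions are injected so the loop stays pure orchestration;
// every name below is an assumption for illustration.
interface CycleDeps {
  pendingExamples(): number;                     // buffer size from data sources
  trainAdapter(): string;                        // returns new adapter path
  validateAdapter(adapterPath: string): boolean; // quality gate
  deployAdapter(adapterPath: string): void;      // hot-swap into inference
}

/** Returns true if a new adapter was trained and deployed this cycle. */
function runLearningCycle(deps: CycleDeps, minExamples: number): boolean {
  if (deps.pendingExamples() < minExamples) return false; // threshold trigger not met
  const adapterPath = deps.trainAdapter();
  if (!deps.validateAdapter(adapterPath)) return false;   // regression: keep old adapter
  deps.deployAdapter(adapterPath);
  return true;
}
```

The quality gate before deploy is what makes the loop safe to run unattended: a failed validation simply leaves the previous adapter active.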
Collects training signals from live interactions:

```typescript
interface TrainingSignal {
  type: 'mistake' | 'correction' | 'positive' | 'import';
  personaId: string;
  domain: string;         // 'typescript', 'chat', 'wine', etc.
  input: string;          // What the AI saw
  wrongOutput?: string;   // What it said (if mistake)
  correctOutput: string;  // What it should have said
  context?: string;       // Surrounding conversation
  source: 'human' | 'ai' | 'automated';
  timestamp: number;
  weight: number;         // 1.0 = gold (human), 0.5 = silver (AI peer)
}
```

Collection points:
- Human corrections in chat ("Actually, you should...")
- Thumbs down reactions
- Failed tool calls
- Peer AI corrections
- Imported datasets
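A correction collected from chat might be packaged like this. The helper function and the weight chosen for automated signals are assumptions layered on the interface above:

```typescript
// Sketch: turning a human chat correction into a TrainingSignal.
// `signalFromCorrection` and the automated weight are illustrative.
type SignalSource = 'human' | 'ai' | 'automated';

interface TrainingSignal {
  type: 'mistake' | 'correction' | 'positive' | 'import';
  personaId: string;
  domain: string;
  input: string;
  wrongOutput?: string;
  correctOutput: string;
  context?: string;
  source: SignalSource;
  timestamp: number;
  weight: number;
}

// Weight encodes trust in the label: human corrections are gold (1.0),
// peer-AI corrections silver (0.5); the automated weight is an assumption.
const SOURCE_WEIGHTS: Record<SignalSource, number> = {
  human: 1.0,
  ai: 0.5,
  automated: 0.25,
};

function signalFromCorrection(
  personaId: string,
  domain: string,
  input: string,
  wrongOutput: string,
  correctOutput: string,
  source: SignalSource,
): TrainingSignal {
  return {
    type: 'correction',
    personaId,
    domain,
    input,
    wrongOutput,
    correctOutput,
    source,
    timestamp: Date.now(),
    weight: SOURCE_WEIGHTS[source],
  };
}
```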
Storage:

```
.continuum/genome/training-buffer/
├── helper-ai/
│   ├── pending/
│   │   ├── 2025-01-06-001.jsonl   # Today's signals
│   │   └── 2025-01-06-002.jsonl
│   └── queued/
│       └── batch-2025-01-06.jsonl # Ready for training
└── manifest.json
```
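The pending → queued handoff over this layout can be sketched as a small flush routine (the function name and batch naming are illustrative):

```typescript
import * as fs from 'fs';
import * as path from 'path';

// Hypothetical sketch of the buffer flush: concatenate all pending .jsonl
// files for a persona into one queued batch, then clear pending.
// Paths follow the layout above; the function name is an assumption.
function flushBuffer(bufferRoot: string, personaId: string, batchDate: string): string {
  const pendingDir = path.join(bufferRoot, personaId, 'pending');
  const queuedDir = path.join(bufferRoot, personaId, 'queued');
  fs.mkdirSync(queuedDir, { recursive: true });

  const batchPath = path.join(queuedDir, `batch-${batchDate}.jsonl`);
  const lines: string[] = [];
  for (const file of fs.readdirSync(pendingDir).sort()) {
    if (!file.endsWith('.jsonl')) continue;
    const full = path.join(pendingDir, file);
    lines.push(fs.readFileSync(full, 'utf8').trimEnd());
    fs.unlinkSync(full); // consumed into the batch
  }
  fs.writeFileSync(batchPath, lines.join('\n') + '\n');
  return batchPath;
}
```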
Decides when to train:

```typescript
interface TrainingTrigger {
  type: 'threshold' | 'scheduled' | 'manual' | 'quality-drop';

  // Threshold: train after N examples
  thresholdConfig?: {
    minExamples: number;    // e.g., 500
    maxWaitHours: number;   // e.g., 24 (train anyway if waiting too long)
  };

  // Scheduled: train at specific times
  scheduleConfig?: {
    cron: string;           // e.g., "0 3 * * *" (3am daily)
    onlyIfData: boolean;    // Skip if no new data
  };

  // Quality drop: train when performance degrades
  qualityConfig?: {
    metricName: string;     // e.g., 'user_satisfaction'
    threshold: number;      // e.g., 0.8
    windowHours: number;    // e.g., 24
  };
}
```

Default strategy:

```typescript
const defaultTrigger: TrainingTrigger = {
  type: 'threshold',
  thresholdConfig: {
    minExamples: 500,   // Train after 500 corrections
    maxWaitHours: 168,  // Or weekly, whichever comes first
  }
};
```

Runs the actual training job:
```typescript
interface TrainingJob {
  jobId: string;
  personaId: string;
  domain: string;

  // Input
  datasetPath: string;   // .jsonl file
  baseAdapter?: string;  // Previous version to continue from

  // Config
  provider: 'mlx' | 'unsloth' | 'fireworks';
  hyperparameters: {
    rank: number;          // LoRA rank (8 for style, 32 for knowledge)
    alpha: number;         // Typically 2x rank
    epochs: number;        // 1-3 for incremental
    learningRate: number;  // 2e-4 typical
  };

  // Output
  outputPath: string;    // Where to save .safetensors

  // Status
  status: 'pending' | 'running' | 'completed' | 'failed';
  progress?: number;     // 0-100
  metrics?: TrainingMetrics;
}
```

Execution flow:
```bash
# 1. Prepare dataset
./jtag training/prepare \
  --input .continuum/genome/training-buffer/helper-ai/queued/*.jsonl \
  --output /tmp/training-batch.jsonl

# 2. Run training (MLX on Mac)
python scripts/train-lora-mlx.py \
  --base-model "mlx-community/Llama-3.2-3B-Instruct-4bit" \
  --dataset /tmp/training-batch.jsonl \
  --output .continuum/genome/adapters/helper-ai/typescript-v1.3.safetensors \
  --continue-from .continuum/genome/adapters/helper-ai/typescript-v1.2.safetensors \
  --rank 32 --alpha 64 --epochs 1

# 3. Validate (quick quality check)
./jtag training/validate \
  --adapter .continuum/genome/adapters/helper-ai/typescript-v1.3.safetensors \
  --test-set /datasets/typescript-test.jsonl \
  --min-quality 0.8

# 4. Deploy to inference server (HOT SWAP)
./jtag inference/load-adapter \
  --persona helper-ai \
  --adapter typescript-v1.3 \
  --replace typescript-v1.2
```

Current state: the gRPC server has a LoadAdapter RPC, but it only tracks metadata.

Required implementation:
```rust
// workers/inference-grpc/src/lora.rs
use candle_core::{DType, Device, Tensor};

/// Load LoRA adapter weights from a safetensors file
pub fn load_lora_weights(
    adapter_path: &str,
    device: &Device,
    dtype: DType,
) -> Result<LoRAWeights, Box<dyn std::error::Error>> {
    let weights = candle_core::safetensors::load(adapter_path, device)?;

    // LoRA weights are typically named:
    //   - lora_A.{layer}.weight
    //   - lora_B.{layer}.weight
    let mut lora_weights = LoRAWeights::new();
    for (name, tensor) in weights {
        if name.contains("lora_A") {
            let layer = extract_layer_name(&name);
            lora_weights.add_a(layer, tensor.to_dtype(dtype)?);
        } else if name.contains("lora_B") {
            let layer = extract_layer_name(&name);
            lora_weights.add_b(layer, tensor.to_dtype(dtype)?);
        }
    }
    Ok(lora_weights)
}

/// Apply LoRA to base model weights
/// Formula: W' = W + scale * (B @ A)
pub fn apply_lora(
    base_weight: &Tensor,
    lora_a: &Tensor, // [rank, in_features]
    lora_b: &Tensor, // [out_features, rank]
    scale: f64,
) -> Result<Tensor, candle_core::Error> {
    let delta = lora_b.matmul(lora_a)?; // [out_features, in_features]
    let scaled = (delta * scale)?;
    base_weight.add(&scaled)
}
```

Integration with the existing Llama model:
```rust
// Modified forward pass with LoRA (sketch)
impl ModelState {
    fn forward_with_lora(
        &self,
        input: &Tensor,
        pos: usize,
        adapters: &[LoadedLoRA],
    ) -> Result<Tensor, String> {
        // For each layer that has LoRA adapters:
        //   1. Get the base weight
        //   2. Apply each adapter: W' = W + sum(scale_i * B_i @ A_i)
        //   3. Use the modified weight for the forward pass
        //
        // Note: adapters can be pre-merged at load time for efficiency;
        // only re-merge when the adapter set changes.
        todo!("merge adapter deltas into layer weights, then run the base forward pass")
    }
}
```

Goal: zero-downtime adapter updates while a persona is running.
Timeline:

```
t0: Persona running with typescript-v1.2
t1: Training completes → typescript-v1.3 ready
t2: LoadAdapter RPC called with typescript-v1.3
t3: Server loads new weights (1-2 seconds)
t4: Server atomically swaps active adapter
t5: Next inference uses v1.3
t6: Old v1.2 marked for unload (after grace period)
```
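The swap at t2-t4 can be kept atomic by publishing an immutable snapshot of the active adapter set under a single reference, so readers never observe a half-updated registry. A minimal TypeScript sketch (class and field names are illustrative, not the server's actual API):

```typescript
// Illustrative atomic hot-swap: copy-on-write snapshot of active adapters.
interface LoadedAdapter {
  id: string;    // e.g. "typescript-v1.3"
  scale: number;
  // merged weights would live here in the real server
}

class AdapterRegistry {
  private active: ReadonlyMap<string, LoadedAdapter> = new Map();
  private retired: LoadedAdapter[] = []; // unloaded after a grace period (t6)

  /** Swap in `next`, optionally replacing `replaceId`, in one step. */
  swap(next: LoadedAdapter, replaceId?: string): void {
    const updated = new Map(this.active);  // copy-on-write snapshot
    if (replaceId !== undefined) {
      const old = updated.get(replaceId);
      if (old) this.retired.push(old);     // keep for grace-period unload
      updated.delete(replaceId);
    }
    updated.set(next.id, next);
    this.active = updated;                 // single atomic publish (t4)
  }

  /** Inference reads a consistent snapshot (t5). */
  snapshot(): ReadonlyMap<string, LoadedAdapter> {
    return this.active;
  }
}
```

In the Rust server the same idea would typically be an `Arc`-swapped map rather than a JS reference assignment, but the invariant is identical: readers only ever see a complete adapter set.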
gRPC protocol:

```protobuf
// Already defined in inference.proto
rpc LoadAdapter(LoadAdapterRequest) returns (LoadAdapterResponse);

message LoadAdapterRequest {
  string adapter_path = 1;  // Full path to .safetensors
  string adapter_id = 2;    // Unique ID (e.g., "typescript-v1.3")
  double scale = 3;         // LoRA scale factor
  string replace_id = 4;    // Optional: adapter to replace atomically
}
```

The TrainingDaemon orchestrates the full loop:
```typescript
// daemons/training-daemon/server/TrainingDaemonServer.ts
export class TrainingDaemonServer extends BaseDaemonServer {
  private accumulators: Map<string, DataAccumulator> = new Map();
  private triggers: Map<string, TrainingTrigger> = new Map();

  async onStart(): Promise<void> {
    // Subscribe to training signals
    Events.subscribe('persona:mistake', this.onMistake.bind(this));
    Events.subscribe('persona:correction', this.onCorrection.bind(this));
    Events.subscribe('persona:feedback', this.onFeedback.bind(this));

    // Start trigger check loop
    this.startTriggerLoop();
  }

  private async onCorrection(signal: TrainingSignal): Promise<void> {
    const accumulator = this.getAccumulator(signal.personaId, signal.domain);
    await accumulator.add(signal);

    // Check if threshold reached
    if (await this.shouldTrain(signal.personaId, signal.domain)) {
      await this.queueTrainingJob(signal.personaId, signal.domain);
    }
  }

  private async executeTrainingJob(job: TrainingJob): Promise<void> {
    // 1. Prepare dataset
    const datasetPath = await this.prepareDataset(job);

    // 2. Run training (spawn Python process)
    const result = await this.runTraining(job, datasetPath);

    // 3. Validate quality
    const quality = await this.validateAdapter(result.adapterPath);

    if (quality >= job.minQuality) {
      // 4. Deploy to inference server
      await this.deployAdapter(job.personaId, result.adapterPath);

      // 5. Archive old version
      await this.archiveOldAdapter(job.personaId, job.domain);
    } else {
      // Quality regression - don't deploy
      console.warn(`Training produced lower quality (${quality}), keeping old adapter`);
    }
  }
}
```

```
.continuum/
├── genome/
│   ├── adapters/                # Trained LoRA adapters
│   │   ├── helper-ai/
│   │   │   ├── typescript-v1.2.safetensors
│   │   │   ├── typescript-v1.3.safetensors  # Latest
│   │   │   ├── chat-style-v3.1.safetensors
│   │   │   └── manifest.json                # Active versions
│   │   └── shared/              # Cross-persona adapters
│   │       └── coding-standards-v1.0.safetensors
│   │
│   ├── training-buffer/         # Pending training data
│   │   ├── helper-ai/
│   │   │   ├── pending/         # Accumulating
│   │   │   └── queued/          # Ready for training
│   │   └── manifest.json
│   │
│   ├── training-jobs/           # Job state
│   │   ├── active/
│   │   │   └── job-2025-01-06-001.json
│   │   └── completed/
│   │       └── job-2025-01-05-001.json
│   │
│   └── checkpoints/             # Training checkpoints
│       └── helper-ai-typescript-v1.3/
│           ├── checkpoint-500.safetensors
│           └── checkpoint-1000.safetensors
│
└── personas/
    └── helper/
        └── genome.json          # Points to active adapters
```
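The `manifest.json` and `genome.json` files above imply a small schema. A hypothetical TypeScript shape (the field names are assumptions for illustration, not an existing format):

```typescript
// Assumed shapes for manifest.json (per adapter directory) and
// genome.json (per persona), inferred from the layout above.
interface AdapterManifest {
  active: Record<string, string>;    // domain -> active version file
  history: Record<string, string[]>; // domain -> archived versions
}

interface GenomeConfig {
  personaId: string;
  adapters: string[]; // paths relative to .continuum/genome/adapters/
}

const manifest: AdapterManifest = {
  active: {
    typescript: 'typescript-v1.3.safetensors',
    'chat-style': 'chat-style-v3.1.safetensors',
  },
  history: {
    typescript: ['typescript-v1.2.safetensors'],
  },
};

const genome: GenomeConfig = {
  personaId: 'helper',
  adapters: [
    'helper-ai/typescript-v1.3.safetensors',
    'shared/coding-standards-v1.0.safetensors',
  ],
};
```

Keeping the active/history split in the manifest is what makes rollback cheap: reverting a domain is a one-line edit, not a file move.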
In addition to text-only LLMs, the system supports Vision Language Models (VLMs) for UI understanding, screenshot analysis, and design critique.
| Model | Params | Training | Inference | Best For |
|---|---|---|---|---|
| Qwen2.5-VL 3B | 3B | MLX-VLM (Mac) | Candle (Rust) | UI design, screenshots |
| Moondream 2 | 1.6B | MLX-VLM | Candle (Rust) | Fast image Q&A |
| LLaVA 1.5/1.6 | 7-13B | Unsloth (CUDA) | Candle (Rust) | Detailed analysis |
| SmolVLM | 2B | HuggingFace | Transformers | Memory-efficient |
```bash
# Install MLX-VLM
pip install mlx-vlm

# Fine-tune Qwen2.5-VL for UI design critique
python -m mlx_vlm.lora \
  --model Qwen/Qwen2.5-VL-3B-Instruct \
  --data ./datasets/ui-critique.jsonl \
  --output .continuum/genome/adapters/vision/ui-expert-v1.safetensors \
  --lora-rank 32 \
  --epochs 3

# Convert for Candle inference
python -m mlx_vlm.convert \
  --adapter .continuum/genome/adapters/vision/ui-expert-v1.safetensors \
  --output ./ui-expert.safetensors
```

```bash
# Unsloth for maximum speed
pip install unsloth

# Fine-tune with QLoRA (fits in 32GB VRAM)
python scripts/train-vlm-unsloth.py \
  --model Qwen/Qwen2.5-VL-7B-Instruct \
  --dataset ./datasets/ui-critique.jsonl \
  --output .continuum/genome/adapters/vision/ui-expert-v1.safetensors \
  --lora-rank 64 \
  --epochs 3 \
  --vision-lora true  # Include vision encoder in LoRA
```

Datasets:

- UICrit - 983 UIs with expert design critiques, bounding boxes, quality ratings
- GUICourse - 70k instruction-action pairs from 13k website screenshots
- Custom Continuum - Screenshots of your own UI with corrections
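Records in such datasets are usually one JSON object per line (.jsonl). A hypothetical record shape (the field names are assumptions; the exact schema depends on the training framework you feed it to):

```typescript
// Assumed .jsonl record shape for a UI-critique fine-tuning dataset.
interface VisionTrainingRecord {
  image: string;      // path or URL to the screenshot
  prompt: string;     // instruction given to the model
  completion: string; // target critique
}

const record: VisionTrainingRecord = {
  image: 'screenshots/checkout-page.png',
  prompt: 'Critique the visual hierarchy of this UI.',
  completion: 'The primary CTA competes with two secondary buttons of equal weight...',
};

// One line of the .jsonl file:
const jsonlLine = JSON.stringify(record);
```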
```
.continuum/genome/adapters/
├── helper-ai/
│   ├── typescript-v1.2.safetensors      # Text adapter
│   └── manifest.json
├── vision/                              # Vision adapters (shared)
│   ├── ui-expert-v1.safetensors         # UI design critique
│   ├── screenshot-reader-v1.safetensors
│   └── manifest.json
└── shared/
    └── coding-standards-v1.0.safetensors
```
```
                    VISION INFERENCE FLOW

  Screenshot ─────┐
                  │
                  ▼
┌─────────────────────────────────────────────────────────┐
│             Rust Inference Worker (Candle)              │
│                                                         │
│  ┌─────────────┐    ┌───────────┐    ┌───────────────┐  │
│  │ ViT Encoder │───▶│ Projector │───▶│ Text Decoder  │  │
│  │ (Moondream) │    │           │    │ + LoRA Adapter│  │
│  └─────────────┘    └───────────┘    └───────────────┘  │
└─────────────────────────────────────────────────────────┘
                  │
                  ▼
  "This UI has poor visual hierarchy. The CTA button..."
```
```bash
# Analyze UI screenshot
./jtag ai/vision/analyze \
  --image ./screenshot.png \
  --prompt "What usability issues do you see?"

# Generate HTML from mockup
./jtag ai/vision/generate-code \
  --image ./mockup.png \
  --format html

# UI critique with trained adapter
./jtag ai/vision/critique \
  --image ./ui-screenshot.png \
  --adapter ui-expert-v1
```

- Add Moondream to Candle inference worker
- Implement image encoding in gRPC proto
- Create `ai/vision/*` commands
- Integrate MLX-VLM training scripts
- Add vision adapter hot-swap support
- Train UI critique adapter on UICrit dataset
- Implement `load_lora_weights()` in Candle
- Add `apply_lora()` for weight merging
- Wire up `LoadAdapter` RPC to actually load weights
- Test with a pre-trained adapter
- Create TrainingSignal entity
- Implement DataAccumulator
- Add correction detection in chat
- Store signals in training-buffer
- Create TrainingDaemon
- Implement threshold-based triggers
- Wrap MLX training script
- Add quality validation
- Implement atomic adapter swap
- Add version management
- Implement rollback mechanism
- Add monitoring/alerting
- Track adapter performance over time
- Detect quality regressions
- Auto-rollback on regression
- Dashboard for training status
```bash
# Data management
./jtag training/buffer/status                    # Show pending data counts
./jtag training/buffer/flush --persona=helper-ai # Force queue data

# Training
./jtag training/start --persona=helper-ai --domain=typescript
./jtag training/status                           # Show active jobs
./jtag training/stop --job=job-001

# Adapters
./jtag adapter/list --persona=helper-ai
./jtag adapter/deploy --path=./adapter.safetensors --persona=helper-ai
./jtag adapter/rollback --persona=helper-ai --domain=typescript

# Continuous learning
./jtag training/continuous/start --persona=helper-ai
./jtag training/continuous/stop --persona=helper-ai
./jtag training/continuous/status
```

Success metrics:

- Training latency: < 10 minutes for incremental update (500 examples)
- Hot-swap latency: < 2 seconds for adapter swap
- Quality retention: New adapter >= 95% of old adapter quality
- Data efficiency: Measurable improvement from 500 corrections
- Uptime: Zero inference downtime during adapter updates
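The quality-retention target translates directly into a deploy gate; a minimal sketch (the function name is illustrative):

```typescript
// Deploy gate for the 95% quality-retention target: a new adapter ships
// only if it keeps at least `retention` of the previous adapter's quality.
function shouldDeploy(newQuality: number, oldQuality: number, retention = 0.95): boolean {
  return newQuality >= oldQuality * retention;
}
```

Comparing against the previous adapter (rather than an absolute threshold) lets the bar rise as the persona improves.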
Related documents:

- TRAINING-SYSTEM-ARCHITECTURE.md - Full training system design
- LORA-TRAINING-STRATEGY.md - Training approaches and costs
- COLLABORATIVE-LEARNING-VISION.md - Multi-layer learning loop
- docs/genome/DYNAMIC-GENOME-ARCHITECTURE.md - PersonaGenome design
Training Frameworks:
- MLX-VLM - VLM inference and fine-tuning on Apple Silicon
- mlx-image - Vision transformer training on Mac
- Unsloth - Fast LoRA training on NVIDIA GPUs
- Qwen-VL Fine-Tuning - Complete Qwen-VL training repo
Datasets:
Models:
- Qwen2.5-VL-3B - Best small VLM for UI
- Moondream 2 - Tiny VLM (1.6B) for edge
- SmolVLM - Memory-efficient VLM with LoRA adapters