IRIS - Interface Recognition & Interaction Suite

👁️ AI-powered UI understanding and testing toolkit

Phase 1: ✅ Complete | Phase 2: ✅ Complete (CLI, Accessibility, AI Vision Foundation)

IRIS gives AI coding assistants "eyes and hands" to see and interact with user interfaces through natural language commands, visual regression testing, and accessibility validation.


Current Status

✅ Phase 1 - Complete (Production-Ready)

Core Features Available:

  • ✅ Natural language UI commands with AI translation
  • ✅ Browser automation via Playwright
  • ✅ File watching with automatic re-execution
  • ✅ JSON-RPC protocol for AI coding assistant integration
  • ✅ SQLite persistence for test runs and results
  • ✅ Multi-provider AI support (OpenAI/Anthropic/Ollama)

✅ Phase 2 - Visual Regression & Accessibility (COMPLETE)

Status: Production-ready with 95.9% test pass rate and comprehensive feature coverage

Visual Testing Core:

  • ✅ Visual capture engine with page stabilization and masking
  • ✅ SSIM and pixel-based diff engine with region analysis
  • ✅ Git-integrated baseline management (branch/commit/timestamp strategies)
  • ✅ Multi-device testing (desktop, tablet, mobile)
  • ✅ Complete TypeScript/Zod type system

AI Vision Integration:

  • ✅ AI-powered semantic analysis (OpenAI GPT-4o, Claude 3.5 Sonnet, Ollama)
  • ✅ Multimodal AI client architecture (src/ai-client/ - reusable for future AI vision tasks)
  • ✅ Image preprocessing pipeline (resize, optimize, base64 encoding)
  • ✅ AI vision result caching (LRU memory + SQLite persistence)
  • ✅ Cost tracking with budget management and circuit breaker
  • ✅ Smart client with automatic fallback and cost optimization
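The in-memory half of the result cache can be pictured as a small LRU (least recently used) map. The following is only a sketch of the general technique, not the code in src/ai-client/: the class name, the string keys (imagined here as a hash of the compared images), and the capacity are all assumptions made for illustration.

```typescript
// Minimal LRU cache sketch. A Map preserves insertion order, so the first
// key is always the least recently used entry.
class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    if (capacity < 1) throw new Error("capacity must be at least 1");
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // Evict the least recently used entry (the oldest key in the Map).
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

// Hypothetical usage: cache AI verdicts keyed by a hash of the image pair.
const verdictCache = new LruCache<string, string>(2);
verdictCache.set("hash-a", "minor");
verdictCache.set("hash-b", "breaking");
verdictCache.get("hash-a"); // touch "hash-a" so it becomes most recent
verdictCache.set("hash-c", "moderate"); // evicts "hash-b"
```

A persistent layer (SQLite in IRIS) would typically sit behind such a cache, so that an eviction from memory does not force a new paid API call.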

CLI & Reporting:

  • ✅ CLI commands: iris visual-diff and iris a11y
  • ✅ Multi-format reporting (HTML, JSON, JUnit, Markdown)
  • ✅ Visual reporter with diff viewer and interactive HTML reports

Accessibility Testing:

  • ✅ WCAG 2.1 Level AA/AAA compliance validation with axe-core
  • ✅ Keyboard navigation testing (Tab order, focus traps, arrow keys)
  • ✅ Screen reader simulation (ARIA labels, landmarks, headings)

Examples & Documentation:

  • ✅ 4 example projects (basic visual, multi-device, accessibility, CI/CD)
  • ✅ Comprehensive API documentation and user guides
  • ✅ CI/CD integration examples

Test Results: 541/564 tests passing (95.9% pass rate)

  • 1 non-critical performance test timing failure (easily fixable)
  • 22 accessibility E2E tests skipped due to infrastructure mismatch

Coverage: 75.49% overall (below 85% target)

  • Visual module: 88.3%
  • Accessibility module: 76.6%
  • Database: 95.74%
  • Branch coverage: 58.28% (primary improvement area)

Production Readiness: ✅ Ready for use with noted optimization opportunities


Quick Start

Installation

git clone https://github.com/frankbria/iris.git
cd iris
npm install
npm run build
npm link

Verify Installation

iris --version

Try the Demo (Fastest Way)

bash <(curl -s https://raw.githubusercontent.com/frankbria/iris/main/scripts/demo-setup.sh)

This creates a sample project, runs visual and accessibility tests, and generates reports automatically.

Basic Usage

Natural Language Commands:

# Execute browser actions with natural language
iris run "click #submit-button"
iris run "fill #email with user@example.com"
iris run "navigate to https://example.com"

# AI-powered complex commands (requires API key)
export OPENAI_API_KEY=sk-your-key
iris run "find the blue button next to the search box and click it"

Visual Regression Testing:

# Compare current page against baseline
iris visual-diff \
  --pages "http://localhost:8080/**/*.html" \
  --baseline main \
  --devices desktop,tablet,mobile \
  --threshold 0.1 \
  --format html

# Enable AI semantic analysis
iris visual-diff \
  --pages "http://localhost:8080/" \
  --semantic \
  --threshold 0.1

Accessibility Testing:

# Run WCAG 2.1 AA compliance tests
iris a11y \
  --pages "http://localhost:8080/**/*.html" \
  --tags wcag2a,wcag2aa,wcag21aa \
  --include-keyboard \
  --format html

# Test with screen reader simulation
iris a11y \
  --pages "http://localhost:8080/" \
  --include-screenreader \
  --fail-on critical,serious

File Watching:

# Watch files and auto-execute on changes
iris watch src/ --instruction "reload page"
iris watch "**/*.ts" --execute

JSON-RPC Server:

# Start WebSocket server for AI coding assistant integration
iris connect
iris connect 8080  # Custom port
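For orientation, a client talking to this server would send JSON-RPC 2.0 messages over the WebSocket. The envelope fields below (jsonrpc, id, method, params) come from the JSON-RPC 2.0 specification; the method name "run" and its params are hypothetical placeholders, since this README does not list the methods IRIS exposes.

```typescript
// JSON-RPC 2.0 request envelope as defined by the specification.
interface RpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: unknown;
}

let nextRpcId = 1;

// Build a request with a fresh id so responses can be matched to requests.
function buildRpcRequest(method: string, params?: unknown): RpcRequest {
  const request: RpcRequest = { jsonrpc: "2.0", id: nextRpcId++, method };
  if (params !== undefined) request.params = params;
  return request;
}

// "run" and its params are assumptions; check src/protocol.ts for real names.
const rpcMessage = JSON.stringify(
  buildRpcRequest("run", { instruction: "click #submit-button" })
);

// Sending it would look roughly like this, assuming `iris connect 8080`:
//   const socket = new WebSocket("ws://localhost:8080");
//   socket.addEventListener("open", () => socket.send(rpcMessage));
```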

Configuration

AI Provider Setup

OpenAI (Recommended for Visual Analysis):

export OPENAI_API_KEY=sk-your-key

Anthropic Claude (Recommended for Semantic Analysis):

export ANTHROPIC_API_KEY=sk-ant-your-key

Local Ollama (Privacy-Focused):

export OLLAMA_ENDPOINT=http://localhost:11434
export OLLAMA_MODEL=llava:latest

Config File

Create ~/.iris/config.json:

{
  "ai": {
    "provider": "openai",
    "model": "gpt-4o-mini"
  },
  "visual": {
    "threshold": 0.1,
    "devices": ["desktop"],
    "aiProvider": "openai"
  },
  "accessibility": {
    "wcagLevel": "AA",
    "includeKeyboard": true
  },
  "watch": {
    "patterns": ["**/*.{ts,tsx,js,jsx}"],
    "debounceMs": 1000
  }
}

Project-Level Config

Create .irisrc in your project root:

{
  "visual": {
    "threshold": 0.1,
    "devices": ["desktop", "tablet", "mobile"],
    "capture": {
      "waitForFonts": true,
      "disableAnimations": true,
      "stabilizationDelay": 500
    }
  },
  "accessibility": {
    "wcagLevel": "AA",
    "includeKeyboard": true
  }
}
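How the two files combine is not spelled out here. A common convention, assumed in the sketch below, is that project-level values override the global file key by key, with arrays replaced rather than concatenated. The helper names are hypothetical, and IRIS's actual precedence rules may differ.

```typescript
type ConfigValue = string | number | boolean | null | ConfigValue[] | ConfigObject;
interface ConfigObject {
  [key: string]: ConfigValue;
}

function isConfigObject(value: ConfigValue | undefined): value is ConfigObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Recursively merge `override` into `base`. Objects merge key by key;
// arrays and scalars from `override` replace the base value outright.
function mergeConfig(base: ConfigObject, override: ConfigObject): ConfigObject {
  const merged: ConfigObject = { ...base };
  for (const [key, value] of Object.entries(override)) {
    const existing = merged[key];
    merged[key] =
      isConfigObject(existing) && isConfigObject(value)
        ? mergeConfig(existing, value)
        : value;
  }
  return merged;
}

// Trimmed versions of the two files shown above.
const globalConfig: ConfigObject = {
  visual: { threshold: 0.1, devices: ["desktop"], aiProvider: "openai" },
};
const projectConfig: ConfigObject = {
  visual: {
    devices: ["desktop", "tablet", "mobile"],
    capture: { waitForFonts: true },
  },
};

const effectiveConfig = mergeConfig(globalConfig, projectConfig);
```

Under this assumption, effectiveConfig.visual keeps threshold and aiProvider from the global file and takes devices and capture from the project file.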

Visual Regression Testing

Features

Capture Engine:

  • Screenshot capture with viewport/fullPage modes
  • Multi-device support (desktop 1920x1080, tablet 768x1024, mobile 375x667)
  • Page stabilization (fonts, animations, network idle)
  • Dynamic content masking
  • Element-specific capture
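As an illustration of the device presets, a comma-separated device list can be mapped to viewport sizes like this. The dimensions are the ones listed above; the constant and function names are assumptions, not IRIS identifiers.

```typescript
interface ViewportSize {
  width: number;
  height: number;
}

// Viewport sizes as listed in the capture engine feature list.
const DEVICE_VIEWPORTS: Record<string, ViewportSize> = {
  desktop: { width: 1920, height: 1080 },
  tablet: { width: 768, height: 1024 },
  mobile: { width: 375, height: 667 },
};

// Parse a comma-separated device list and reject unknown names early.
function parseDeviceList(list: string): ViewportSize[] {
  return list.split(",").map((raw) => {
    const name = raw.trim();
    const viewport = DEVICE_VIEWPORTS[name];
    if (!viewport) throw new Error(`Unknown device: ${name}`);
    return viewport;
  });
}

const viewports = parseDeviceList("desktop, mobile");
```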

Diff Engine:

  • Pixel-level comparison with pixelmatch
  • SSIM (Structural Similarity Index) analysis
  • Region-based difference detection
  • Change classification (layout/content/styling/animation)

AI Semantic Analysis:

  • OpenAI GPT-4 Vision integration
  • Anthropic Claude 3.5 Sonnet support
  • Ollama local model support
  • Semantic change understanding (intentional vs regression)
  • Severity classification (breaking, moderate, minor)
  • Confidence scoring and explanations

Baseline Management:

  • Git-integrated baseline storage
  • Branch-based baseline strategies
  • Commit-based snapshots
  • Timestamp-based baselines
  • Automatic cleanup of old baselines

Reporting:

  • Interactive HTML reports with diff viewer
  • JSON structured data export
  • JUnit XML for CI/CD integration
  • Markdown summary reports

CLI Options

iris visual-diff [options]

Options:
  --pages <patterns>       Page patterns (comma-separated, default: /)
  --baseline <reference>   Baseline branch/commit (default: main)
  --semantic               Enable AI semantic analysis
  --threshold <value>      Pixel threshold 0-1 (default: 0.1)
  --devices <list>         Devices: desktop,tablet,mobile (default: desktop)
  --format <type>          Output: html|json|junit|markdown (default: html)
  --output <path>          Output file path
  --fail-on <severity>     Fail on: minor|moderate|breaking (default: breaking)
  --update-baseline        Update baseline with current screenshots
  --mask <selectors>       CSS selectors to mask (comma-separated)
  --concurrency <n>        Max concurrent comparisons (default: 3)
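One plausible reading of --fail-on is "fail when any detected change is at or above this severity". The sketch below encodes that reading; the ordering minor < moderate < breaking follows the option text, while the function name and the exit code value of 1 are assumptions.

```typescript
const SEVERITY_ORDER = ["minor", "moderate", "breaking"] as const;
type ChangeSeverity = (typeof SEVERITY_ORDER)[number];

// Returns true when any finding is at or above the configured severity.
function shouldFail(findings: ChangeSeverity[], failOn: ChangeSeverity): boolean {
  const limit = SEVERITY_ORDER.indexOf(failOn);
  return findings.some((severity) => SEVERITY_ORDER.indexOf(severity) >= limit);
}

// With the default (--fail-on breaking), minor and moderate changes pass.
const visualExitCode = shouldFail(["minor", "moderate"], "breaking") ? 1 : 0;
```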

Accessibility Testing

Features

WCAG Compliance:

  • WCAG 2.0/2.1 Level A, AA, AAA validation
  • axe-core integration with 90+ rules
  • Configurable rule sets and tags
  • Impact-based severity classification

Keyboard Navigation:

  • Tab order validation
  • Focus trap detection
  • Arrow key navigation testing
  • Escape key handling verification
  • Custom keyboard sequence testing
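Tab order validation usually means comparing the order observed while pressing Tab against the order the HTML focus rules predict: elements with a positive tabindex come first in ascending order, then elements with tabindex 0 in document order, and negative values are skipped. The sketch below computes that expected order from a hypothetical list of elements in DOM order; it is not taken from keyboard-tester.ts.

```typescript
interface FocusableInfo {
  selector: string;
  tabIndex: number;
}

// Expected sequential focus order for elements given in DOM order.
function expectedTabOrder(elements: FocusableInfo[]): string[] {
  const positive = elements
    .filter((el) => el.tabIndex > 0)
    .sort((x, y) => x.tabIndex - y.tabIndex); // stable sort: ties keep DOM order
  const natural = elements.filter((el) => el.tabIndex === 0);
  // Elements with a negative tabIndex are focusable only programmatically.
  return [...positive, ...natural].map((el) => el.selector);
}

const tabOrder = expectedTabOrder([
  { selector: "#logo", tabIndex: -1 },
  { selector: "#search", tabIndex: 0 },
  { selector: "#skip-link", tabIndex: 1 },
  { selector: "#submit", tabIndex: 0 },
]);
```

A mismatch between this list and the observed focus sequence is a typical signal of a confusing tab order, often caused by stray positive tabindex values.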

Screen Reader Support:

  • ARIA label validation
  • Landmark navigation testing
  • Heading structure verification
  • Image alt text validation
  • Screen reader simulation
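Heading structure verification typically checks that heading levels never jump by more than one step (for example h2 directly to h4), since screen reader users navigate by heading level. A minimal sketch of that check follows, with a hypothetical function name and heading levels supplied in document order.

```typescript
// Report places where the heading level increases by more than one step.
function findSkippedHeadings(levels: number[]): string[] {
  const problems: string[] = [];
  let previous = 0; // 0 means "no heading seen yet"
  for (const level of levels) {
    if (previous !== 0 && level > previous + 1) {
      problems.push(`h${previous} followed by h${level}`);
    }
    previous = level;
  }
  return problems;
}

// h1, h2, h4, h2, h3: only the h2 -> h4 jump is reported.
const headingProblems = findSkippedHeadings([1, 2, 4, 2, 3]);
```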

Reporting:

  • Accessibility score (0-100 scale)
  • Violation breakdown by severity
  • Element-level issue reporting
  • Remediation suggestions

CLI Options

iris a11y [options]

Options:
  --pages <patterns>        Page patterns (comma-separated, default: /)
  --rules <rules>           Specific axe rules (comma-separated)
  --tags <tags>             Rule tags: wcag2a,wcag2aa,wcag21aa (default: wcag2a,wcag2aa)
  --fail-on <impacts>       Impact levels: critical,serious,moderate,minor (default: critical,serious)
  --format <type>           Output: html|json|junit (default: html)
  --output <path>           Output file path
  --include-keyboard        Include keyboard navigation tests (default: true)
  --include-screenreader    Include screen reader simulation
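Unlike the visual command, --fail-on here takes a list of impact levels. A reasonable interpretation, assumed in this sketch, is that the run fails if any violation has an impact in that list. The impact names match axe-core's result format; the summary type and function name are illustrative only.

```typescript
type AxeImpact = "minor" | "moderate" | "serious" | "critical";

interface ViolationSummary {
  id: string; // axe rule id
  impact: AxeImpact | null; // axe-core may report null when impact is unknown
  nodes: number; // number of affected elements
}

// Keep only the violations whose impact appears in the --fail-on list.
function blockingViolations(
  violations: ViolationSummary[],
  failOn: string
): ViolationSummary[] {
  const impacts = new Set(failOn.split(",").map((item) => item.trim()));
  return violations.filter((v) => v.impact !== null && impacts.has(v.impact));
}

const sampleViolations: ViolationSummary[] = [
  { id: "color-contrast", impact: "serious", nodes: 3 },
  { id: "region", impact: "moderate", nodes: 1 },
];

// With the default list, only the color-contrast violation blocks the run.
const blocking = blockingViolations(sampleViolations, "critical,serious");
```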

Examples

Pre-built examples are available in the examples/ directory:

1. Basic Visual Testing

cd examples/basic-visual-test
./test-visual.sh

Demonstrates:

  • Simple page comparison
  • Baseline creation and updating
  • Threshold configuration
  • HTML report generation

2. Multi-Device Testing

cd examples/multi-device-visual
./test-responsive.sh

Demonstrates:

  • Desktop, tablet, mobile testing
  • Responsive design validation
  • Device-specific baselines
  • Parallel test execution

3. Accessibility Audit

cd examples/accessibility-audit
./test-a11y.sh

Demonstrates:

  • WCAG 2.1 AA compliance testing
  • Keyboard navigation validation
  • Screen reader simulation
  • Accessibility score reporting

4. CI/CD Integration

cd examples/ci-cd-integration

Includes configurations for:

  • GitHub Actions
  • GitLab CI
  • Jenkins
  • CircleCI

Development

Run Tests

npm test
# Result: 541/564 passing (95.9% pass rate)
# 1 failing (performance timing - non-critical)
# 22 skipped (accessibility E2E - infrastructure mismatch)

Build

npm run build

Coverage

npm test -- --coverage
# Overall: 75.49% (below 85% target)
# Visual: 88.3% | A11y: 76.6% | Database: 95.74%
# Branch coverage: 58.28% (primary improvement area)

Run Benchmarks

npm run bench

Performance baselines:

  • Single page visual diff: 42.61ms (target <100ms) ✅
  • 4K image processing: 205.30ms (target <300ms) ✅
  • Memory delta: 1.57MB ✅

Architecture

Phase 1 Core (9 modules, 25,667+ lines)

CLI Framework (src/cli.ts)

  • Commander.js-based CLI with 5 commands
  • Browser execution integration
  • Configuration management

Browser Automation (src/browser.ts, src/executor.ts)

  • Playwright wrapper with retry logic
  • Action execution with error handling
  • Session management

AI Translation (src/translator.ts, src/ai-client.ts)

  • Pattern matching + AI fallback
  • Multi-provider support (OpenAI/Anthropic/Ollama)
  • Confidence scoring
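The "pattern matching + AI fallback" flow can be sketched as follows: cheap regular expressions handle the simple command shapes shown under Basic Usage, and anything they cannot parse is handed to an AI provider. The action types, the confidence value of 1 for exact matches, and the function name are assumptions, not the contents of src/translator.ts.

```typescript
type UiAction =
  | { type: "click"; selector: string }
  | { type: "fill"; selector: string; value: string }
  | { type: "navigate"; url: string };

interface Translation {
  action: UiAction;
  confidence: number;
}

// Try regex patterns first; return null to signal that the instruction
// should be forwarded to an AI provider instead.
function translateByPattern(instruction: string): Translation | null {
  const text = instruction.trim();

  let match = /^click\s+(\S+)$/i.exec(text);
  if (match) {
    return { action: { type: "click", selector: match[1] }, confidence: 1 };
  }

  match = /^fill\s+(\S+)\s+with\s+(.+)$/i.exec(text);
  if (match) {
    return {
      action: { type: "fill", selector: match[1], value: match[2] },
      confidence: 1,
    };
  }

  match = /^navigate\s+to\s+(\S+)$/i.exec(text);
  if (match) {
    return { action: { type: "navigate", url: match[1] }, confidence: 1 };
  }

  return null;
}

const simple = translateByPattern("click #submit-button");
const complex = translateByPattern(
  "find the blue button next to the search box and click it"
);
// `simple` is resolved locally; `complex` is null and would need the AI fallback.
```

Resolving simple commands locally keeps them fast and free, and reserves API calls for instructions that genuinely need language understanding.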

Protocol & Storage (src/protocol.ts, src/db.ts)

  • JSON-RPC 2.0 over WebSocket
  • SQLite persistence with migration system
  • Test result tracking with visual and a11y results

Phase 2 Visual & Accessibility (100% Complete)

Visual Module (src/visual/)

  • visual-runner.ts - Test orchestration (15,365 bytes)
  • capture.ts - Screenshot capture with stabilization
  • diff.ts - Pixel and SSIM comparison
  • baseline.ts - Git-integrated baseline management
  • ai-classifier.ts - AI semantic analysis (6,843 bytes)
  • reporter.ts - Multi-format reporting (979 lines)
  • storage.ts - Artifact storage

Accessibility Module (src/a11y/)

  • a11y-runner.ts - Test orchestration (12,799 bytes)
  • axe-integration.ts - WCAG compliance (6,279 bytes)
  • keyboard-tester.ts - Keyboard navigation (12,271 bytes)

Database (src/db.ts)

  • Extended schema with visual_test_results and a11y_test_results tables
  • Migration system for schema versioning
  • Aggregate statistics and query functions

Documentation

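Project documentation referenced in this README:

  • docs/guides/ci-cd-integration.md - CI/CD integration guide
  • docs/PERFORMANCE.md - detailed benchmarks
  • docs/beads-migration-guide.md - Beads issue-tracking workflow
  • DEVELOPMENT_INSTRUCTIONS.md - contribution guidelines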

AI Agents

Issue Tracking with Beads

IRIS uses Beads (bd) - a dependency-aware issue tracker designed for AI-supervised workflows. Issues are tracked with explicit dependency chains, making it easy for AI agents to find ready work and avoid duplicating effort.

Quick Start:

# Show unblocked issues ready to work on
bd ready

# View issue details
bd show iris-7

# Claim work
bd update iris-7 --status in_progress --assignee your-name

# Close when complete
bd close iris-7 --reason "commit abc123"

Current Status:

  • 19 issues tracking Phase 2 Sub-Phases B-E (weeks 5-18)
  • 10 issues ready with no blockers
  • Critical path: iris-6 → iris-7 (P0 validation) → iris-8 → ... → iris-16

Key Features:

  • Dependency tracking (blocks, parent-child, discovered-from)
  • Auto-sync with git (JSONL export/import)
  • Priority-based work queues (P0-P3)
  • JSON output for programmatic access

See docs/beads-migration-guide.md for complete workflow documentation.


Roadmap

Phase 1 ✅ (Complete - September 2024)

  • CLI framework with natural language commands
  • Browser automation with Playwright
  • File watching and auto-execution
  • AI translation with multi-provider support
  • JSON-RPC protocol server
  • SQLite persistence

Phase 2 ✅ (COMPLETE - October 2025)

  • ✅ Visual regression testing with pixel and SSIM comparison
  • ✅ AI semantic analysis (OpenAI, Claude, Ollama)
  • ✅ AI vision foundation with cost control and caching
  • ✅ Multi-device testing (desktop, tablet, mobile)
  • ✅ Accessibility validation (WCAG 2.1 AA/AAA)
  • ✅ Keyboard navigation and screen reader testing
  • ✅ Git-integrated baseline management
  • ✅ Multi-format reporting (HTML, JSON, JUnit, Markdown)
  • ✅ CLI integration (iris visual-diff, iris a11y)
  • ✅ E2E integration tests
  • ✅ Performance benchmarks
  • ✅ Comprehensive documentation and examples
  • ✅ CI/CD ready
  • ✅ Test suite stabilized (95.9% pass rate)
  • ⚠️ Coverage at 75.49% (below 85% target - branch coverage improvement needed)

Phase 3 📋 (Planned - Q1 2026)

  • Performance monitoring and Core Web Vitals
  • Advanced AI-powered visual analysis
  • Autonomous UI exploration
  • Design system compliance checking
  • Visual regression history and trends
  • Team collaboration features

Testing

Test Coverage:

  • Total: 564 tests (541 passing, 95.9% pass rate)
  • Failing: 1 (non-critical performance timing test)
  • Skipped: 22 (accessibility E2E infrastructure mismatch)
  • Overall coverage: 75.49% (target: 85%)
    • Visual module: 88.3%
    • Accessibility module: 76.6%
    • Database: 95.74%
    • Branch coverage: 58.28% (primary improvement opportunity)

Test Suites:

  • Unit tests for all core modules (541 passing)
  • Integration tests for CLI commands
  • E2E tests: Visual (93.3% passing), Accessibility (0% - skipped)
  • Browser automation tests with real Playwright
  • Performance benchmarks

Dependencies

Core:

  • Node.js >=18.0.0
  • TypeScript 5.1.6
  • Playwright 1.35.0
  • Commander 11.0.0

Visual Testing:

  • sharp (image processing)
  • pixelmatch (pixel diff)
  • image-ssim (structural similarity)
  • simple-git (baseline management)
  • openai (GPT-4 Vision)
  • @anthropic-ai/sdk (Claude)

Accessibility:

  • @axe-core/playwright
  • pa11y

Utilities:

  • zod (runtime validation)
  • better-sqlite3 (database)
  • ws (WebSocket)

Performance

Benchmarks (October 2025):

  • Single page visual diff: 42.61ms (target <100ms) ✅ 57% better
  • 4K image processing: 205.30ms (target <300ms) ✅ 32% better
  • Memory usage: 1.57MB delta ✅ Excellent
  • Parallel efficiency: 1.6x speedup (3-5x improvement planned on the roadmap)

See docs/PERFORMANCE.md for detailed benchmarks.


CI/CD Integration

IRIS is CI/CD ready with:

  • Exit code propagation for pass/fail
  • JUnit XML report generation
  • JSON structured output
  • Parallel test execution
  • Configurable failure thresholds

Example GitHub Actions:

- name: Visual Regression Testing
  run: |
    iris visual-diff \
      --pages "http://localhost:8080/**/*.html" \
      --baseline main \
      --format junit \
      --output test-results/visual.xml

- name: Accessibility Testing
  run: |
    iris a11y \
      --pages "http://localhost:8080/**/*.html" \
      --format junit \
      --output test-results/a11y.xml

See docs/guides/ci-cd-integration.md for complete examples.


Contributing

Phase 2 is complete. The project is ready for Phase 3 development or community contributions.

Areas for Contribution:

  • Additional AI provider integrations
  • Enhanced report visualizations
  • Performance optimizations
  • Additional accessibility rules
  • Documentation improvements
  • Example projects

See DEVELOPMENT_INSTRUCTIONS.md for contribution guidelines.


License

MIT


Links

Building in public. Star the repo to follow along! ⭐


Quick Reference

Installation:

# Install from source:
git clone https://github.com/frankbria/iris.git && cd iris && npm install && npm run build && npm link
# npm package (coming soon, not yet published):
# npm install -g @frankbria/iris

Visual Testing:

iris visual-diff --pages "http://localhost:8080/" --semantic

Accessibility Testing:

iris a11y --pages "http://localhost:8080/" --include-keyboard

Get Help:

iris --help
iris visual-diff --help
iris a11y --help

Status:

  • Phase 1: ✅ Complete
  • Phase 2: ✅ Complete (production-ready)
  • Tests: 541/564 passing (95.9%)
  • Coverage: 75.49% (below 85% target)
  • Production Ready: ✅ Yes (with noted optimization opportunities)