autonomous-testing/simple-visual-model-tester

UI Element Locator — Multi‑Model LLM Visual Evaluator (SPA)

Browser‑only MVP to send one prompt + one image to multiple OpenAI‑compatible vision models in parallel, draw detections on a canvas, log per‑model traffic, persist Runs and Batches, and export CSV.

Quick Start

  1. Open app/index.html directly in a modern desktop browser or host the app/ folder as static files (e.g., GitHub Pages) and open the published URL.

  2. In the Models tab, set the Base URL, API Key, Endpoint Type (chat or responses), and Model (e.g., gpt-4o-mini). Click Save to Browser.

  3. Choose Image → Load Selected Image, enter a Prompt, set Iterations if needed, then click Run on Enabled Models.

  4. Inspect overlay + logs. Use Results tab to Export CSV for This run / This batch / All runs.
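To make step 2 concrete, here is a minimal sketch of how one prompt and one image might be packaged for a chat-type endpoint. It assumes the common OpenAI-compatible convention of POSTing to `<Base URL>/chat/completions` with an `image_url` content part; the function name `buildChatRequest` and the example host are hypothetical, and the app's own request code may differ.

```javascript
// Build the fetch() arguments for an OpenAI-compatible chat endpoint.
// Assumption: the endpoint path is "<baseURL>/chat/completions".
function buildChatRequest({ baseURL, apiKey, model, prompt, imageDataUrl }) {
  // Drop trailing slashes so the joined URL has exactly one separator.
  const url = `${baseURL.replace(/\/+$/, "")}/chat/completions`;
  const body = {
    model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: imageDataUrl } },
        ],
      },
    ],
  };
  return {
    url,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify(body),
    },
  };
}
```

In a browser this would be used as `const { url, init } = buildChatRequest(config); const res = await fetch(url, init);`. A responses-type endpoint uses a different path and body shape, which this sketch does not cover.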

CSV Columns

The export includes these columns (order is stable):

batchId, batchSeq, runId, runLabel, timestampIso, imageName, imageW, imageH, prompt, modelPrompt, modelDisplayName, baseURL, model, detectionType, x, y, width, height, confidence, latencyMs, status, error, rawTextShort, rawTextCanonical, rawTextFull

  • modelPrompt: the actual prompt sent to the model (for DINO, the DINO Prompt).
  • rawTextShort: first 200 characters of the canonical JSON used for overlay.
  • rawTextCanonical: full canonical JSON used for overlay/CSV.
  • rawTextFull: full raw server JSON (sanitized), when available.
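As an illustration of the stable column order, the sketch below serializes one flat record into a CSV line. It assumes RFC 4180 style quoting; the names `CSV_COLUMNS`, `escapeCsv`, `toCsvRow`, and `shortenCanonical` are hypothetical and not taken from the app's source.

```javascript
// Column order as listed above.
const CSV_COLUMNS = [
  "batchId", "batchSeq", "runId", "runLabel", "timestampIso",
  "imageName", "imageW", "imageH", "prompt", "modelPrompt",
  "modelDisplayName", "baseURL", "model", "detectionType",
  "x", "y", "width", "height", "confidence", "latencyMs",
  "status", "error", "rawTextShort", "rawTextCanonical", "rawTextFull",
];

// Quote a field when it contains a comma, a double quote, or a line break.
function escapeCsv(value) {
  if (value === null || value === undefined) return "";
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build one CSV line from a flat record; missing keys become empty cells.
function toCsvRow(record) {
  return CSV_COLUMNS.map((name) => escapeCsv(record[name])).join(",");
}

// rawTextShort is described as the first 200 characters of the canonical JSON.
function shortenCanonical(canonicalJson) {
  return canonicalJson.slice(0, 200);
}
```

Quoting matters here because prompts and raw JSON routinely contain commas, quotes, and newlines.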

Notes

  • CORS: The target API must allow browser requests from your origin (e.g., your GitHub Pages domain) for model calls to succeed.
  • Security: API keys live in the browser (localStorage). Do not ship this as‑is to untrusted users.
  • Persistence: Metadata in localStorage, payloads (runs + image blobs) in IndexedDB. History survives reload.
  • Response contract: The app instructs models to return JSON only with:
    {
      "coordinate_system": "pixel",
      "origin": "top-left",
      "image_size": { "width": 123, "height": 456 },
      "primary": { "type": "point", "x": 10, "y": 20, "confidence": 0.9 },
      "others": [],
      "notes": "optional"
    }
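A reply can be checked against this contract before it is drawn. The following is a minimal sketch, not the app's parser: the function name `parseContract`, the error strings, and the choice to treat `others` as optional are assumptions.

```javascript
// Validate a model reply against the response contract shown above.
// Returns { ok: true, value } on success or { ok: false, error } on failure.
function parseContract(rawText) {
  let data;
  try {
    data = JSON.parse(rawText);
  } catch (err) {
    return { ok: false, error: "not valid JSON" };
  }
  if (data === null || typeof data !== "object" || Array.isArray(data)) {
    return { ok: false, error: "not a JSON object" };
  }
  if (data.coordinate_system !== "pixel") {
    return { ok: false, error: "coordinate_system must be 'pixel'" };
  }
  if (data.origin !== "top-left") {
    return { ok: false, error: "origin must be 'top-left'" };
  }
  const size = data.image_size;
  if (!size || !Number.isFinite(size.width) || !Number.isFinite(size.height)) {
    return { ok: false, error: "image_size needs numeric width and height" };
  }
  const primary = data.primary;
  if (!primary || !Number.isFinite(primary.x) || !Number.isFinite(primary.y)) {
    return { ok: false, error: "primary needs numeric x and y" };
  }
  // Assumption: a missing or malformed "others" is treated as an empty list.
  const others = Array.isArray(data.others) ? data.others : [];
  return { ok: true, value: { ...data, others } };
}
```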

GroundingDINO Compatibility

  • You can configure a model with Endpoint Type: GroundingDINO and set the Base URL of an external detection endpoint that allows browser requests (CORS). The client adapts common response shapes into the app’s canonical JSON for overlay/CSV.
  • Adapter rules: If x, y, width, and height are all ≤ 1, they are treated as normalized fractions; otherwise they are treated as percentages (0–100). Tiny positive boxes are preserved using floor/ceil edges; zero‑area boxes become point candidates.
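The adapter rules can be sketched as a single conversion function. This is an illustration under stated assumptions rather than the app's adapter: the name `toPixelDetection`, the input and output shapes, and placing the point candidate at the centre of a zero-area box are all hypothetical choices.

```javascript
// Convert one detection box to pixel coordinates following the rules above.
function toPixelDetection(box, imageW, imageH) {
  const values = [box.x, box.y, box.width, box.height];
  // All values <= 1 means normalized fractions; otherwise percentages (0-100).
  const divisor = values.every((v) => v <= 1) ? 1 : 100;

  // Fractional pixel edges of the box.
  const left = (box.x * imageW) / divisor;
  const top = (box.y * imageH) / divisor;
  const right = ((box.x + box.width) * imageW) / divisor;
  const bottom = ((box.y + box.height) * imageH) / divisor;

  // Zero-area boxes become point candidates (assumption: use the centre).
  if (box.width <= 0 || box.height <= 0) {
    return {
      type: "point",
      x: Math.round((left + right) / 2),
      y: Math.round((top + bottom) / 2),
    };
  }

  // Floor the leading edges and ceil the trailing edges so that a tiny
  // positive box still covers at least one pixel.
  const x0 = Math.floor(left);
  const y0 = Math.floor(top);
  const x1 = Math.ceil(right);
  const y1 = Math.ceil(bottom);
  return { type: "bbox", x: x0, y: y0, width: x1 - x0, height: y1 - y0 };
}
```

Note that the ≤ 1 test is a heuristic: a percentage box whose four values all happen to be at most 1 would be read as normalized.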

Browser Support

  • Designed for evergreen browsers (ES modules). Uses createImageBitmap(..., { imageOrientation: 'from-image' }) to respect EXIF rotation when supported, with a safe fallback.
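One way such a fallback could look is sketched below. The README does not say what the app's fallback actually does, so retrying without the options bag is an assumption, as is the function name `decodeUpright`; the second parameter exists only so the function can be exercised outside a browser.

```javascript
// Decode an image blob upright when the browser supports EXIF-aware decoding.
// Assumption: if the options bag is rejected, retry with a plain decode.
async function decodeUpright(blob, createBitmap = globalThis.createImageBitmap) {
  if (typeof createBitmap !== "function") {
    throw new Error("createImageBitmap is not available");
  }
  try {
    return await createBitmap(blob, { imageOrientation: "from-image" });
  } catch (err) {
    // The image may then be drawn with whatever orientation the engine picks.
    return createBitmap(blob);
  }
}
```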

GitHub Pages

Enable in GitHub: Settings → Pages → Build and deployment → Source: "Deploy from a branch" → Branch: main → Folder: /app → Save.

Folder Layout

app/
  index.html
  styles.css
  .nojekyll
  src/
    main.js
    components/
      image-loader.js
      model-tabs.js
      overlay-renderer.js
      results-table.js
      history-dropdown.js
      history-dialog.js
    core/
      api-client.js
      parser.js
      metrics.js
      storage.js
      idb.js
      history-store.js
      batch-runner.js
      logger.js
      utils.js

Acceptance Criteria Coverage

  • Image Load & Normalize: canvas upright via createImageBitmap(..., { imageOrientation: 'from-image' }).
  • Parallel Call & Overlay: BatchRunner fires all enabled models in parallel per run; overlay draws points/bboxes with labels.
  • Per‑Model Logs: stored per model in RunData.logs[modelId] with sanitized headers and timings.
  • CSV Export: Results tab exports CSV for run/batch/all with exact columns.
  • History: dropdown + dialog load past runs; images come from IDB.
  • Batch: iterations run sequentially; Cancel Batch aborts the runs that have not started yet.
  • Persistence: runs/batches in localStorage + IDB, reloaded after refresh.
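The parallel fan-out described under Parallel Call & Overlay can be sketched as follows. This is not the app's BatchRunner: `runOnEnabledModels`, the `enabled` and `id` fields, and the injected `callModel` function are hypothetical names used for illustration.

```javascript
// Call every enabled model at once and collect one result per model.
// `callModel` is a hypothetical async function: (model) => response text.
async function runOnEnabledModels(models, callModel) {
  const enabled = models.filter((m) => m.enabled);
  return Promise.all(
    enabled.map(async (model) => {
      const started = Date.now();
      try {
        const text = await callModel(model);
        return { modelId: model.id, status: "ok", text, latencyMs: Date.now() - started };
      } catch (err) {
        // One failing model must not reject the whole run, so errors are
        // converted into result rows instead of being rethrown.
        return {
          modelId: model.id,
          status: "error",
          error: String(err?.message ?? err),
          latencyMs: Date.now() - started,
        };
      }
    })
  );
}
```

Catching inside each task keeps latency available for failed calls too, which lines up with the latencyMs, status, and error columns of the CSV export.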

Known Limits (MVP)

  • No pan/zoom. No ground truth scoring.
  • Some OpenAI‑compatible servers use different response shapes; client extracts .choices[0].message.content or .output_text when possible, else stringifies JSON.
  • If IndexedDB is blocked, images are not persisted. Falling back to stored data URLs would require code changes (out of scope here).
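The extraction order mentioned in the second limit above can be sketched like this. The function name `extractModelText` is hypothetical; only the two property paths and the stringify fallback come from the text.

```javascript
// Pull the model's text out of common response shapes.
function extractModelText(body) {
  // Chat-style shape: .choices[0].message.content
  const chatText = body?.choices?.[0]?.message?.content;
  if (typeof chatText === "string") return chatText;
  // Responses-style shape: .output_text
  if (typeof body?.output_text === "string") return body.output_text;
  // Unknown shape: keep everything so it can still be inspected in the logs.
  return JSON.stringify(body);
}
```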

© You. MIT‑style licensing recommended.
