LFX - Phase 5: Analysis, Alignment & Metrics#1792

Open
ishaan-arora-1 wants to merge 5 commits into riscv:main from ishaan-arora-1:lfx-phase5-analysis

Conversation

@ishaan-arora-1 (Contributor)

Summary

  • Add analyze.py — comprehensive analysis pipeline for evaluating LLM extraction results against UDB ground truth
  • Implements deduplication (handling cross-chunk duplicates with confidence-based selection), multi-strategy alignment (exact match, one-to-many mappings, concept groups, fuzzy name matching), and detailed metrics computation
  • Generates discrepancy reports categorizing differences as naming mismatches, class disagreements, recall misses, new discoveries, and hallucination suspects
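The fuzzy name-matching strategy mentioned above can be sketched as follows. This is an illustrative approximation, not the code in analyze.py; the normalization rules and the 0.85 threshold are assumptions.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and strip separators so MUTABLE_MISA_A and mutable-misa-a compare equal."""
    return name.lower().replace("_", "").replace("-", "")


def fuzzy_match(llm_name: str, udb_names: list[str], threshold: float = 0.85):
    """Return the most similar UDB name above the threshold, else None."""
    best, best_score = None, 0.0
    for udb in udb_names:
        score = SequenceMatcher(None, normalize(llm_name), normalize(udb)).ratio()
        if score > best_score:
            best, best_score = udb, score
    return best if best_score >= threshold else None
```

Fuzzy matching runs only after the exact-match and concept-group strategies have had a chance, so a near-miss name never shadows an exact hit.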

V1 Evaluation Results

| Metric | Value |
| --- | --- |
| UDB parameters | 185 |
| LLM params (deduped) | 202 |
| Adjusted recall | 58.8% |
| Classification accuracy | 77.8% |
| New params discovered | 115 |

Key analysis capabilities

  • Deduplication: Cross-chunk parameter deduplication preferring in-content-region matches over overlap-region duplicates
  • One-to-many mappings: Handles cases where UDB has fine-grained per-extension parameters (e.g., MUTABLE_MISA_A/B/C/...) while LLM identifies a single conceptual parameter
  • Concept groups: Groups related UDB parameters (e.g., REPORT_VA_IN_*TVAL_ON_*) under shared concepts for recall calculation
  • Per-class recall: Breakdown of recall by parameter classification (NORM_DIRECT, NORM_CSR_WARL, etc.)
  • Confusion matrix: Classification accuracy analysis across all parameter classes
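The deduplication rule described above can be sketched as a keyed reduction. The record fields (`in_content_region`, `confidence`) are hypothetical names for illustration; the actual schema in analyze.py may differ.

```python
def dedupe(params: list[dict]) -> list[dict]:
    """Keep one record per parameter name, preferring extractions found in a
    chunk's content region over overlap-region duplicates, then breaking
    ties on the LLM's confidence score."""
    best: dict[str, dict] = {}
    for p in params:
        cur = best.get(p["name"])
        if cur is None:
            best[p["name"]] = p
            continue
        # Tuple comparison: content-region hits beat overlap hits,
        # then higher confidence wins.
        def rank(r: dict) -> tuple:
            return (r.get("in_content_region", False), r.get("confidence", 0.0))
        if rank(p) > rank(cur):
            best[p["name"]] = p
    return list(best.values())
```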

Test plan

  • Deduplication correctly resolves cross-chunk duplicates
  • Alignment matches UDB parameters via exact, fuzzy, and concept-group strategies
  • Metrics computation produces valid recall and accuracy figures
  • Discrepancy CSV categorizes all differences correctly
  • Pre-commit hooks pass (ruff, SPDX headers)


Add scripts and data for cataloging all 185 UDB architectural parameters
with schema analysis, CSR cross-references, heuristic classifications,
and candidate spec text locations. This forms the foundation for
LLM-based parameter extraction from the RISC-V specification.

Scripts:
- export_udb_params.py: extracts parameters from YAML, derives value
  types, cross-references CSR IDL, classifies each parameter
- map_params_to_spec.py: searches 74 spec .adoc files for text related
  to each parameter using multi-strategy keyword matching
- generate_report.py: produces CSV catalog, text report, and param
  name list
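The heuristic classification step in export_udb_params.py might look roughly like this, operating on an already-parsed parameter record (e.g. from yaml.safe_load). The field names `csr_field`, `warl`, and `software_visible_only` are assumptions, not the real UDB schema.

```python
def classify_param(param: dict) -> str:
    """Heuristic classification sketch: CSR-backed parameters are split on
    whether the backing field is WARL; software-only rules become SW_RULE;
    everything else defaults to a directly normative parameter."""
    csr = param.get("csr_field")
    if csr is not None:
        return "NORM_CSR_WARL" if csr.get("warl", False) else "NORM_CSR_RW"
    if param.get("software_visible_only", False):
        return "SW_RULE"
    return "NORM_DIRECT"
```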

Key results:
- 185 parameters cataloged (102 NORM_DIRECT, 55 NORM_CSR_RW,
  26 NORM_CSR_WARL, 2 SW_RULE)
- 81% high-confidence classifications
- 98% of parameters mapped to spec text candidates

Closes riscv#1747
Design and implement the formal parameter classification taxonomy and
prompt architecture for LLM-based extraction from RISC-V specifications.

Deliverables:
- taxonomy.md: formal definitions for 8 parameter classes (NORM_DIRECT,
  NORM_CSR_WARL, NORM_CSR_RW, SW_RULE, NON_ISA, NON_NORM, DOC_RULE,
  UNKNOWN) with disambiguation rules and a decision tree
- system_prompt.txt: ~940 token system prompt defining role, task,
  taxonomy, critical rules, and JSON output schema
- examples.json: 6 positive + 4 negative few-shot examples from real
  spec text covering all normative classes and key false-positive
  patterns (NOTE blocks, CSR behavior, fixed requirements, permission
  vs optionality "may")
- run_prompt.py: prompt assembler with 3 CLI modes (assemble, chunk,
  estimate) supporting context window management across models
- validate_prompt.py: 175-check validation suite for all deliverables

Key design decisions:
- Single-pass extraction + classification to preserve context
- Mandatory reasoning field in LLM output to reduce hallucinations
- Section-boundary-aware chunking with configurable overlap
- Three-layer prompt: system + examples + param names + spec chunk
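The layered prompt assembly can be sketched as simple string composition. The section markers and JSON serialization of examples here are illustrative; run_prompt.py's actual formatting is not shown in this PR description.

```python
import json


def assemble_prompt(system: str, examples: list[dict],
                    param_names: list[str], chunk: str) -> str:
    """Compose the layers in order: system prompt, few-shot examples,
    known UDB parameter names, then the spec chunk to extract from."""
    parts = [
        system,
        "## Examples\n" + "\n".join(json.dumps(e) for e in examples),
        "## Known UDB parameter names\n" + "\n".join(param_names),
        "## Spec text\n" + chunk,
    ]
    return "\n\n".join(parts)
```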

Closes riscv#1748
Add a chunker that splits the 52,602-line RISC-V specification into
78 semantically coherent chunks across 74 .adoc files, preserving
CSR section integrity for LLM parameter extraction.

Key features:
- Never splits within a ==== section (CSR descriptions stay atomic)
- Splits at === or ==== AsciiDoc heading boundaries
- Target chunk size: 2,500-3,500 lines (~35K-45K tokens)
- Overlap context (30 lines) at chunk boundaries
- Files under 2,000 lines stay as single chunks
- Built-in verify command checks CSR integrity, coverage, and metadata
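The boundary rule above can be sketched as follows. This simplification only shows splitting at `===` headings once a chunk reaches the target size (a `====` line does not start with `"=== "`, so CSR sections are never split here); the real chunker also handles overlap context and per-chunk metadata.

```python
def split_at_headings(lines: list[str], target: int = 3000,
                      min_single: int = 2000) -> list[list[str]]:
    """Split .adoc lines into chunks at `===` heading boundaries.
    Files below min_single lines stay as a single chunk."""
    if len(lines) < min_single:
        return [lines]
    chunks, start = [], 0
    for i, line in enumerate(lines):
        # "==== Title" does not match "=== " (fourth char differs),
        # so level-4 CSR sections never trigger a split.
        if line.startswith("=== ") and i - start >= target:
            chunks.append(lines[start:i])
            start = i
    chunks.append(lines[start:])
    return chunks
```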

Results:
- 78 chunks across 74 files (4 files split into 2 chunks each)
- 100% line coverage on all multi-chunk files
- Zero CSR section splits
- Full manifest.json with per-chunk metadata

Closes riscv#1749
Add extract.py for automated parameter extraction using Anthropic Claude.
Features include token-aware rate limiting, exponential backoff for API
errors, source file skipping for non-parameter content, and pilot/run/merge
CLI modes. Includes v1 extraction results across 59 spec chunks (208
unique parameters found).
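The exponential-backoff retry loop can be sketched generically. This is not extract.py's code: extract.py presumably retries only specific Anthropic API errors, whereas this sketch retries any exception for brevity.

```python
import random
import time


def call_with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0):
    """Retry fn() with exponentially growing delays plus jitter.
    Re-raises the last error once retries are exhausted."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
```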
Add analyze.py for deduplication, UDB alignment, metrics computation,
and discrepancy reporting. V1 results: 58.8% adjusted recall, 77.8%
classification accuracy, 202 unique parameters after deduplication.
Identifies 73 UDB recall misses categorized as debug-spec (32) and
recoverable (49) for prompt refinement targeting.
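Concept-group recall, the idea behind the adjusted-recall figure, can be sketched like this: UDB parameters sharing a concept count as one unit, and the unit is recalled if any member matched. The exact adjustment in analyze.py (e.g. how debug-spec misses are weighted) may differ.

```python
def adjusted_recall(udb_to_concept: dict[str, str], matched: set[str]) -> float:
    """Fraction of concept groups with at least one matched member."""
    concepts: dict[str, bool] = {}
    for param, concept in udb_to_concept.items():
        concepts[concept] = concepts.get(concept, False) or (param in matched)
    return sum(concepts.values()) / len(concepts)
```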
codecov Bot commented Apr 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.16%. Comparing base (ba151af) to head (f81e6b5).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1792      +/-   ##
==========================================
+ Coverage   71.95%   72.16%   +0.20%     
==========================================
  Files          55       55              
  Lines       28085    27799     -286     
  Branches     6172     6009     -163     
==========================================
- Hits        20209    20060     -149     
+ Misses       7876     7739     -137     
| Flag | Coverage | Δ |
| --- | --- | --- |
| idlc | 76.18% <ø> | +0.21% ⬆️ |
| udb | 66.11% <ø> | +0.31% ⬆️ |


