
LFX Phase 2: Parameter taxonomy and LLM prompt architecture#1766

Open
ishaan-arora-1 wants to merge 4 commits into riscv:main from ishaan-arora-1:lfx-phase2-prompt-design

Conversation

@ishaan-arora-1
Contributor

Summary

Design and implement the formal parameter classification taxonomy and LLM prompt architecture for extracting architectural parameters from the RISC-V specification. Builds on Phase 1 (#1765).

  • Formal taxonomy (taxonomy.md): 8 parameter classes with precise definitions, disambiguation rules, and a decision tree
  • System prompt (system_prompt.txt): ~940-token prompt defining role, task, condensed taxonomy, critical rules, and a strict JSON output schema
  • Few-shot examples (examples.json): 6 positive + 4 negative examples from real spec text covering all normative classification classes and key false-positive patterns
  • Prompt assembler (run_prompt.py): CLI tool with 3 modes — assemble, chunk, and estimate — for building context-aware prompts across different LLM models
  • Validation suite (validate_prompt.py): 175-check automated verification covering taxonomy completeness, example accuracy, schema consistency, assembly correctness, and chunking integrity

Key Design Decisions

  • Single-pass extraction + classification: preserves context for classification and avoids two-pass token cost
  • Mandatory reasoning field in output: reduces hallucinations and aids human review
  • skipped_non_parameters in output: forces the LLM to demonstrate understanding of classification boundaries
  • Section-boundary-aware chunking: prevents splitting mid-paragraph; configurable overlap for context continuity
  • Three-layer prompt (system + examples + chunk): each layer has a clear token budget; examples/param names can be toggled off for small-context models
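The three-layer design amounts to concatenating the layers with the optional ones toggleable. A minimal sketch (function and argument names are illustrative, not the actual run_prompt.py implementation):

```python
def assemble_prompt(system_prompt: str,
                    examples: str,
                    param_names: str,
                    spec_chunk: str,
                    include_examples: bool = True,
                    include_param_names: bool = True) -> str:
    """Concatenate prompt layers in order; the examples and param-name
    layers can be dropped to fit small-context models."""
    layers = [system_prompt]
    if include_examples:
        layers.append(examples)
    if include_param_names:
        layers.append(param_names)
    layers.append(spec_chunk)
    return "\n\n".join(layers)
```

The spec chunk always comes last so that the model reads the task definition and examples before the text it must classify.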

Parameter Classes

Counts are from Phase 1 classifications:

  • NORM_DIRECT (102): direct implementation choice, not CSR-controlled
  • NORM_CSR_RW (55): controls whether a CSR field is RO/RW
  • NORM_CSR_WARL (26): legal values of a WARL CSR field
  • SW_RULE (2): deterministic with correct software
  • NON_ISA: platform-level, outside ISA scope
  • NON_NORM: inside NOTE/TIP/WARNING blocks
  • DOC_RULE: documentation requirements
  • UNKNOWN: needs further analysis
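To illustrate the strict JSON output the system prompt demands, here is a hypothetical example. Only the field names reasoning and skipped_non_parameters are taken from this PR description; the remaining field names and values are assumed placeholders, not the actual schema in system_prompt.txt:

```python
import json

# Hypothetical illustration of the strict JSON output shape.
# Only "reasoning" and "skipped_non_parameters" are named in the PR;
# every other field name and value here is an assumed placeholder.
example_output = {
    "parameters": [
        {
            "name": "MXLEN",                  # hypothetical UDB param name
            "classification": "NORM_DIRECT",  # one of the 8 taxonomy classes
            "reasoning": "Direct implementation choice, not CSR-controlled.",
        }
    ],
    "skipped_non_parameters": [
        {
            "text": "NOTE: Implementations may choose ...",
            "classification": "NON_NORM",
            "reasoning": "Inside a NOTE block, so non-normative.",
        }
    ],
}

print(json.dumps(example_output, indent=2))
```

Requiring a reasoning string per entry, and an explicit list of skipped candidates, gives reviewers a direct audit trail for each decision.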

Token Budget

System prompt:           940 tokens
Few-shot examples:     1,691 tokens
UDB param names:       1,401 tokens
System overhead:         200 tokens
Reserved for output:   4,096 tokens
────────────────────────────────────
Fixed overhead:        4,232 tokens

Available for spec chunk:
  gpt-4o/gpt-4-turbo:   ~119K tokens
  claude-3.5-sonnet:     ~191K tokens
  gemini-1.5-pro:        ~991K tokens
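The arithmetic behind these budgets is a simple subtraction. The token counts below come from the table above; the context-window sizes are approximate published figures for each model, not values defined by this PR:

```python
# Fixed prompt overhead, in tokens (from the budget table above).
SYSTEM_PROMPT = 940
FEW_SHOT_EXAMPLES = 1691
UDB_PARAM_NAMES = 1401
SYSTEM_OVERHEAD = 200
OUTPUT_RESERVE = 4096  # tokens reserved for the model's JSON response

fixed_overhead = (SYSTEM_PROMPT + FEW_SHOT_EXAMPLES
                  + UDB_PARAM_NAMES + SYSTEM_OVERHEAD)  # 4,232

def available_for_chunk(context_window: int) -> int:
    """Tokens left for the spec chunk after prompt layers and output."""
    return context_window - fixed_overhead - OUTPUT_RESERVE

# Approximate context windows (publicly documented figures).
for model, window in {"gpt-4o": 128_000,
                      "claude-3.5-sonnet": 200_000,
                      "gemini-1.5-pro": 1_000_000}.items():
    print(f"{model}: ~{available_for_chunk(window) // 1000}K tokens")
```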

How to Run

# Estimate token budgets
python3 param_extraction/scripts/run_prompt.py estimate

# Chunk a spec file
python3 param_extraction/scripts/run_prompt.py chunk \
    ext/riscv-isa-manual/src/machine.adoc --max-tokens 40000

# Assemble a prompt for a specific chunk
python3 param_extraction/scripts/run_prompt.py assemble \
    ext/riscv-isa-manual/src/machine.adoc \
    --start-line 1209 --end-line 1270 --output-json

# Run validation suite
cd param_extraction/scripts && python3 validate_prompt.py

Test Plan

  • validate_prompt.py passes 175/175 checks (0 failures)
  • All 8 parameter classes defined consistently across taxonomy, system prompt, and examples
  • All 6 value types defined consistently across taxonomy, system prompt, and examples
  • Decision tree ordering in taxonomy matches system prompt ordering
  • All example UDB parameter names verified in ground_truth.json
  • All example classifications match Phase 1 classifications
  • All example line numbers verified against actual spec files
  • Example output schema fields match system prompt schema
  • All 74 spec files chunk successfully with no gaps
  • Chunking handles edge cases: empty files, no headers, very small chunk limits
  • Context overflow correctly raises ValueError for small-context models
  • Examples and param names are correctly omitted when disabled
  • No unused imports, no debug artifacts
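The cross-file consistency checks above can be sketched as follows. This is a hypothetical simplification; the real validate_prompt.py parses the actual deliverable files rather than taking pre-extracted sets:

```python
# The 8 classes every deliverable must agree on (from the taxonomy above).
EXPECTED_CLASSES = {
    "NORM_DIRECT", "NORM_CSR_RW", "NORM_CSR_WARL", "SW_RULE",
    "NON_ISA", "NON_NORM", "DOC_RULE", "UNKNOWN",
}

def check_class_consistency(classes_by_file: dict[str, set[str]]) -> list[str]:
    """Return one failure message per file whose class set deviates."""
    failures = []
    for filename, found in classes_by_file.items():
        missing = EXPECTED_CLASSES - found
        extra = found - EXPECTED_CLASSES
        if missing or extra:
            failures.append(
                f"{filename}: missing={sorted(missing)} extra={sorted(extra)}")
    return failures
```

The same pattern (expected set vs. set found in each file) extends naturally to the 6 value types and the output-schema field names.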

Closes #1748

…tion

Add scripts and data for cataloging all 185 UDB architectural parameters
with schema analysis, CSR cross-references, heuristic classifications,
and candidate spec text locations. This forms the foundation for
LLM-based parameter extraction from the RISC-V specification.

Scripts:
- export_udb_params.py: extracts parameters from YAML, derives value
  types, cross-references CSR IDL, classifies each parameter
- map_params_to_spec.py: searches 74 spec .adoc files for text related
  to each parameter using multi-strategy keyword matching
- generate_report.py: produces CSV catalog, text report, and param
  name list

Key results:
- 185 parameters cataloged (102 NORM_DIRECT, 55 NORM_CSR_RW,
  26 NORM_CSR_WARL, 2 SW_RULE)
- 81% high-confidence classifications
- 98% of parameters mapped to spec text candidates

Closes riscv#1747
Design and implement the formal parameter classification taxonomy and
prompt architecture for LLM-based extraction from RISC-V specifications.

Deliverables:
- taxonomy.md: formal definitions for 8 parameter classes (NORM_DIRECT,
  NORM_CSR_WARL, NORM_CSR_RW, SW_RULE, NON_ISA, NON_NORM, DOC_RULE,
  UNKNOWN) with disambiguation rules and a decision tree
- system_prompt.txt: ~940 token system prompt defining role, task,
  taxonomy, critical rules, and JSON output schema
- examples.json: 6 positive + 4 negative few-shot examples from real
  spec text covering all normative classes and key false-positive
  patterns (NOTE blocks, CSR behavior, fixed requirements, permission
  vs optionality "may")
- run_prompt.py: prompt assembler with 3 CLI modes (assemble, chunk,
  estimate) supporting context window management across models
- validate_prompt.py: 175-check validation suite for all deliverables

Key design decisions:
- Single-pass extraction + classification to preserve context
- Mandatory reasoning field in LLM output to reduce hallucinations
- Section-boundary-aware chunking with configurable overlap
- Three-layer prompt: system + examples + param names + spec chunk

Closes riscv#1748
@codecov

codecov Bot commented Mar 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.96%. Comparing base (de41e7b) to head (ab1a22b).
⚠️ Report is 35 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1766   +/-   ##
=======================================
  Coverage   71.96%   71.96%           
=======================================
  Files          54       54           
  Lines       27976    27976           
  Branches     6183     6183           
=======================================
  Hits        20132    20132           
  Misses       7844     7844           
Flag    Coverage    Δ
idlc    75.90%      <ø> (ø)
udb     65.84%      <ø> (ø)

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

- Add REUSE annotation for param_extraction/** in REUSE.toml
- Fix ruff lint errors: remove unused variables, prefix unused loop
  vars with underscore, remove extraneous f-string prefixes, remove
  unused imports, sort import blocks
- Apply ruff formatting to all Python scripts
- Make Python scripts executable to satisfy EXE001 shebang check
- Fix prettier formatting for ground_truth.json and spec_mappings.json
- Strip trailing whitespace from parameters_catalog.csv
- Add missing end-of-file newline to phase1_report.txt

Development

Successfully merging this pull request may close these issues.

LFX - Phase 2: Design Parameter Taxonomy & LLM Prompts
