LFX Phase 1: Ground truth map for architectural parameter extraction #1765
Open
ishaan-arora-1 wants to merge 3 commits into riscv:main from
…tion

Add scripts and data for cataloging all 185 UDB architectural parameters with schema analysis, CSR cross-references, heuristic classifications, and candidate spec text locations. This forms the foundation for LLM-based parameter extraction from the RISC-V specification.

Scripts:
- export_udb_params.py: extracts parameters from YAML, derives value types, cross-references CSR IDL, classifies each parameter
- map_params_to_spec.py: searches 74 spec .adoc files for text related to each parameter using multi-strategy keyword matching
- generate_report.py: produces CSV catalog, text report, and param name list

Key results:
- 185 parameters cataloged (102 NORM_DIRECT, 55 NORM_CSR_RW, 26 NORM_CSR_WARL, 2 SW_RULE)
- 81% high-confidence classifications
- 98% of parameters mapped to spec text candidates

Closes riscv#1747
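As a rough illustration of what multi-strategy keyword matching can look like, the sketch below tags each spec line with the first strategy that matches, using the four strategies named in the PR summary (exact name, CSR backtick refs, description keywords, WARL proximity). It is not the code from map_params_to_spec.py: the function name, the `Hit` record, the strategy labels, the two-line proximity window, and the sample parameter and spec lines are all assumptions made for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Hit:
    line_no: int   # 1-based line number in the spec text
    strategy: str  # which matching strategy fired
    text: str


def find_candidates(param_name, csr_names, keywords, lines, window=2):
    """Hypothetical matcher: tag each line with the first strategy that hits."""
    name_re = re.compile(rf"\b{re.escape(param_name)}\b")
    csr_res = [re.compile(rf"`{re.escape(c)}`") for c in csr_names]
    kw_res = [re.compile(rf"\b{re.escape(k)}\b", re.IGNORECASE) for k in keywords]
    warl_re = re.compile(r"\bWARL\b")

    hits = []
    for i, line in enumerate(lines, start=1):
        if name_re.search(line):
            hits.append(Hit(i, "exact_name", line))
        elif any(r.search(line) for r in csr_res):
            # A backticked CSR reference counts as WARL proximity when
            # "WARL" appears within `window` lines on either side.
            lo, hi = max(0, i - 1 - window), min(len(lines), i + window)
            near = any(warl_re.search(near_line) for near_line in lines[lo:hi])
            hits.append(Hit(i, "warl_proximity" if near else "csr_backtick", line))
        elif kw_res and all(r.search(line) for r in kw_res):
            # Require every description keyword to appear on the line.
            hits.append(Hit(i, "description_keywords", line))
    return hits


# Illustrative input only; the parameter name and spec lines are made up.
spec_lines = [
    "The `mtvec` register holds trap vector configuration.",
    "The MODE field is a WARL field.",
    "Unrelated text about page tables.",
    "Implementations may support vectored trap mode.",
    "See MTVEC_MODES for the supported set.",
    "Unrelated.",
    "Unrelated.",
    "Software reads `mtvec` on entry.",
]
hits = find_candidates("MTVEC_MODES", ["mtvec"], ["vectored", "trap"], spec_lines)
for h in hits:
    print(h.line_no, h.strategy)
```

Tagging each hit with its strategy keeps the candidates auditable: a reviewer can weight an exact-name match above a keyword-only match when checking the mapped spec text.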
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main    #1765      +/-   ##
==========================================
- Coverage   71.96%   71.95%   -0.01%
==========================================
  Files          54       54
  Lines       27976    27976
  Branches     6183     6183
==========================================
- Hits        20132    20131       -1
- Misses       7844     7845       +1
- Add REUSE annotation for param_extraction/** in REUSE.toml
- Fix ruff lint errors: remove unused variables, prefix unused loop vars with underscore, remove extraneous f-string prefixes, sort import blocks
- Apply ruff formatting to all Python scripts
- Make Python scripts executable to satisfy EXE001 shebang check
- Fix prettier formatting for ground_truth.json and spec_mappings.json
- Strip trailing whitespace from parameters_catalog.csv
- Add missing end-of-file newline to phase1_report.txt
Summary
What's included
Scripts (param_extraction/scripts/)
- export_udb_params.py: Extracts parameters from spec/std/isa/param/*.yaml files (excluding 22 MOCK fixtures), analyzes JSON Schema structure, cross-references CSR IDL code for sw_write()/type()/reset_value() references, and classifies each parameter
- map_params_to_spec.py: Searches 74 spec .adoc files (52,602 lines) for text related to each parameter using multi-strategy keyword matching (exact name, CSR backtick refs, description keywords, WARL proximity patterns)
- generate_report.py: Produces the CSV catalog, text report, and parameter name list

Data outputs (param_extraction/data/)
- ground_truth.json
- spec_mappings.json
- parameters_catalog.csv
- phase1_report.txt
- udb_param_names.txt

Key results
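The per-class counts reported in the commit message (102 NORM_DIRECT, 55 NORM_CSR_RW, 26 NORM_CSR_WARL, 2 SW_RULE) can be sanity-checked against the stated total of 185 parameters. The snippet below is only an arithmetic check on those published numbers. It does not read ground_truth.json, and the variable names are chosen for this example.

```python
# Class counts as stated in the commit message for this PR.
counts = {
    "NORM_DIRECT": 102,
    "NORM_CSR_RW": 55,
    "NORM_CSR_WARL": 26,
    "SW_RULE": 2,
}

total = sum(counts.values())
assert total == 185, f"class counts sum to {total}, expected 185"

# Share of each class as a percentage of all cataloged parameters.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
for name, pct in shares.items():
    print(f"{name}: {counts[name]} ({pct}%)")
```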
How to run
Test plan
Closes #1747