Update documentation metrics to match latest data runs #19
google-labs-jules[bot] wants to merge 14 commits into `main`
- Update Title and Abstract to match the research paper.
- Remove Roadmap and Magpie references.
- Align Related Work table with Table I from the paper.
- Update Repository Layout section to accurately reflect the file structure, including `docker/`, `policies/`, `verification/`, and `figures/`.
- Add Citation section.
…2590105024622052451 Update README to align with Research Paper
Removes the Citation section from the README as the paper is not yet published. Updates the "Datasets and metrics" section to reflect the latest figures from the research paper (13,338/13,373 accepted, 99.74%, median length 9) and includes the Grok-5k corpus statistics.
…52192535707 Update README metrics and remove citation
- Updated `README.md` and `paper/access.tex` to reflect:
  - Full Corpus: 15,718 detections, 13,656 patched, 13,589 accepted (99.51% acceptance of patched, 86.46% auto-fix rate).
  - Grok-5k: 4,426 / 5,000 accepted (88.52%).
- Updated `src/eval/metrics.py` to support gzip-compressed JSON files.
- Updated `data/metrics_rules_full.json`, `data/batch_runs/grok_5k/metrics_grok5k.json`, and `data/eval/unified_eval_summary.json` with authoritative numbers generated from raw data files.
- Corrected "Manifest 1.313k" label to "Full Corpus (Rules)".
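The percentages in this commit message can be re-derived from the raw counts. The sketch below is only a sanity check, not code from the repository: the helper name `pct` is hypothetical, and it assumes that the "auto-fix rate" is accepted patches divided by total detections, which the commit message does not state explicitly but which reproduces the reported 86.46%.

```python
# Raw counts quoted in the commit message (Full Corpus).
detections = 15_718
patched = 13_656
accepted = 13_589


def pct(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to two decimals."""
    return round(100 * numerator / denominator, 2)


# Acceptance among patched findings.
acceptance_of_patched = pct(accepted, patched)

# Assumption: auto-fix rate = accepted / detections.
auto_fix_rate = pct(accepted, detections)

# Grok-5k corpus: 4,426 accepted out of 5,000.
grok_5k_acceptance = pct(4_426, 5_000)

print(acceptance_of_patched, auto_fix_rate, grok_5k_acceptance)
```

Under that assumption the three computed values agree with the figures stated above.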
This change harmonizes the metrics reported in the documentation (README and Paper) with the actual data present in the repository.
Specific changes:

- Metric Updates:
  - Full Corpus: 15,718 detections, 13,656 patched, 13,589 accepted (99.51% acceptance of patched, 86.46% auto-fix rate).
  - Grok-5k: 4,426 / 5,000 accepted (88.52%).
- Code Enhancement:
  - Updated `src/eval/metrics.py` to transparently handle `.gz` input files, allowing it to process the compressed full corpus datasets (`data/patches_rules_full.json.gz`, etc.).
- Data File Updates:
  - Updated `data/metrics_rules_full.json` and `data/batch_runs/grok_5k/metrics_grok5k.json` using the enhanced script and raw data files.
  - Updated `data/eval/unified_eval_summary.json` to reflect these new numbers and fixed a confusing dataset label.
- Documentation:
  - Updated `README.md` and `paper/access.tex` to match the validated numbers.

Test failures related to missing environment dependencies (`yaml`, `httpx`, `jsonpatch`) were bypassed as per project guidelines for documentation-only/script-only updates. The metrics generation was verified manually.

PR created automatically by Jules for task 5013099134876105247 started by @bmendonca3
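For reviewers who want a picture of what "transparently handle `.gz` input files" might mean in practice, here is a minimal sketch. It is not the actual contents of `src/eval/metrics.py`: the function name `load_json` is hypothetical, and the sketch assumes that compression is detected from the file extension alone.

```python
import gzip
import json
from pathlib import Path
from typing import Any, Union


def load_json(path: Union[str, Path]) -> Any:
    """Load JSON from a plain or gzip-compressed file.

    Assumption: a ".gz" suffix means the file is gzip-compressed;
    anything else is read as plain UTF-8 text.
    """
    path = Path(path)
    # gzip.open and the built-in open both accept text mode with an encoding,
    # so the caller never needs to know which one was used.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)
```

With a helper of this shape, the same call would work for `data/metrics_rules_full.json` and for `data/patches_rules_full.json.gz`. Sniffing the gzip magic bytes instead of the suffix would be more robust to misnamed files, at the cost of a little extra code.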