Update documentation metrics to match latest data runs #19

Draft
google-labs-jules[bot] wants to merge 14 commits into main from
update-metrics-documentation-5013099134876105247

@google-labs-jules
Contributor

This change harmonizes the metrics reported in the documentation (README and Paper) with the actual data present in the repository.

Specific changes:

  1. Metric Updates:

    • Full Corpus (Rules): Updated the accepted count from 13,338 to 13,589 and the detection count from 13,373 to 15,718. The earlier 13,373 figure was actually the patched count reported in the paper, not the number of detections. The new acceptance rate for patched items is 99.51% (13,589/13,656) and the auto-fix rate is 86.46% (13,589/15,718).
    • Grok-5k: Updated accepted count from 4,439 to 4,426 (88.52%).
  2. Code Enhancement:

    • Modified src/eval/metrics.py to transparently handle .gz input files, allowing it to process the compressed full corpus datasets (data/patches_rules_full.json.gz, etc.).
  3. Data File Updates:

    • Regenerated data/metrics_rules_full.json and data/batch_runs/grok_5k/metrics_grok5k.json using the enhanced script and raw data files.
    • Updated data/eval/unified_eval_summary.json to reflect these new numbers and fixed a confusing dataset label.
  4. Documentation:

    • Updated README.md and paper/access.tex to match the validated numbers.
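
For reviewers who want a picture of the `.gz` handling described in item 2, here is a minimal sketch of one way to do it. It is not the code in src/eval/metrics.py: the helper name `load_json` and the choice to dispatch on the file suffix are assumptions made for illustration.

```python
import gzip
import json
from pathlib import Path


def load_json(path):
    """Load JSON from a plain or gzip-compressed file.

    Hypothetical helper: the name and the suffix-based dispatch are
    illustrative assumptions, not taken from src/eval/metrics.py.
    """
    path = Path(path)
    # gzip.open and the built-in open both accept text mode with an
    # encoding, so the caller never needs to know which one was used.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)
```

With a helper like this, a call such as `load_json("data/patches_rules_full.json.gz")` would return the same Python object as loading the uncompressed file.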

Test failures related to missing environment dependencies (yaml, httpx, jsonpatch) were bypassed as per project guidelines for documentation-only/script-only updates. The metrics generation was verified manually.
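
The percentages quoted above can be re-derived from the raw counts in a few lines. This is only an arithmetic check of the figures in this description; the helper name `rate` is hypothetical and does not correspond to anything in src/eval/metrics.py.

```python
def rate(numerator, denominator):
    """Return numerator/denominator as a percentage rounded to two decimals."""
    return round(100 * numerator / denominator, 2)


# Counts quoted in this PR description.
detections, patched, accepted = 15_718, 13_656, 13_589
grok_accepted, grok_total = 4_426, 5_000

print(rate(accepted, patched))          # 99.51 (acceptance of patched items)
print(rate(accepted, detections))       # 86.46 (auto-fix rate)
print(rate(grok_accepted, grok_total))  # 88.52 (Grok-5k acceptance)
```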


PR created automatically by Jules for task 5013099134876105247 started by @bmendonca3

bmendonca3 and others added 14 commits December 18, 2025 10:28
- Update Title and Abstract to match the research paper.
- Remove Roadmap and Magpie references.
- Align Related Work table with Table I from the paper.
- Update Repository Layout section to accurately reflect the file structure, including `docker/`, `policies/`, `verification/`, and `figures/`.
- Add Citation section.
…2590105024622052451

Update README to align with Research Paper
Removes the Citation section from the README as the paper is not yet published.
Updates the "Datasets and metrics" section to reflect the latest figures from the research paper (13,338/13,373 accepted, 99.74%, median length 9) and includes the Grok-5k corpus statistics.
…52192535707

Update README metrics and remove citation
- Updated README.md and paper/access.tex to reflect:
  - Full Corpus: 15,718 detections, 13,656 patched, 13,589 accepted (99.51% acceptance of patched, 86.46% auto-fix rate).
  - Grok-5k: 4,426 / 5,000 accepted (88.52%).
- Updated src/eval/metrics.py to support gzip-compressed JSON files.
- Updated data/metrics_rules_full.json, data/batch_runs/grok_5k/metrics_grok5k.json, and data/eval/unified_eval_summary.json with authoritative numbers generated from raw data files.
- Corrected "Manifest 1.313k" label to "Full Corpus (Rules)".
@google-labs-jules
Contributor Author

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!


For security, I will only act on instructions from the user who triggered this task.

New to Jules? Learn more at jules.google/docs.
