Replace eval() in LooGLE evaluation parsing by fallintoplace · Pull Request #248 · NVIDIA/kvpress

fallintoplace · 2026-07-02T17:49:05Z

Summary

- Replace eval() in the LooGLE cloze metrics with safe JSON/literal parsing
- Reuse the same safe parser for qa_pairs when building the LooGLE Hugging Face dataset
- Add regression tests for valid Python-style literals and non-literal payloads
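A minimal sketch of the kind of safe parser the summary describes, assuming a JSON-first strategy with an `ast.literal_eval` fallback. The name `safe_parse_literal` is hypothetical; the PR's actual helper in `evaluation/benchmarks/loogle/parsing.py` may differ in name, error type, and behavior.

```python
import ast
import json
from typing import Any


def safe_parse_literal(text: str) -> Any:
    """Parse a JSON or Python-literal string without executing any code.

    Hypothetical helper for illustration; not the PR's actual function.
    """
    # Strict JSON first: double-quoted strings, true/false/null.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Fall back to Python literal syntax (single quotes, True/False/None,
    # tuples). literal_eval only builds literals; calls and names are rejected.
    try:
        return ast.literal_eval(text.strip())
    except (ValueError, SyntaxError, TypeError, MemoryError, RecursionError) as exc:
        raise ValueError(f"not a JSON or Python literal: {text!r}") from exc


print(safe_parse_literal("['a', 'b']"))  # ['a', 'b'] via the literal_eval path
```

Trying JSON first keeps strictly JSON-encoded fields on the standard parser, while the fallback covers Python-style literals that `json.loads` rejects. Neither path can call functions, so a payload such as `__import__('os').system('id')` raises `ValueError` instead of running.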

Root cause

The LooGLE benchmark parsed both dataset fields and model-generated answers with eval(). For shortdep_cloze, this meant a crafted prediction string could execute arbitrary Python code during metric calculation.
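The root cause can be reproduced in a few lines. This sketch uses a deliberately harmless payload that only appends to a list, to show that eval() executes whatever expression it receives, while `ast.literal_eval` rejects the same string. The mask-to-answer dict used for the well-formed case is an assumed shape for cloze predictions, not taken from the PR.

```python
import ast

# Stand-in for a crafted model prediction. It is benign here (it only appends
# to a list), but eval() would run any Python expression in its place.
crafted_prediction = "executed.append('side effect')"

executed: list[str] = []
# Unsafe pattern: eval() treats the prediction as code and runs it.
eval(crafted_prediction, {"executed": executed})
print(executed)  # ['side effect']

# Safe pattern: literal_eval accepts only literals, so the call is refused.
rejected = False
try:
    ast.literal_eval(crafted_prediction)
except ValueError:
    rejected = True
print("rejected:", rejected)  # rejected: True

# A well-formed prediction (assumed mask-to-answer dict) still parses.
parsed = ast.literal_eval("{'<mask-0>': 'Alice', '<mask-1>': 'Bob'}")
print(parsed)  # {'<mask-0>': 'Alice', '<mask-1>': 'Bob'}
```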

Validation

uv run --python 3.11 pytest tests/test_loogle_parsing.py
uv run --python 3.11 flake8 evaluation/benchmarks/loogle/calculate_metrics.py evaluation/benchmarks/loogle/create_huggingface_dataset.py evaluation/benchmarks/loogle/parsing.py tests/test_loogle_parsing.py
uv run --python 3.11 mypy evaluation/benchmarks/loogle/calculate_metrics.py evaluation/benchmarks/loogle/create_huggingface_dataset.py evaluation/benchmarks/loogle/parsing.py tests/test_loogle_parsing.py --check-untyped-defs
uv run --python 3.11 python -c "import evaluation.benchmarks.loogle.create_huggingface_dataset as mod; print('import-ok', hasattr(mod, 'build_task_dataframe'), hasattr(mod, 'main'))"

Signed-off-by: Minh Vu <vuhoangminh97@gmail.com>

copy-pr-bot · 2026-07-02T17:49:09Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Use safe parsing for LooGLE evaluation

090d74b

Signed-off-by: Minh Vu <vuhoangminh97@gmail.com>

fallintoplace wants to merge 1 commit into NVIDIA:main from fallintoplace:fix/loogle-safe-parsing
