Add the following skills to the skills directory:

- auto-arena: automatically evaluate and compare multiple AI models
- bib-verify: verify BibTeX references against CrossRef/arXiv/DBLP
- claude-authenticity: detect genuine Claude API endpoints
- find-skills-combo: discover optimal skill combinations for complex tasks
- openjudge: build custom LLM evaluation pipelines with OpenJudge
- paper-review: review academic papers using OpenJudge multi-stage pipeline
- ref-hallucination-arena: benchmark LLM reference hallucination rates

Made-with: Cursor
@qbc2016 Please take a look at this PR
Pull request overview
This PR adds seven new skills (and supporting reference docs) to the skills/ directory, primarily focused on OpenJudge-based evaluation pipelines, academic paper review / BibTeX verification, model arena benchmarking, and Claude endpoint authenticity checking.
Changes:
- Added new skills: `auto-arena`, `bib-verify`, `claude-authenticity`, `find-skills-combo`, `openjudge`, `paper-review`, `ref-hallucination-arena`.
- Added OpenJudge reference sub-docs (graders/pipeline/generator/analyzer) and paper-review reference documentation.
- Documented CLI/Python usage patterns and configuration templates for these workflows.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| skills/auto-arena/SKILL.md | Documentation for an OpenJudge “auto arena” benchmarking pipeline. |
| skills/bib-verify/SKILL.md | Documentation for BibTeX verification (via paper-review pipeline in bib-only mode). |
| skills/claude-authenticity/SKILL.md | Documentation + self-contained script for Claude endpoint authenticity checks. |
| skills/find-skills-combo/SKILL.md | Documentation for a multi-skill discovery/recommendation workflow using npx skills. |
| skills/openjudge/SKILL.md | Top-level OpenJudge skill doc with pointers to sub-documents. |
| skills/openjudge/graders.md | Reference documentation for OpenJudge graders and usage patterns. |
| skills/openjudge/pipeline.md | Reference documentation for OpenJudge batch evaluation runner/aggregators/strategies. |
| skills/openjudge/generator.md | Reference documentation for OpenJudge rubric generators. |
| skills/openjudge/analyzer.md | Reference documentation for OpenJudge analyzers (pairwise/statistical/validation). |
| skills/paper-review/SKILL.md | Documentation for multi-stage paper review pipeline (PDF/TeX/BibTeX). |
| skills/paper-review/reference.md | Reference documentation for PaperReviewPipeline config + CLI flags. |
| skills/ref-hallucination-arena/SKILL.md | Documentation for reference hallucination benchmarking pipeline. |
> All file types use a single entry point. File type is auto-detected.
>
> ```bash
> python -m cookbooks.paper_review [--input FILE] [options]
> ```
>
> | Flag | Default | Description |
> |------|---------|-------------|
> | `--input` | — | Path to PDF, .tar.gz/.zip, or .bib file |
> | `--bib_only` | — | Path to .bib file for standalone BibTeX-only verification |
This reference describes usage via --input FILE, while the main SKILL.md examples use a positional paper.pdf argument. Please document a single canonical invocation (or explicitly state that both positional input and --input are supported) to prevent copy/paste CLI errors.
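Supporting both invocation styles is straightforward with argparse; a minimal sketch under the assumption that both forms should be accepted (the parser and helper names here are hypothetical, not the actual `cookbooks.paper_review` CLI):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Accept the input file either positionally or via --input.

    Hypothetical sketch -- not the real cookbooks.paper_review parser.
    """
    parser = argparse.ArgumentParser(prog="paper_review")
    # nargs="?" makes the positional optional, so --input can be used instead.
    parser.add_argument("paper", nargs="?", default=None,
                        help="Path to PDF, .tar.gz/.zip, or .bib file")
    parser.add_argument("--input", dest="input_flag", default=None,
                        help="Same as the positional argument")
    return parser


def resolve_input(args: argparse.Namespace) -> str:
    """Pick whichever form was given; reject ambiguous or missing input."""
    if args.paper and args.input_flag:
        raise SystemExit("Pass the input positionally or via --input, not both")
    path = args.paper or args.input_flag
    if path is None:
        raise SystemExit("An input file is required")
    return path
```

With this shape, `python -m cookbooks.paper_review paper.pdf` and `python -m cookbooks.paper_review --input paper.pdf` would resolve to the same thing, which removes the copy/paste hazard.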
> ## The 9 checks (mirrors [claude-verify](https://github.com/molloryn/claude-verify))
This skill explicitly says the checks “mirror claude-verify”. If any of the script logic/text is copied or closely adapted from that repository, please ensure the original project’s license is compatible with this repo and add the required attribution/license notice (e.g., in the doc or alongside the extracted script) to avoid downstream distribution issues.
Suggested change:

```diff
-## The 9 checks (mirrors [claude-verify](https://github.com/molloryn/claude-verify))
+## The 9 checks
+
+> Attribution: The design and weighting of these nine checks are adapted from the open-source
+> [claude-verify](https://github.com/molloryn/claude-verify) project, which is licensed under
+> the MIT License by its original author(s). This skill reimplements that logic; any reused or
+> adapted portions are used under the terms of the MIT License. See the claude-verify repository
+> for the original implementation and full license text.
```
> ```bash
> # Extra dependency for paper_review
> pip install litellm
> pip install pypdfium2  # only if using vision mode (use_vision_for_pdf=True)
> ```
The prerequisites say pypdfium2 is only needed if using vision mode, but later this doc says vision mode is enabled by default for PDF review. This is contradictory and can lead to missing-dependency failures. Clarify whether vision is default-on, and update the pypdfium2 note accordingly (required for default PDF review vs only when explicitly enabling vision).
| pip install pypdfium2 # only if using vision mode (use_vision_for_pdf=True) | |
| pip install pypdfium2 # required for default PDF review (vision mode is enabled by default) |
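Beyond fixing the docs, the pipeline itself could fail fast with an actionable message when vision mode needs `pypdfium2` and it is absent. A sketch of that guard (the function name and message are illustrative, not part of the actual paper-review code):

```python
import importlib.util


def check_vision_deps(use_vision_for_pdf: bool) -> None:
    """Raise a clear error if vision mode needs pypdfium2 but it is missing.

    Illustrative sketch -- not part of the real paper-review pipeline.
    """
    if not use_vision_for_pdf:
        return
    # find_spec tests for the package without actually importing it.
    if importlib.util.find_spec("pypdfium2") is None:
        raise RuntimeError(
            "Vision mode is enabled (the default for PDF review) but "
            "pypdfium2 is not installed. Run `pip install pypdfium2` "
            "or disable vision mode."
        )
```

Calling this once at pipeline startup turns a confusing mid-run import error into an immediate, self-explanatory failure.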
> ## Additional resources
>
> - Full config examples: [cookbooks/auto_arena/examples/](../../cookbooks/auto_arena/examples/)
This “Full config examples” link points to ../../cookbooks/..., but this repository doesn’t contain a cookbooks/ directory, so the relative link will be broken on GitHub. Consider linking to the upstream OpenJudge repository (or include the referenced examples in this repo) so readers can actually access them.
Suggested change:

```diff
-- Full config examples: [cookbooks/auto_arena/examples/](../../cookbooks/auto_arena/examples/)
+- Full config examples: [cookbooks/auto_arena/examples/](https://github.com/agentscope-ai/OpenJudge/tree/main/cookbooks/auto_arena/examples)
```
> - Full config examples: [cookbooks/ref_hallucination_arena/examples/](../../cookbooks/ref_hallucination_arena/examples/)
> - Documentation: [docs/validating_graders/ref_hallucination_arena.md](../../docs/validating_graders/ref_hallucination_arena.md)
Both “Full config examples” and “Documentation” are relative links into ../../cookbooks/... and ../../docs/..., but those directories aren’t present in this repo, so the links will be broken on GitHub. Please either include the referenced content here or replace these with stable external URLs (e.g., the upstream OpenJudge repo/docs).
Suggested change:

```diff
-- Full config examples: [cookbooks/ref_hallucination_arena/examples/](../../cookbooks/ref_hallucination_arena/examples/)
-- Documentation: [docs/validating_graders/ref_hallucination_arena.md](../../docs/validating_graders/ref_hallucination_arena.md)
+- Full config examples: [cookbooks/ref_hallucination_arena/examples/](https://github.com/OpenJudge/open-judge/tree/main/cookbooks/ref_hallucination_arena/examples)
+- Documentation: [docs/validating_graders/ref_hallucination_arena.md](https://github.com/OpenJudge/open-judge/blob/main/docs/validating_graders/ref_hallucination_arena.md)
```
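Broken relative links like these are easy to catch automatically in CI. A minimal checker sketch, with the regex and path handling being my own simplifying assumptions (no anchors, no query strings):

```python
import re
from pathlib import Path

# Matches markdown inline links: [text](target); anchors/queries not handled.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+)\)")


def broken_relative_links(md_text: str, base_dir: Path) -> list[str]:
    """Return relative link targets in md_text that do not exist under base_dir."""
    broken = []
    for target in LINK_RE.findall(md_text):
        # External URLs are out of scope for a filesystem check.
        if target.startswith(("http://", "https://", "mailto:")):
            continue
        if not (base_dir / target).exists():
            broken.append(target)
    return broken
```

Running something like this against each `SKILL.md` (with `base_dir` set to the file's directory) would have flagged both `../../cookbooks/...` links before merge.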
> ```bash
> npx skills add owner/repo@skill-a -g -y
> npx skills add owner/repo@skill-b -g -y
> npx skills add owner/repo@skill-c -g -y
> ```
The fenced code blocks under the Strategy A/Strategy B install sections contain a leading zero-width character before the triple backticks, which will prevent Markdown from rendering them as code fences in many viewers. Remove the hidden character so the fences start with plain ```.
> ### Install (M skills)
>
> ```bash
> npx skills add owner/repo@skill-a -g -y
> ```
Same as above: this install snippet has a zero-width character before the opening and closing triple backticks, which can break Markdown formatting. Ensure both fence markers are plain triple backticks (the opening fence followed immediately by `bash`) with no hidden characters.
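Hidden characters like these can be found mechanically rather than by eye. A small sketch that flags fence lines polluted by zero-width code points (the character set covers the common culprits, not every invisible code point):

```python
# Common invisible code points that break Markdown code fences.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff", "\u2060"}


def find_hidden_fence_chars(text: str) -> list[int]:
    """Return 1-based line numbers of fence lines carrying zero-width chars."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = "".join(ch for ch in line if ch not in ZERO_WIDTH)
        # The visible line is a fence, but invisible characters pollute it.
        if stripped.lstrip().startswith("```") and stripped != line:
            bad.append(lineno)
    return bad
```

Applied to each `SKILL.md`, a non-empty result pinpoints exactly which fence lines need the hidden characters stripped.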
> **Vision mode is enabled by default for PDF review.** Pages are rendered as images, which
> preserves formatting, figures, and tables. To disable, pass `--no_vision` (not recommended).
> The model **must** support multimodal (vision) input.
This doc says to disable vision via --no_vision, but the options table doesn’t list a --no_vision flag and the reference.md indicates disabling via --vision=False. Please align on the actual CLI interface (either document --vision/--no-vision style, or the --vision=False convention) and make the disable instructions consistent across both files.
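If the maintainers settle on the `--vision`/`--no-vision` style, argparse's `BooleanOptionalAction` (Python 3.9+) yields both flags from a single definition. A sketch assuming vision defaults to on, as the SKILL.md describes (the parser here is illustrative, not the actual CLI):

```python
import argparse

parser = argparse.ArgumentParser(prog="paper_review")
# One definition produces both --vision and --no-vision; the default mirrors
# the "vision on by default" behavior claimed in SKILL.md (an assumption).
parser.add_argument("--vision", action=argparse.BooleanOptionalAction,
                    default=True, help="Render PDF pages as images")

args_default = parser.parse_args([])          # vision stays True
args_off = parser.parse_args(["--no-vision"])  # vision becomes False
```

Documenting `--no-vision` in both SKILL.md and reference.md would then be accurate by construction, since argparse generates the negated flag automatically.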
> | `venue` | str \| None | `None` | Target venue name, applied on top of discipline criteria |
> | `instructions` | str \| None | `None` | Free-form reviewer guidance, e.g. "Focus on experimental design" |
> | `language` | str \| None | `None` | Output language: `"en"` (default) or `"zh"` (Simplified Chinese) |
> | `use_vision_for_pdf` | bool | `False` | Render PDF pages as images (needs `pypdfium2`) |
use_vision_for_pdf is documented as defaulting to False, but later in this reference the CLI flag --vision is shown with default True, and the main SKILL.md says vision is enabled by default for PDF review. These defaults should be consistent to avoid users configuring the pipeline incorrectly.
Suggested change:

```diff
-| `use_vision_for_pdf` | bool | `False` | Render PDF pages as images (needs `pypdfium2`) |
+| `use_vision_for_pdf` | bool | `True` | Render PDF pages as images (needs `pypdfium2`) |
```
> | Anthropic | `claude-opus-4-5`, `claude-sonnet-4-5`, `claude-haiku-3-5` | `ANTHROPIC_API_KEY` |
> | DashScope / Qwen | `qwen-plus`, `qwen-max`, `qwen-turbo` | `DASHSCOPE_API_KEY` |
The model string examples here don’t match the model-format guidance in skills/paper-review/SKILL.md (which requires anthropic/ and dashscope/ prefixes for routing). Either the pipeline accepts both formats or one of these docs is wrong; please reconcile the expected LiteLLM model string format across both documents so users pass valid --model values.
Suggested change:

```diff
-| Anthropic | `claude-opus-4-5`, `claude-sonnet-4-5`, `claude-haiku-3-5` | `ANTHROPIC_API_KEY` |
-| DashScope / Qwen | `qwen-plus`, `qwen-max`, `qwen-turbo` | `DASHSCOPE_API_KEY` |
+| Anthropic | `anthropic/claude-opus-4-5`, `anthropic/claude-sonnet-4-5`, `anthropic/claude-haiku-3-5` | `ANTHROPIC_API_KEY` |
+| DashScope / Qwen | `dashscope/qwen-plus`, `dashscope/qwen-max`, `dashscope/qwen-turbo` | `DASHSCOPE_API_KEY` |
```
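If the pipeline really does require LiteLLM-style provider prefixes, a cheap validation step could reject bare model names before any API call. A hypothetical sketch (the provider set is an illustrative subset, not LiteLLM's full routing table):

```python
# Providers whose models need a "provider/" prefix for routing
# (illustrative subset only, not LiteLLM's actual provider list).
KNOWN_PREFIXES = {"anthropic", "dashscope", "openai"}


def has_provider_prefix(model: str) -> bool:
    """True if the model string starts with a known 'provider/' prefix."""
    prefix, sep, rest = model.partition("/")
    return bool(sep) and prefix in KNOWN_PREFIXES and bool(rest)
```

Whichever format the docs settle on, a guard like this at config-load time would surface the mismatch immediately instead of at the first failed request.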