
feat: add 7 new agent skills #2

Open
XiaoBoAI wants to merge 1 commit into main from feat/add-7-skills


@XiaoBoAI (Collaborator)

Add the following skills to the skills directory:

  • auto-arena: automatically evaluate and compare multiple AI models
  • bib-verify: verify BibTeX references against CrossRef/arXiv/DBLP
  • claude-authenticity: detect genuine Claude API endpoints
  • find-skills-combo: discover optimal skill combinations for complex tasks
  • openjudge: build custom LLM evaluation pipelines with OpenJudge
  • paper-review: review academic papers using OpenJudge multi-stage pipeline
  • ref-hallucination-arena: benchmark LLM reference hallucination rates

Made-with: Cursor

@DavdGao requested a review from Copilot on March 26, 2026, 16:28
DavdGao (Member) commented Mar 26, 2026

@qbc2016 Please take a look at this PR.

Copilot AI left a comment

Pull request overview

This PR adds seven new skills (and supporting reference docs) to the skills/ directory, primarily focused on OpenJudge-based evaluation pipelines, academic paper review / BibTeX verification, model arena benchmarking, and Claude endpoint authenticity checking.

Changes:

  • Added new skills: auto-arena, bib-verify, claude-authenticity, find-skills-combo, openjudge, paper-review, ref-hallucination-arena.
  • Added OpenJudge reference sub-docs (graders/pipeline/generator/analyzer) and paper-review reference documentation.
  • Documented CLI/Python usage patterns and configuration templates for these workflows.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 10 comments.

| File | Description |
|------|-------------|
| skills/auto-arena/SKILL.md | Documentation for an OpenJudge “auto arena” benchmarking pipeline. |
| skills/bib-verify/SKILL.md | Documentation for BibTeX verification (via paper-review pipeline in bib-only mode). |
| skills/claude-authenticity/SKILL.md | Documentation + self-contained script for Claude endpoint authenticity checks. |
| skills/find-skills-combo/SKILL.md | Documentation for a multi-skill discovery/recommendation workflow using npx skills. |
| skills/openjudge/SKILL.md | Top-level OpenJudge skill doc with pointers to sub-documents. |
| skills/openjudge/graders.md | Reference documentation for OpenJudge graders and usage patterns. |
| skills/openjudge/pipeline.md | Reference documentation for OpenJudge batch evaluation runner/aggregators/strategies. |
| skills/openjudge/generator.md | Reference documentation for OpenJudge rubric generators. |
| skills/openjudge/analyzer.md | Reference documentation for OpenJudge analyzers (pairwise/statistical/validation). |
| skills/paper-review/SKILL.md | Documentation for multi-stage paper review pipeline (PDF/TeX/BibTeX). |
| skills/paper-review/reference.md | Reference documentation for PaperReviewPipeline config + CLI flags. |
| skills/ref-hallucination-arena/SKILL.md | Documentation for reference hallucination benchmarking pipeline. |


Comment on lines +53 to +62
All file types use a single entry point. File type is auto-detected.

```bash
python -m cookbooks.paper_review [--input FILE] [options]
```

| Flag | Default | Description |
|------|---------|-------------|
| `--input` | — | Path to PDF, .tar.gz/.zip, or .bib file |
| `--bib_only` | — | Path to .bib file for standalone BibTeX-only verification |

Copilot AI Mar 26, 2026

This reference describes usage via --input FILE, while the main SKILL.md examples use a positional paper.pdf argument. Please document a single canonical invocation (or explicitly state that both positional input and --input are supported) to prevent copy/paste CLI errors.
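
If the decision is to support both forms, one way to do it is sketched below with `argparse`. This is only an illustration of the idea, not the actual `cookbooks.paper_review` entry point: the program name `paper_review_sketch`, the destination names `input_pos` and `input_flag`, and the helper `resolve_input` are all hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: the input file may be positional or given via --input."""
    parser = argparse.ArgumentParser(prog="paper_review_sketch")
    parser.add_argument("input_pos", nargs="?", metavar="FILE",
                        help="input file (positional form)")
    parser.add_argument("--input", dest="input_flag", metavar="FILE",
                        help="input file (flag form)")
    return parser


def resolve_input(argv: list[str]) -> str:
    """Return the single input path, rejecting missing or duplicated input."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.input_pos and args.input_flag:
        # parser.error prints a usage message and exits with status 2
        parser.error("pass the input file either positionally or via --input, not both")
    path = args.input_pos or args.input_flag
    if path is None:
        parser.error("an input file is required")
    return path
```

With a parser like this, `resolve_input(["paper.pdf"])` and `resolve_input(["--input", "paper.pdf"])` resolve to the same path, so both documented invocations would stay valid. If only one form is actually supported, the docs should simply drop the other.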

Copilot uses AI. Check for mistakes.
Comment on lines +26 to +27
## The 9 checks (mirrors [claude-verify](https://github.com/molloryn/claude-verify))

Copilot AI Mar 26, 2026

This skill explicitly says the checks “mirror claude-verify”. If any of the script logic/text is copied or closely adapted from that repository, please ensure the original project’s license is compatible with this repo and add the required attribution/license notice (e.g., in the doc or alongside the extracted script) to avoid downstream distribution issues.

Suggested change (original first, then replacement):
## The 9 checks (mirrors [claude-verify](https://github.com/molloryn/claude-verify))
## The 9 checks
> Attribution: The design and weighting of these nine checks are adapted from the open-source
> [claude-verify](https://github.com/molloryn/claude-verify) project, which is licensed under
> the MIT License by its original author(s). This skill reimplements that logic; any reused or
> adapted portions are used under the terms of the MIT License. See the claude-verify repository
> for the original implementation and full license text.


# Extra dependency for paper_review
pip install litellm
pip install pypdfium2 # only if using vision mode (use_vision_for_pdf=True)

Copilot AI Mar 26, 2026

The prerequisites say pypdfium2 is only needed if using vision mode, but later this doc says vision mode is enabled by default for PDF review. This is contradictory and can lead to missing-dependency failures. Clarify whether vision is default-on, and update the pypdfium2 note accordingly (required for default PDF review vs only when explicitly enabling vision).

Suggested change (original first, then replacement):
pip install pypdfium2 # only if using vision mode (use_vision_for_pdf=True)
pip install pypdfium2 # required for default PDF review (vision mode is enabled by default)


## Additional resources

- Full config examples: [cookbooks/auto_arena/examples/](../../cookbooks/auto_arena/examples/)

Copilot AI Mar 26, 2026

This “Full config examples” link points to ../../cookbooks/..., but this repository doesn’t contain a cookbooks/ directory, so the relative link will be broken on GitHub. Consider linking to the upstream OpenJudge repository (or include the referenced examples in this repo) so readers can actually access them.

Suggested change (original first, then replacement):
- Full config examples: [cookbooks/auto_arena/examples/](../../cookbooks/auto_arena/examples/)
- Full config examples: [cookbooks/auto_arena/examples/](https://github.com/agentscope-ai/OpenJudge/tree/main/cookbooks/auto_arena/examples)

Comment on lines +257 to +258
- Full config examples: [cookbooks/ref_hallucination_arena/examples/](../../cookbooks/ref_hallucination_arena/examples/)
- Documentation: [docs/validating_graders/ref_hallucination_arena.md](../../docs/validating_graders/ref_hallucination_arena.md)

Copilot AI Mar 26, 2026

Both “Full config examples” and “Documentation” are relative links into ../../cookbooks/... and ../../docs/..., but those directories aren’t present in this repo, so the links will be broken on GitHub. Please either include the referenced content here or replace these with stable external URLs (e.g., the upstream OpenJudge repo/docs).

Suggested change (original first, then replacement):
- Full config examples: [cookbooks/ref_hallucination_arena/examples/](../../cookbooks/ref_hallucination_arena/examples/)
- Documentation: [docs/validating_graders/ref_hallucination_arena.md](../../docs/validating_graders/ref_hallucination_arena.md)
- Full config examples: [cookbooks/ref_hallucination_arena/examples/](https://github.com/agentscope-ai/OpenJudge/tree/main/cookbooks/ref_hallucination_arena/examples)
- Documentation: [docs/validating_graders/ref_hallucination_arena.md](https://github.com/agentscope-ai/OpenJudge/blob/main/docs/validating_graders/ref_hallucination_arena.md)
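
Broken relative links of this kind can be caught before review with a small check. The following is a minimal sketch, assuming links use the inline `[text](target)` form; the function name `broken_relative_links` is hypothetical, and reference-style links and images are deliberately ignored.

```python
import re
from pathlib import Path

# Inline Markdown links: [text](target). Images (![...](...)) are skipped.
_LINK = re.compile(r"(?<!!)\[[^\]]*\]\(([^)\s]+)\)")
# A URI scheme such as https: or mailto: marks the target as non-relative.
_SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_relative_links(md_file: Path) -> list[str]:
    """Return relative link targets in md_file that do not exist on disk."""
    broken = []
    for target in _LINK.findall(md_file.read_text(encoding="utf-8")):
        if _SCHEME.match(target) or target.startswith("#"):
            continue  # absolute URL or in-page anchor, nothing to check locally
        path_part = target.split("#", 1)[0]  # drop any trailing #fragment
        if not (md_file.parent / path_part).exists():
            broken.append(target)
    return broken
```

Run against each `SKILL.md` in `skills/`, a check like this would flag the `../../cookbooks/...` and `../../docs/...` targets as long as those directories are absent from the repository.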

Comment on lines +251 to +255
​```bash
npx skills add owner/repo@skill-a -g -y
npx skills add owner/repo@skill-b -g -y
npx skills add owner/repo@skill-c -g -y
​```

Copilot AI Mar 26, 2026

The fenced code blocks under the Strategy A/Strategy B install sections contain a leading zero-width character before the triple backticks, which will prevent Markdown from rendering them as code fences in many viewers. Remove the hidden character so the fences start with plain ```.
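
Because the offending character is invisible in most editors, it may be easier to find and strip it with a script. Below is a minimal sketch; the set of characters in `ZERO_WIDTH` and the function names are assumptions made for illustration, and only hidden characters that sit directly before a fence marker are touched.

```python
import re

# Characters that render as nothing but still occupy the start of a line
# (zero-width space, non-joiner, joiner, word joiner, byte-order mark).
ZERO_WIDTH = "\u200b\u200c\u200d\u2060\ufeff"

# Optional indentation, one or more hidden characters, then a ``` or ~~~ marker.
_FENCE = re.compile(
    rf"^([ \t]*)[{ZERO_WIDTH}]+([ \t]*(?:`{{3,}}|~{{3,}}))",
    re.MULTILINE,
)


def find_hidden_fence_chars(text: str) -> list[int]:
    """Return 1-based line numbers whose fence marker follows a hidden character."""
    return [text.count("\n", 0, m.start()) + 1 for m in _FENCE.finditer(text)]


def strip_hidden_fence_chars(text: str) -> str:
    """Remove hidden characters that sit directly before a fence marker."""
    return _FENCE.sub(r"\1\2", text)
```

Applying `strip_hidden_fence_chars` to the contents of `skills/find-skills-combo/SKILL.md` and writing the result back would leave every fence starting with plain backticks, while hidden characters elsewhere in the file (if any) are left alone for manual review.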

Comment on lines +269 to +272
### Install (M skills)
​```bash
npx skills add owner/repo@skill-a -g -y
​```

Copilot AI Mar 26, 2026

Same as above: this install snippet has a zero-width character before the opening and closing triple backticks, which can break Markdown formatting. Ensure the code fence markers are exactly ```bash (opening) and ``` (closing), with no hidden characters.

Comment on lines +157 to +159
**Vision mode is enabled by default for PDF review.** Pages are rendered as images, which
preserves formatting, figures, and tables. To disable, pass `--no_vision` (not recommended).
The model **must** support multimodal (vision) input.

Copilot AI Mar 26, 2026

This doc says to disable vision via --no_vision, but the options table doesn’t list a --no_vision flag and the reference.md indicates disabling via --vision=False. Please align on the actual CLI interface (either document --vision/--no-vision style, or the --vision=False convention) and make the disable instructions consistent across both files.
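
For reference, the paired `--vision` / `--no-vision` style can be expressed in `argparse` with `BooleanOptionalAction` (Python 3.9+). The sketch below is not the pipeline's real CLI: the program name and the helper `vision_enabled` are hypothetical, and the default of `True` is taken from the "enabled by default" wording in SKILL.md.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser with a paired on/off flag for vision mode."""
    parser = argparse.ArgumentParser(prog="paper_review_sketch")
    parser.add_argument(
        "--vision",
        action=argparse.BooleanOptionalAction,  # generates --vision and --no-vision
        default=True,                           # assumed default, per SKILL.md
        help="render PDF pages as images",
    )
    return parser


def vision_enabled(argv: list[str]) -> bool:
    """Parse argv and report whether vision mode is on."""
    return build_parser().parse_args(argv).vision
```

Under this style, vision is on with no flag, `--no-vision` turns it off, and `--vision=False` is rejected because the flag takes no value. Note that the generated negative flag is spelled `--no-vision` (hyphen), not `--no_vision`, so whichever convention the real CLI uses should be copied exactly into both documents.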

| `venue` | str \| None | `None` | Target venue name, applied on top of discipline criteria |
| `instructions` | str \| None | `None` | Free-form reviewer guidance, e.g. "Focus on experimental design" |
| `language` | str \| None | `None` | Output language: `"en"` (default) or `"zh"` (Simplified Chinese) |
| `use_vision_for_pdf` | bool | `False` | Render PDF pages as images (needs `pypdfium2`) |

Copilot AI Mar 26, 2026

use_vision_for_pdf is documented as defaulting to False, but later in this reference the CLI flag --vision is shown with default True, and the main SKILL.md says vision is enabled by default for PDF review. These defaults should be consistent to avoid users configuring the pipeline incorrectly.

Suggested change (original first, then replacement):
| `use_vision_for_pdf` | bool | `False` | Render PDF pages as images (needs `pypdfium2`) |
| `use_vision_for_pdf` | bool | `True` | Render PDF pages as images (needs `pypdfium2`) |

Comment on lines +46 to +47
| Anthropic | `claude-opus-4-5`, `claude-sonnet-4-5`, `claude-haiku-3-5` | `ANTHROPIC_API_KEY` |
| DashScope / Qwen | `qwen-plus`, `qwen-max`, `qwen-turbo` | `DASHSCOPE_API_KEY` |

Copilot AI Mar 26, 2026

The model string examples here don’t match the model-format guidance in skills/paper-review/SKILL.md (which requires anthropic/ and dashscope/ prefixes for routing). Either the pipeline accepts both formats or one of these docs is wrong; please reconcile the expected LiteLLM model string format across both documents so users pass valid --model values.

Suggested change (original first, then replacement):
| Anthropic | `claude-opus-4-5`, `claude-sonnet-4-5`, `claude-haiku-3-5` | `ANTHROPIC_API_KEY` |
| DashScope / Qwen | `qwen-plus`, `qwen-max`, `qwen-turbo` | `DASHSCOPE_API_KEY` |
| Anthropic | `anthropic/claude-opus-4-5`, `anthropic/claude-sonnet-4-5`, `anthropic/claude-haiku-3-5` | `ANTHROPIC_API_KEY` |
| DashScope / Qwen | `dashscope/qwen-plus`, `dashscope/qwen-max`, `dashscope/qwen-turbo` | `DASHSCOPE_API_KEY` |
