Add the following skills to the skills directory:

- auto-arena: automatically evaluate and compare multiple AI models
- bib-verify: verify BibTeX references against CrossRef/arXiv/DBLP
- claude-authenticity: detect genuine Claude API endpoints
- find-skills-combo: discover optimal skill combinations for complex tasks
- openjudge: build custom LLM evaluation pipelines with OpenJudge
- paper-review: review academic papers using OpenJudge multi-stage pipeline
- ref-hallucination-arena: benchmark LLM reference hallucination rates

Made-with: Cursor
@qbc2016 Please take a look at this PR
Pull request overview
This PR adds seven new skills (and supporting reference docs) to the skills/ directory, primarily focused on OpenJudge-based evaluation pipelines, academic paper review / BibTeX verification, model arena benchmarking, and Claude endpoint authenticity checking.
Changes:
- Added new skills: `auto-arena`, `bib-verify`, `claude-authenticity`, `find-skills-combo`, `openjudge`, `paper-review`, `ref-hallucination-arena`.
- Added OpenJudge reference sub-docs (graders/pipeline/generator/analyzer) and paper-review reference documentation.
- Documented CLI/Python usage patterns and configuration templates for these workflows.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| skills/auto-arena/SKILL.md | Documentation for an OpenJudge “auto arena” benchmarking pipeline. |
| skills/bib-verify/SKILL.md | Documentation for BibTeX verification (via paper-review pipeline in bib-only mode). |
| skills/claude-authenticity/SKILL.md | Documentation + self-contained script for Claude endpoint authenticity checks. |
| skills/find-skills-combo/SKILL.md | Documentation for a multi-skill discovery/recommendation workflow using npx skills. |
| skills/openjudge/SKILL.md | Top-level OpenJudge skill doc with pointers to sub-documents. |
| skills/openjudge/graders.md | Reference documentation for OpenJudge graders and usage patterns. |
| skills/openjudge/pipeline.md | Reference documentation for OpenJudge batch evaluation runner/aggregators/strategies. |
| skills/openjudge/generator.md | Reference documentation for OpenJudge rubric generators. |
| skills/openjudge/analyzer.md | Reference documentation for OpenJudge analyzers (pairwise/statistical/validation). |
| skills/paper-review/SKILL.md | Documentation for multi-stage paper review pipeline (PDF/TeX/BibTeX). |
| skills/paper-review/reference.md | Reference documentation for PaperReviewPipeline config + CLI flags. |
| skills/ref-hallucination-arena/SKILL.md | Documentation for reference hallucination benchmarking pipeline. |
> All file types use a single entry point. File type is auto-detected.
>
> ```bash
> python -m cookbooks.paper_review [--input FILE] [options]
> ```
>
> | Flag | Default | Description |
> |------|---------|-------------|
> | `--input` | — | Path to PDF, .tar.gz/.zip, or .bib file |
> | `--bib_only` | — | Path to .bib file for standalone BibTeX-only verification |
This reference describes usage via --input FILE, while the main SKILL.md examples use a positional paper.pdf argument. Please document a single canonical invocation (or explicitly state that both positional input and --input are supported) to prevent copy/paste CLI errors.
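Supporting both invocation styles is straightforward with argparse; a minimal sketch under the assumption that both forms should be accepted (the parser and helper names here are hypothetical, not the actual `cookbooks.paper_review` CLI):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Accept the input file either positionally or via --input.

    Hypothetical sketch -- not the real cookbooks.paper_review parser.
    """
    parser = argparse.ArgumentParser(prog="paper_review")
    # nargs="?" makes the positional optional, so --input can be used instead.
    parser.add_argument("paper", nargs="?", default=None,
                        help="Path to PDF, .tar.gz/.zip, or .bib file")
    parser.add_argument("--input", dest="input_flag", default=None,
                        help="Same as the positional argument")
    return parser


def resolve_input(args: argparse.Namespace) -> str:
    """Pick whichever form was given; reject ambiguous or missing input."""
    if args.paper and args.input_flag:
        raise SystemExit("Pass the input positionally or via --input, not both")
    path = args.paper or args.input_flag
    if path is None:
        raise SystemExit("An input file is required")
    return path
```

With this shape, `python -m cookbooks.paper_review paper.pdf` and `python -m cookbooks.paper_review --input paper.pdf` would resolve to the same thing, which removes the copy/paste hazard.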
> ## The 9 checks (mirrors [claude-verify](https://github.com/molloryn/claude-verify))
This skill explicitly says the checks “mirror claude-verify”. If any of the script logic/text is copied or closely adapted from that repository, please ensure the original project’s license is compatible with this repo and add the required attribution/license notice (e.g., in the doc or alongside the extracted script) to avoid downstream distribution issues.
Suggested change:

```diff
-## The 9 checks (mirrors [claude-verify](https://github.com/molloryn/claude-verify))
+## The 9 checks
+
+> Attribution: The design and weighting of these nine checks are adapted from the open-source
+> [claude-verify](https://github.com/molloryn/claude-verify) project, which is licensed under
+> the MIT License by its original author(s). This skill reimplements that logic; any reused or
+> adapted portions are used under the terms of the MIT License. See the claude-verify repository
+> for the original implementation and full license text.
```
> ```bash
> # Extra dependency for paper_review
> pip install litellm
> pip install pypdfium2  # only if using vision mode (use_vision_for_pdf=True)
> ```
The prerequisites say pypdfium2 is only needed if using vision mode, but later this doc says vision mode is enabled by default for PDF review. This is contradictory and can lead to missing-dependency failures. Clarify whether vision is default-on, and update the pypdfium2 note accordingly (required for default PDF review vs only when explicitly enabling vision).
| pip install pypdfium2 # only if using vision mode (use_vision_for_pdf=True) | |
| pip install pypdfium2 # required for default PDF review (vision mode is enabled by default) |
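Beyond fixing the docs, the pipeline itself could fail fast with an actionable message when vision mode needs `pypdfium2` and it is absent. A sketch of that guard (the function name and message are illustrative, not part of the actual paper-review code):

```python
import importlib.util


def check_vision_deps(use_vision_for_pdf: bool) -> None:
    """Raise a clear error if vision mode needs pypdfium2 but it is missing.

    Illustrative sketch -- not part of the real paper-review pipeline.
    """
    if not use_vision_for_pdf:
        return
    # find_spec tests for the package without actually importing it.
    if importlib.util.find_spec("pypdfium2") is None:
        raise RuntimeError(
            "Vision mode is enabled (the default for PDF review) but "
            "pypdfium2 is not installed. Run `pip install pypdfium2` "
            "or disable vision mode."
        )
```

Calling this once at pipeline startup turns a confusing mid-run import error into an immediate, self-explanatory failure.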
> ## Additional resources
>
> - Full config examples: [cookbooks/auto_arena/examples/](../../cookbooks/auto_arena/examples/)
This “Full config examples” link points to ../../cookbooks/..., but this repository doesn’t contain a cookbooks/ directory, so the relative link will be broken on GitHub. Consider linking to the upstream OpenJudge repository (or include the referenced examples in this repo) so readers can actually access them.
Suggested change:

```diff
-- Full config examples: [cookbooks/auto_arena/examples/](../../cookbooks/auto_arena/examples/)
+- Full config examples: [cookbooks/auto_arena/examples/](https://github.com/agentscope-ai/OpenJudge/tree/main/cookbooks/auto_arena/examples)
```
> - Full config examples: [cookbooks/ref_hallucination_arena/examples/](../../cookbooks/ref_hallucination_arena/examples/)
> - Documentation: [docs/validating_graders/ref_hallucination_arena.md](../../docs/validating_graders/ref_hallucination_arena.md)
Both “Full config examples” and “Documentation” are relative links into ../../cookbooks/... and ../../docs/..., but those directories aren’t present in this repo, so the links will be broken on GitHub. Please either include the referenced content here or replace these with stable external URLs (e.g., the upstream OpenJudge repo/docs).
Suggested change:

```diff
-- Full config examples: [cookbooks/ref_hallucination_arena/examples/](../../cookbooks/ref_hallucination_arena/examples/)
-- Documentation: [docs/validating_graders/ref_hallucination_arena.md](../../docs/validating_graders/ref_hallucination_arena.md)
+- Full config examples: [cookbooks/ref_hallucination_arena/examples/](https://github.com/OpenJudge/open-judge/tree/main/cookbooks/ref_hallucination_arena/examples)
+- Documentation: [docs/validating_graders/ref_hallucination_arena.md](https://github.com/OpenJudge/open-judge/blob/main/docs/validating_graders/ref_hallucination_arena.md)
```
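Broken relative links like these are easy to catch automatically in CI. A minimal checker sketch, with the regex and path handling being my own simplifying assumptions (no anchors, no query strings):

```python
import re
from pathlib import Path

# Matches markdown inline links: [text](target); anchors/queries not handled.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+)\)")


def broken_relative_links(md_text: str, base_dir: Path) -> list[str]:
    """Return relative link targets in md_text that do not exist under base_dir."""
    broken = []
    for target in LINK_RE.findall(md_text):
        # External URLs are out of scope for a filesystem check.
        if target.startswith(("http://", "https://", "mailto:")):
            continue
        if not (base_dir / target).exists():
            broken.append(target)
    return broken
```

Running something like this against each `SKILL.md` (with `base_dir` set to the file's directory) would have flagged both `../../cookbooks/...` links before merge.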
> ```bash
> npx skills add owner/repo@skill-a -g -y
> npx skills add owner/repo@skill-b -g -y
> npx skills add owner/repo@skill-c -g -y
> ```
The fenced code blocks under the Strategy A/Strategy B install sections contain a leading zero-width character before the triple backticks, which will prevent Markdown from rendering them as code fences in many viewers. Remove the hidden character so the fences start with plain ```.
> ### Install (M skills)
>
> ```bash
> npx skills add owner/repo@skill-a -g -y
> ```
Same as above: this install snippet has a zero-width character before the opening and closing triple backticks, which can break Markdown formatting. Ensure both fence markers are plain triple backticks (the opening fence followed immediately by `bash`) with no hidden characters.
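Hidden characters like these can be found mechanically rather than by eye. A small sketch that flags fence lines polluted by zero-width code points (the character set covers the common culprits, not every invisible code point):

```python
# Common invisible code points that break Markdown code fences.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff", "\u2060"}


def find_hidden_fence_chars(text: str) -> list[int]:
    """Return 1-based line numbers of fence lines carrying zero-width chars."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = "".join(ch for ch in line if ch not in ZERO_WIDTH)
        # The visible line is a fence, but invisible characters pollute it.
        if stripped.lstrip().startswith("```") and stripped != line:
            bad.append(lineno)
    return bad
```

Applied to each `SKILL.md`, a non-empty result pinpoints exactly which fence lines need the hidden characters stripped.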
> **Vision mode is enabled by default for PDF review.** Pages are rendered as images, which
> preserves formatting, figures, and tables. To disable, pass `--no_vision` (not recommended).
> The model **must** support multimodal (vision) input.
This doc says to disable vision via --no_vision, but the options table doesn’t list a --no_vision flag and the reference.md indicates disabling via --vision=False. Please align on the actual CLI interface (either document --vision/--no-vision style, or the --vision=False convention) and make the disable instructions consistent across both files.
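If the maintainers settle on the `--vision`/`--no-vision` style, argparse's `BooleanOptionalAction` (Python 3.9+) yields both flags from a single definition. A sketch assuming vision defaults to on, as the SKILL.md describes (the parser here is illustrative, not the actual CLI):

```python
import argparse

parser = argparse.ArgumentParser(prog="paper_review")
# One definition produces both --vision and --no-vision; the default mirrors
# the "vision on by default" behavior claimed in SKILL.md (an assumption).
parser.add_argument("--vision", action=argparse.BooleanOptionalAction,
                    default=True, help="Render PDF pages as images")

args_default = parser.parse_args([])          # vision stays True
args_off = parser.parse_args(["--no-vision"])  # vision becomes False
```

Documenting `--no-vision` in both SKILL.md and reference.md would then be accurate by construction, since argparse generates the negated flag automatically.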
> | `venue` | str \| None | `None` | Target venue name, applied on top of discipline criteria |
> | `instructions` | str \| None | `None` | Free-form reviewer guidance, e.g. "Focus on experimental design" |
> | `language` | str \| None | `None` | Output language: `"en"` (default) or `"zh"` (Simplified Chinese) |
> | `use_vision_for_pdf` | bool | `False` | Render PDF pages as images (needs `pypdfium2`) |
use_vision_for_pdf is documented as defaulting to False, but later in this reference the CLI flag --vision is shown with default True, and the main SKILL.md says vision is enabled by default for PDF review. These defaults should be consistent to avoid users configuring the pipeline incorrectly.
Suggested change:

```diff
-| `use_vision_for_pdf` | bool | `False` | Render PDF pages as images (needs `pypdfium2`) |
+| `use_vision_for_pdf` | bool | `True` | Render PDF pages as images (needs `pypdfium2`) |
```
> | Anthropic | `claude-opus-4-5`, `claude-sonnet-4-5`, `claude-haiku-3-5` | `ANTHROPIC_API_KEY` |
> | DashScope / Qwen | `qwen-plus`, `qwen-max`, `qwen-turbo` | `DASHSCOPE_API_KEY` |
The model string examples here don’t match the model-format guidance in skills/paper-review/SKILL.md (which requires anthropic/ and dashscope/ prefixes for routing). Either the pipeline accepts both formats or one of these docs is wrong; please reconcile the expected LiteLLM model string format across both documents so users pass valid --model values.
Suggested change:

```diff
-| Anthropic | `claude-opus-4-5`, `claude-sonnet-4-5`, `claude-haiku-3-5` | `ANTHROPIC_API_KEY` |
-| DashScope / Qwen | `qwen-plus`, `qwen-max`, `qwen-turbo` | `DASHSCOPE_API_KEY` |
+| Anthropic | `anthropic/claude-opus-4-5`, `anthropic/claude-sonnet-4-5`, `anthropic/claude-haiku-3-5` | `ANTHROPIC_API_KEY` |
+| DashScope / Qwen | `dashscope/qwen-plus`, `dashscope/qwen-max`, `dashscope/qwen-turbo` | `DASHSCOPE_API_KEY` |
```
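If the pipeline really does require LiteLLM-style provider prefixes, a cheap validation step could reject bare model names before any API call. A hypothetical sketch (the provider set is an illustrative subset, not LiteLLM's full routing table):

```python
# Providers whose models need a "provider/" prefix for routing
# (illustrative subset only, not LiteLLM's actual provider list).
KNOWN_PREFIXES = {"anthropic", "dashscope", "openai"}


def has_provider_prefix(model: str) -> bool:
    """True if the model string starts with a known 'provider/' prefix."""
    prefix, sep, rest = model.partition("/")
    return bool(sep) and prefix in KNOWN_PREFIXES and bool(rest)
```

Whichever format the docs settle on, a guard like this at config-load time would surface the mismatch immediately instead of at the first failed request.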