Add artemiskit-cli skill #668

Open
code-sensei wants to merge 1 commit into vercel-labs:main from
code-sensei:add-artemiskit-cli-skill
@code-sensei
Summary

Adds the artemiskit-cli skill for LLM evaluation and security testing.

ArtemisKit is an open-source CLI toolkit that helps developers:

  • Test LLM outputs with scenario-based evaluation (YAML-driven quality testing)
  • Secure LLMs via red teaming (prompt injection, jailbreaks, data extraction, PII disclosure)
  • Stress test LLM endpoints (latency p50/p95/p99, throughput, token usage, cost estimation)
  • Compare evaluation runs for regression detection
  • Generate interactive HTML reports and JSON manifests
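For readers unfamiliar with the latency figures listed for stress testing, here is a minimal sketch of how p50/p95/p99 can be derived from a set of request latencies. It uses the nearest-rank percentile definition, which is an assumption made for illustration; this PR does not say which method ArtemisKit uses, and the names `percentile` and `summarizeLatency` are hypothetical, not part of the CLI.

```typescript
// Illustrative only: nearest-rank percentiles over latency samples.
// These function names are hypothetical and not part of ArtemisKit.

// Returns the smallest sample such that at least p percent of all
// samples are less than or equal to it (nearest-rank definition).
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) {
    throw new Error("at least one sample is required");
  }
  if (p <= 0 || p > 100) {
    throw new Error("p must be in the range (0, 100]");
  }
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // Multiply before dividing to keep the rank arithmetic exact for integer p.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Collects the three percentiles named in the summary above.
function summarizeLatency(samplesMs: number[]): { p50: number; p95: number; p99: number } {
  return {
    p50: percentile(samplesMs, 50),
    p95: percentile(samplesMs, 95),
    p99: percentile(samplesMs, 99),
  };
}

// Example: five request latencies in milliseconds.
const summary = summarizeLatency([40, 10, 50, 20, 30]);
console.log(summary);
```

With only five samples, p95 and p99 both land on the slowest request, which is why tail percentiles are only meaningful with a reasonably large sample count.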

Commands

Command        Purpose
akit run       Execute scenario-based evaluations
akit redteam   Security red team testing
akit stress    Load and stress testing
akit report    Generate/regenerate reports
akit history   View run history
akit compare   Compare two evaluation runs
akit baseline  Manage baselines for regression testing
akit validate  Validate scenario files
akit init      Initialize configuration

Provider Support

  • OpenAI (GPT-4, GPT-4o, etc.)
  • Anthropic (Claude 3.5, Claude 4, etc.)
  • Azure OpenAI
  • Vercel AI SDK
  • OpenAI-compatible APIs (Ollama, vLLM, LM Studio)

Skill Structure

skills/artemiskit-cli/
├── SKILL.md              # Main skill file
└── references/
    ├── commands.md       # CLI command reference
    ├── providers.md      # Provider configuration
    └── scenarios.md      # Scenario format documentation

Test Plan

  • Skill follows SKILL.md format with YAML frontmatter
  • References use progressive disclosure pattern
  • All commands verified against CLI implementation (--help output)
  • Installed and tested via npx skills install code-sensei/artemiskit-cli-skill
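The first item in the test plan could be checked mechanically. The sketch below reads a leading `---` delimited frontmatter block from a SKILL.md string and reports whether it carries `name` and `description` keys. Treating those two keys as required is an assumption for illustration, the parser only handles flat `key: value` lines rather than full YAML, and `readFrontmatter` and `hasRequiredFields` are hypothetical helpers, not part of the skills tooling.

```typescript
// Illustrative only: a minimal frontmatter check for a SKILL.md file.
// Handles flat "key: value" lines; it is not a full YAML parser.

function readFrontmatter(markdown: string): Record<string, string> | null {
  const lines = markdown.split(/\r?\n/);
  // Frontmatter must start on the very first line.
  if (lines[0].trim() !== "---") return null;
  // Find the closing delimiter after the opening one.
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (end === -1) return null;

  const fields: Record<string, string> = {};
  for (const line of lines.slice(1, end)) {
    const match = /^([A-Za-z0-9_-]+):\s*(.*)$/.exec(line);
    if (match) fields[match[1]] = match[2].trim();
  }
  return fields;
}

// Assumption: `name` and `description` are the keys a skill file should carry.
function hasRequiredFields(markdown: string): boolean {
  const fields = readFrontmatter(markdown);
  return fields !== null && Boolean(fields["name"]) && Boolean(fields["description"]);
}

// Hypothetical file contents, not copied from the PR.
const sample = [
  "---",
  "name: artemiskit-cli",
  "description: LLM evaluation and security testing",
  "---",
  "# Body of the skill",
].join("\n");

console.log(hasRequiredFields(sample));
```

A check like this would fit in CI for the skill repository, though a real YAML parser would be the safer choice for anything beyond flat keys.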

ArtemisKit is an open-source LLM evaluation toolkit that provides:
- Quality testing with scenario-based evaluation (YAML-driven)
- Security red teaming for prompt injection, jailbreaks, data extraction
- Stress testing with latency metrics (p50/p95/p99), throughput, costs
- Multi-provider support (OpenAI, Anthropic, Azure, Vercel AI SDK)

Commands: run, redteam, stress, report, history, compare, baseline, validate, init

Repository: https://github.com/code-sensei/artemiskit
npm: @artemiskit/cli