Multi-model vulnerability scanning pipeline for software packages. Supports C/C++, Python, Ruby, Bash, Rust, Perl, and Node.js via language-specific profiles. Three independent stages — triage, reasoning, verdict — each backed by any LLM you choose: local models via ollama/llama.cpp, or frontier APIs via Claude, Gemini, or Codex CLIs.
This tool is designed for analyst assistance, not autonomous security sign-off. A "clean" report means "nothing survived this pipeline under this model mix," not "this package is vulnerability-free."
This scanner is directly inspired by AISLE's "AI Cybersecurity After Mythos: The Jagged Frontier" methodology: treat AI cybersecurity as a modular system, not a single magic model; use cheap, broad scanning where possible; use a different deeper reasoning stage for validation; and keep the scaffold and artifacts inspectable. In that sense, the design here follows the same core lesson: the moat is the system, not the model.
```
Source code (OBS or local)
        |
        v
  +-----------+   Paranoid pattern matching.
  |  TRIAGE   |   Flags anything suspicious.
  |  (fast)   |   ~70% of files come back clean.
  +-----------+
        | flagged files only
        v
  +-----------+   Independent chain analysis.
  | REASONING |   Traces data flow source -> sink.
  |  (deep)   |   Rules out common FP patterns.
  +-----------+
        | findings from both stages
        v
  +-----------+   Sees what both stages found.
  |  VERDICT  |   Checks privilege boundaries,
  | (precise) |   attack surface, exploitability.
  +-----------+
        |
        v
  Markdown + JSON report
```
Triage and reasoning scan independently — neither sees the other's output. This means cross-stage consensus is genuine signal: if both flag the same function, it's likely real. The verdict stage sees everything and makes the final call.
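The cross-stage consensus grouping described above can be sketched like this. This is a minimal illustration only; the function name and record fields (`file`, `function`) are assumptions, not the scanner's actual internals:

```python
# Illustrative sketch of cross-stage consensus grouping.
# Field names ("file", "function") are assumptions about the finding records.

def group_by_consensus(triage_findings, reasoning_findings):
    """Group findings by (file, function) and mark which stages agree."""
    triage_keys = {(f["file"], f["function"]) for f in triage_findings}
    reasoning_keys = {(f["file"], f["function"]) for f in reasoning_findings}
    return {
        "both": sorted(triage_keys & reasoning_keys),          # strong signal
        "triage_only": sorted(triage_keys - reasoning_keys),   # examine carefully
        "reasoning_only": sorted(reasoning_keys - triage_keys),
    }

groups = group_by_consensus(
    [{"file": "parser.c", "function": "parse_header"}],
    [{"file": "parser.c", "function": "parse_header"},
     {"file": "io.c", "function": "read_chunk"}],
)
```

Because the two stages never see each other's output, a key landing in `both` is genuinely independent agreement rather than one model echoing the other.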
```bash
# Scan an OBS package with automatic per-file language dispatch
python3 scan.py --obs-package openSUSE:Factory/zypper

# Scan a local source tree
python3 scan.py --source-dir /path/to/extracted-source

# Triage only (single stage, fast)
python3 scan.py --source-dir ./src --triage-only

# Force only specific language families instead of auto-detecting by extension
python3 scan.py --source-dir ./src --profile c_cpp,python,bash
```

For complex or repeated scans, you can set your default options in a config.toml file. The scanner automatically loads config.toml from the current directory if it exists; you can also point at a different file with `python3 scan.py --config /path/to/your/config.toml`. An example is provided in config.toml.example. Command-line arguments always override settings from the configuration file.
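The precedence rule is simple: the config file supplies defaults, and any explicitly set CLI flag wins. A minimal sketch of that merge (the key names are illustrative assumptions, not the scanner's actual schema):

```python
# Sketch of "CLI arguments override config file" precedence.
# Key names are illustrative assumptions, not the scanner's actual schema.

def merge_options(config_defaults: dict, cli_args: dict) -> dict:
    """Start from config-file defaults; any CLI value that was
    explicitly set (i.e. not None) takes precedence."""
    merged = dict(config_defaults)
    merged.update({k: v for k, v in cli_args.items() if v is not None})
    return merged

merged = merge_options(
    {"triage": "gemini/flash", "verdict": "claude/opus"},  # from config.toml
    {"triage": "ollama/gpt-oss-20b", "verdict": None},     # from argparse
)
```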
For regular use, prefer local models for triage and reasoning and reserve frontier models for verdict or high-value packages. A full frontier run on large packages can be slow and expensive.
Based on testing against openSUSE packages with a mix of real and false-positive findings:
| Stage | Model | Why | VRAM |
|---|---|---|---|
| Triage | GPT-OSS 20B | Good signal-to-noise in current repo testing. | 13 GB |
| Reasoning | Gemma 4 31B | Good chain-analysis behavior in current repo testing. | 33 GB |
| Verdict | Claude / Gemini / Codex | Stronger privilege-boundary and exploitability review. | API |
```bash
# Recommended: local triage + reasoning, Claude verdict
python3 scan.py \
    --obs-package openSUSE:Factory/zypper \
    --triage ollama/gpt-oss-20b \
    --reasoning ollama/gemma4:31b \
    --verdict claude/opus
```
```bash
# Same but with llama.cpp servers instead of ollama
python3 scan.py \
    --obs-package openSUSE:Factory/zypper \
    --triage openai/gpt-oss-20b@http://localhost:8404 \
    --reasoning openai/gemma-4-31b@http://localhost:8405 \
    --verdict claude/opus
```

Every stage accepts a backend spec in `backend/model[@url]` format:
| Backend | Format | Auth | Notes |
|---|---|---|---|
| ollama | ollama/model-name | None | Default port 11434. Custom: ollama/model@http://host:port |
| openai | openai/model@http://host:port | Optional OPENAI_API_KEY | Works with llama.cpp, vLLM, any OpenAI-compatible server |
| nim | nim/vendor/model | NVIDIA_API_KEY | NVIDIA NIM cloud API. No GPU needed |
| claude | claude/opus, claude/sonnet, claude/haiku | CLI subscription | Uses claude CLI, no API key needed |
| gemini | gemini/flash, gemini/pro | CLI subscription | Uses gemini CLI, no API key needed |
| codex | codex/default | CLI subscription | Uses codex CLI, no API key needed |
```bash
# Package-wide auto dispatch across all known profiles
--profile auto

# Limit scanning to selected language families
--profile c_cpp,python,bash
```

```bash
# All local (two GPUs)
--triage ollama/gpt-oss-20b --reasoning ollama/gemma4:31b

# NVIDIA NIM — three-tier split, no GPU needed
--triage nim/nvidia/nemotron-3-nano-30b-a3b \
--reasoning nim/nvidia/llama-3.3-nemotron-super-49b-v1.5 \
--verdict nim/mistralai/mistral-large-3-675b-instruct-2512

# Mixed local + API
--triage ollama/gpt-oss-20b --reasoning gemini/flash --verdict claude/opus

# All frontier (burns tokens, but works without GPUs)
--triage gemini/flash --reasoning claude/sonnet --verdict claude/opus

# Big GPU setup with Kimi K2 via ollama
--triage ollama/gpt-oss-20b --reasoning ollama/kimi-k2 --verdict ollama/kimi-k2
```

`--profile auto` is the default and the recommended mode for whole-package scans.
It activates all known language profiles and dispatches each file to the matching
prompt family by extension.
Currently included profiles:
| Profile | Extensions |
|---|---|
| c_cpp | .c, .h, .cpp, .cc, .cxx, .hpp, .hxx |
| python | .py |
| bash | .sh, .bash |
| ruby | .rb, .rake, .gemspec |
| perl | .pl, .pm, .t |
| rust | .rs |
| node | .js, .mjs, .cjs, .ts |
For mixed packages in openSUSE, auto is usually the right choice.
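Dispatch by extension amounts to a lookup against the table above. A minimal sketch, using the extensions as listed; the dict layout and helper name are illustrative assumptions:

```python
# Sketch of per-file profile dispatch by extension, mirroring the table above.
# The dict layout and helper name are illustrative assumptions.
from pathlib import Path
from typing import Optional

PROFILE_EXTENSIONS = {
    "c_cpp": {".c", ".h", ".cpp", ".cc", ".cxx", ".hpp", ".hxx"},
    "python": {".py"},
    "bash": {".sh", ".bash"},
    "ruby": {".rb", ".rake", ".gemspec"},
    "perl": {".pl", ".pm", ".t"},
    "rust": {".rs"},
    "node": {".js", ".mjs", ".cjs", ".ts"},
}

def profile_for(path: str) -> Optional[str]:
    """Return the matching profile for a file, or None to skip it."""
    ext = Path(path).suffix.lower()
    for profile, extensions in PROFILE_EXTENSIONS.items():
        if ext in extensions:
            return profile
    return None
```

Files with no matching profile (documentation, build metadata, and so on) simply fall through and are not sent to any model.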
Paranoid pattern matcher. Told to "assume the worst" and flag anything that could be a vulnerability. Casts a wide net, typically flagging 20-40% of files. The false positive rate depends on the model — GPT-OSS 20B performed best in our testing, with zero false positives on known-clean files.
Independent chain analyst. Does NOT see triage results. Uses a different prompt focused on tracing data flow from untrusted source to dangerous sink. Must describe the complete vulnerability chain. Explicitly checks for common false positive patterns before reporting (abort-on-failure allocators, exit() error handlers, literal format strings, integer promotion safety, root-only contexts).
Final reviewer. Sees findings from BOTH previous stages, grouped by stage with a consensus note ("both stages flagged this" vs "only triage flagged this — examine carefully"). Checks privilege boundaries, D-Bus policy, attack surface. Filters out false positives. Only confirmed findings appear in the final report.
Every run creates a session directory under --scratch-dir
(default: /tmp/opensuse-security-scanner/) containing:
```
fillup-<uuid>/
    metadata.json               # backends, timestamps, package info
    progress.jsonl              # one line per scanned file (tail -f friendly)
    triage/
        SRC/parser.c.json       # raw output + parsed findings per file
        SRC/services.c.json
    reasoning/
        SRC/parser.c.json       # independent deep analysis
        SRC/services.c.json
    verdict/
        SRC/services.c.<hash>.json   # per-finding verdict with reasoning
```
All raw model outputs are preserved for debugging, comparison, and reproducibility.
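Because progress.jsonl is append-only with one JSON object per line, it can be read safely while a scan is still running. A minimal sketch of a tolerant reader (the record fields shown are assumptions about the layout):

```python
# Sketch of reading the append-only progress.jsonl while a scan runs.
# Record fields ("stage", "file") are assumptions about the layout.
import json

def read_progress(lines):
    """Parse JSONL lines, stopping at any trailing partial line."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            break  # the writer may be mid-line; keep what we have
    return records

records = read_progress([
    '{"stage": "triage", "file": "src/varexp.cpp"}',
    '{"stage": "reasoning", "file": "src/va',  # truncated in-flight write
])
```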
The scanner can also resume a persisted session:
```bash
python3 scan.py \
    --resume-session /tmp/opensuse-security-scanner/permissions-<uuid> \
    --triage openai/gpt-oss-20b@http://localhost:8404 \
    --reasoning openai/gemma-4-31b@http://localhost:8405
```

When resuming, already-written stage records are reused and only missing files are scanned.
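The resume check reduces to "does a per-file stage record already exist on disk?". A minimal sketch under that assumption, mirroring the session layout above (the helper itself is illustrative, not the scanner's actual code):

```python
# Sketch of resume behavior: a stage record on disk means that file
# is already done and can be skipped. Illustrative, not actual code.
from pathlib import Path

def files_to_scan(session_dir, stage, all_files):
    """Return only files whose per-file stage record does not exist yet."""
    stage_dir = Path(session_dir) / stage
    return [f for f in all_files if not (stage_dir / f"{f}.json").exists()]

# Usage example with a throwaway session directory:
import tempfile
with tempfile.TemporaryDirectory() as session:
    done = Path(session) / "triage" / "src"
    done.mkdir(parents=True)
    (done / "varexp.cpp.json").write_text("{}")   # simulate a finished record
    remaining = files_to_scan(session, "triage",
                              ["src/varexp.cpp", "src/other.cpp"])
```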
Real session layout from a local-model run on openSUSE:Factory/permissions:
```
/tmp/opensuse-security-scanner/permissions-a0974a27-85a1-43f0-83a5-415f8215dd65/
    metadata.json
    progress.jsonl
    triage/src/varexp.cpp.json
    reasoning/src/varexp.cpp.json
```
Example progress funnel from that run:
```
triage:    15 completed, 1 file flagged, 1 total finding
reasoning:  1 completed, 0 surviving findings
verdict:   disabled
```
The per-file JSON artifacts preserve both the raw model output and the parsed finding structure, so you can inspect why a file was flagged or cleared without re-running the scan.
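Inspecting one of those artifacts is plain JSON reading. A minimal sketch; the `raw_output` and `findings` field names here are assumptions about the artifact layout, not a documented schema:

```python
# Sketch of inspecting one per-file artifact without re-running the scan.
# The "raw_output" and "findings" field names are assumed, not documented.
import json

record = json.loads("""
{
  "raw_output": "Possible format-string issue in expand()",
  "findings": [{"function": "expand", "kind": "format-string"}]
}
""")

# The raw text shows *why* the model flagged the file; the parsed
# findings are what the next stage actually consumed.
summary = [(f["function"], f["kind"]) for f in record["findings"]]
```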
These notes are early directional observations from local testing, not a benchmark suite and not a publishable claim set. Treat them as operator guidance only.
Tested on openSUSE packages containing both subtle and obvious vulnerability patterns. Same prompt, same files across all models:
| Model | Subtle chain bugs | Obvious pattern bugs | FP noise |
|---|---|---|---|
| GPT-OSS 20B (3.6B active) | Found | Found | Low |
| Gemma 4 31B | Strong — traced full chain | Found + novel extras | Medium |
| GPT-OSS 120B (5.1B active) | Found | Found | Medium |
| Qwen3 32B | Partial — missed root cause | Found | Medium |
| Devstral Small 2 24B | Partial — wrong root cause | Found | Very high |
- Python 3.8+
- `pip install -r requirements.txt`
- `osc` (for `--obs-package` mode)
- At least one backend: ollama, a llama.cpp server, or a CLI (claude/gemini/codex)
Apache-2.0