anicka-net/opensuse-security-scanner
openSUSE Security Scanner

Multi-model vulnerability scanning pipeline for software packages. Supports C/C++, Python, Ruby, Bash, Rust, Perl, and Node.js via language-specific profiles. Three independent stages (triage, reasoning, verdict), each backed by any LLM you choose: local models via ollama/llama.cpp, or frontier models via the Claude, Gemini, or Codex CLIs.

This tool is designed for analyst assistance, not autonomous security sign-off. A "clean" report means "nothing survived this pipeline under this model mix," not "this package is vulnerability-free."

This scanner is directly inspired by AISLE's "AI Cybersecurity After Mythos: The Jagged Frontier" methodology: treat AI cybersecurity as a modular system rather than a single magic model; use cheap, broad scanning where possible; validate with a separate, deeper reasoning stage; and keep the scaffold and artifacts inspectable. The design here follows the same core lesson: the moat is the system, not the model.

How it works

Source code (OBS or local)
        |
        v
  +-----------+     Paranoid pattern matching.
  |  TRIAGE   |     Flags anything suspicious.
  |  (fast)   |     ~70% of files come back clean.
  +-----------+
        |  flagged files only
        v
  +-----------+     Independent chain analysis.
  | REASONING |     Traces data flow source -> sink.
  |  (deep)   |     Rules out common FP patterns.
  +-----------+
        |  findings from both stages
        v
  +-----------+     Sees what both stages found.
  |  VERDICT  |     Checks privilege boundaries,
  | (precise) |     attack surface, exploitability.
  +-----------+
        |
        v
  Markdown + JSON report

Triage and reasoning scan independently — neither sees the other's output. This means cross-stage consensus is genuine signal: if both flag the same function, it's likely real. The verdict stage sees everything and makes the final call.

Quick start

# Scan an OBS package with automatic per-file language dispatch
python3 scan.py --obs-package openSUSE:Factory/zypper

# Scan a local source tree
python3 scan.py --source-dir /path/to/extracted-source

# Triage only (single stage, fast)
python3 scan.py --source-dir ./src --triage-only

# Force only specific language families instead of auto-detecting by extension
python3 scan.py --source-dir ./src --profile c_cpp,python,bash

Configuration

For complex or repeated scans, you can set default options in a config.toml file. The scanner automatically loads config.toml from the current directory if it exists; to use a different file, pass python3 scan.py --config /path/to/your/config.toml.

An example is provided in config.toml.example.

Command-line arguments always override settings from the configuration file.
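As a sketch of what such a file might contain (the key and section names below are illustrative assumptions only; consult config.toml.example for the real schema):

```toml
# Hypothetical config.toml sketch. Key names are assumptions for
# illustration; see config.toml.example for the actual schema.
source_dir = "./src"
profile    = "auto"

[backends]
triage    = "ollama/gpt-oss-20b"
reasoning = "ollama/gemma4:31b"
verdict   = "claude/opus"
```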

For regular use, prefer local models for triage and reasoning and reserve frontier models for verdict or high-value packages. A full frontier run on large packages can be slow and expensive.

Recommended setup

Based on testing against openSUSE packages with a mix of real and false-positive findings:

Stage      Model                    Why                                                     VRAM
Triage     GPT-OSS 20B              Good signal-to-noise in current repo testing.           13 GB
Reasoning  Gemma 4 31B              Good chain-analysis behavior in current repo testing.   33 GB
Verdict    Claude / Gemini / Codex  Stronger privilege-boundary and exploitability review.  API

# Recommended: local triage + reasoning, Claude verdict
python3 scan.py \
  --obs-package openSUSE:Factory/zypper \
  --triage ollama/gpt-oss-20b \
  --reasoning ollama/gemma4:31b \
  --verdict claude/opus

# Same but with llama.cpp servers instead of ollama
python3 scan.py \
  --obs-package openSUSE:Factory/zypper \
  --triage openai/gpt-oss-20b@http://localhost:8404 \
  --reasoning openai/gemma-4-31b@http://localhost:8405 \
  --verdict claude/opus

Backend reference

Every stage accepts a backend spec in backend/model[@url] format:

Backend  Format                                    Auth                     Notes
ollama   ollama/model-name                         None                     Default port 11434. Custom: ollama/model@http://host:port
openai   openai/model@http://host:port             Optional OPENAI_API_KEY  Works with llama.cpp, vLLM, any OpenAI-compatible server
nim      nim/vendor/model                          NVIDIA_API_KEY           NVIDIA NIM cloud API. No GPU needed
claude   claude/opus, claude/sonnet, claude/haiku  CLI subscription         Uses the claude CLI, no API key needed
gemini   gemini/flash, gemini/pro                  CLI subscription         Uses the gemini CLI, no API key needed
codex    codex/default                             CLI subscription         Uses the codex CLI, no API key needed
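The backend/model[@url] format can be split mechanically: an optional @url suffix, then the first path segment as the backend, then the remainder as the model name. A minimal sketch of that parsing, not the scanner's actual parser:

```python
def parse_backend_spec(spec: str):
    """Split a backend/model[@url] spec into (backend, model, url).

    Illustrative sketch only; the scanner's real parser may differ.
    """
    # An optional @url suffix follows the model name.
    head, _, url = spec.partition("@")
    # Everything before the first "/" is the backend; the rest is the
    # model name, which may itself contain "/" (e.g. nim/vendor/model).
    backend, _, model = head.partition("/")
    return backend, model, url or None

print(parse_backend_spec("ollama/gpt-oss-20b"))
print(parse_backend_spec("openai/gemma-4-31b@http://localhost:8405"))
print(parse_backend_spec("nim/nvidia/llama-3.3-nemotron-super-49b-v1.5"))
```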

Examples

# Package-wide auto dispatch across all known profiles
--profile auto

# Limit scanning to selected language families
--profile c_cpp,python,bash

# All local (two GPUs)
--triage ollama/gpt-oss-20b --reasoning ollama/gemma4:31b

# NVIDIA NIM — three-tier split, no GPU needed
--triage nim/nvidia/nemotron-3-nano-30b-a3b \
--reasoning nim/nvidia/llama-3.3-nemotron-super-49b-v1.5 \
--verdict nim/mistralai/mistral-large-3-675b-instruct-2512

# Mixed local + API
--triage ollama/gpt-oss-20b --reasoning gemini/flash --verdict claude/opus

# All frontier (burns tokens, but works without GPUs)
--triage gemini/flash --reasoning claude/sonnet --verdict claude/opus

# Big GPU setup with Kimi K2 via ollama
--triage ollama/gpt-oss-20b --reasoning ollama/kimi-k2 --verdict ollama/kimi-k2

What each stage does

Profiles

--profile auto is the default and is the recommended mode for whole-package scans. It activates all known language profiles and dispatches each file to the matching prompt family by extension.

Currently included profiles:

Profile  Extensions
c_cpp    .c, .h, .cpp, .cc, .cxx, .hpp, .hxx
python   .py
bash     .sh, .bash
ruby     .rb, .rake, .gemspec
perl     .pl, .pm, .t
rust     .rs
node     .js, .mjs, .cjs, .ts

For mixed packages in openSUSE, auto is usually the right choice.
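In auto mode the dispatch amounts to an extension lookup. A minimal sketch (the extension pairs come from the table above; the function and data structure are illustrative, not the scanner's real code):

```python
from pathlib import Path

# Extension -> profile map, taken from the profiles table.
PROFILES = {
    "c_cpp":  {".c", ".h", ".cpp", ".cc", ".cxx", ".hpp", ".hxx"},
    "python": {".py"},
    "bash":   {".sh", ".bash"},
    "ruby":   {".rb", ".rake", ".gemspec"},
    "perl":   {".pl", ".pm", ".t"},
    "rust":   {".rs"},
    "node":   {".js", ".mjs", ".cjs", ".ts"},
}

def dispatch(path: str):
    """Return the profile whose extensions match this file, or None."""
    ext = Path(path).suffix
    for profile, exts in PROFILES.items():
        if ext in exts:
            return profile
    return None  # unmatched files are not scanned

print(dispatch("src/varexp.cpp"))  # c_cpp
```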

Triage

Paranoid pattern matcher. Told to "assume the worst" and flag anything that could be a vulnerability. Casts a wide net, typically flagging 20-40% of files. The false-positive rate depends on the model; GPT-OSS 20B was the best we tested (zero false positives on known-clean files).

Reasoning

Independent chain analyst. Does NOT see triage results. Uses a different prompt focused on tracing data flow from untrusted source to dangerous sink. Must describe the complete vulnerability chain. Explicitly checks for common false positive patterns before reporting (abort-on-failure allocators, exit() error handlers, literal format strings, integer promotion safety, root-only contexts).

Verdict

Final reviewer. Sees findings from BOTH previous stages, grouped by stage with a consensus note ("both stages flagged this" vs "only triage flagged this — examine carefully"). Checks privilege boundaries, D-Bus policy, attack surface. Filters out false positives. Only confirmed findings appear in the final report.
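The consensus grouping can be pictured as a set comparison over finding locations. A hedged sketch under the assumption that findings are keyed by (file, function); the note wording follows this README, not the scanner's real data model:

```python
def consensus_notes(triage, reasoning):
    """Attach a consensus note to each flagged location.

    `triage` and `reasoning` are sets of (file, function) locations
    flagged by the respective stages. Illustrative sketch only.
    """
    notes = {}
    for loc in triage | reasoning:
        if loc in triage and loc in reasoning:
            notes[loc] = "both stages flagged this"
        elif loc in triage:
            notes[loc] = "only triage flagged this - examine carefully"
        else:
            notes[loc] = "only reasoning flagged this - examine carefully"
    return notes
```

Because the two stages never see each other's output, a location landing in the "both stages" bucket carries genuine independent agreement.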

Session persistence

Every run creates a session directory under --scratch-dir (default: /tmp/opensuse-security-scanner/) containing:

fillup-<uuid>/
  metadata.json          # backends, timestamps, package info
  progress.jsonl         # one line per scanned file (tail -f friendly)
  triage/
    SRC/parser.c.json    # raw output + parsed findings per file
    SRC/services.c.json
  reasoning/
    SRC/parser.c.json    # independent deep analysis
    SRC/services.c.json
  verdict/
    SRC/services.c.<hash>.json   # per-finding verdict with reasoning

All raw model outputs are preserved for debugging, comparison, and reproducibility.
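Because progress.jsonl holds one JSON object per scanned file, it can be summarized with a few lines of stdlib Python. The field names below are assumptions for illustration; inspect your own progress.jsonl for the real schema:

```python
import json

# Hypothetical progress.jsonl records; real field names may differ.
sample = [
    '{"stage": "triage", "file": "src/varexp.cpp", "findings": 1}',
    '{"stage": "triage", "file": "src/utils.cpp", "findings": 0}',
]

def summarize(lines):
    """Count (files scanned, files flagged) per stage."""
    stats = {}
    for line in lines:
        rec = json.loads(line)
        done, flagged = stats.get(rec["stage"], (0, 0))
        stats[rec["stage"]] = (done + 1, flagged + (rec["findings"] > 0))
    return stats

print(summarize(sample))  # {'triage': (2, 1)}
```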

The scanner can also resume a persisted session:

python3 scan.py \
  --resume-session /tmp/opensuse-security-scanner/permissions-<uuid> \
  --triage openai/gpt-oss-20b@http://localhost:8404 \
  --reasoning openai/gemma-4-31b@http://localhost:8405

When resuming, already-written stage records are reused and only missing files are scanned.
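The resume check reduces to "does a stage record already exist for this file?". A sketch under that assumption; the real implementation may also validate the record's contents:

```python
from pathlib import Path

def needs_scan(session_dir: str, stage: str, rel_path: str) -> bool:
    """True if no stage record exists yet for this file.

    Illustrative sketch of the resume logic described above.
    """
    record = Path(session_dir) / stage / f"{rel_path}.json"
    return not record.exists()
```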

Example session

Real session layout from a local-model run on openSUSE:Factory/permissions:

/tmp/opensuse-security-scanner/permissions-a0974a27-85a1-43f0-83a5-415f8215dd65/
  metadata.json
  progress.jsonl
  triage/src/varexp.cpp.json
  reasoning/src/varexp.cpp.json

Example progress funnel from that run:

triage: 15 completed, 1 file flagged, 1 total finding
reasoning: 1 completed, 0 surviving findings
verdict: disabled

The per-file JSON artifacts preserve both the raw model output and the parsed finding structure, so you can inspect why a file was flagged or cleared without re-running the scan.

Model comparison results

These notes are early directional observations from local testing, not a benchmark suite and not a publishable claim set. Treat them as operator guidance only.

Tested on openSUSE packages containing both subtle and obvious vulnerability patterns. Same prompt, same files across all models:

Model                       Subtle chain bugs            Obvious pattern bugs  FP noise
GPT-OSS 20B (3.6B active)   Found                        Found                 Low
Gemma 4 31B                 Strong — traced full chain   Found + novel extras  Medium
GPT-OSS 120B (5.1B active)  Found                        Found                 Medium
Qwen3 32B                   Partial — missed root cause  Found                 Medium
Devstral Small 2 24B        Partial — wrong root cause   Found                 Very high

Requirements

  • Python 3.8+
  • pip install -r requirements.txt
  • osc (for --obs-package mode)
  • At least one backend: ollama, a llama.cpp server, or a CLI (claude/gemini/codex)

License

Apache-2.0
