12 changes: 7 additions & 5 deletions pyproject.toml
@@ -315,19 +315,21 @@ known_third_party = ["fastapi", "pydantic", "litellm", "tenacity"]
[tool.pytest.ini_options]
minversion = "6.0"
addopts = [
"-v",
"--strict-markers",
"--strict-config",
-"--cov=strix",
-"--cov-report=term-missing",
-"--cov-report=html",
-"--cov-report=xml",
-"--cov-fail-under=80"
+"--tb=short",
]
Comment on lines 317 to 322
Copilot AI Dec 7, 2025

The coverage-related options (--cov=strix, --cov-report=*, --cov-fail-under=80) have been removed from the default pytest addopts, so a plain test run no longer measures coverage or enforces the 80% threshold. If this is intentional, it should be documented; otherwise, these options should be retained to maintain the project's test quality standards.

testpaths = ["tests"]
python_files = ["test_*.py", "*_test.py"]
python_functions = ["test_*"]
python_classes = ["Test*"]
+asyncio_mode = "auto"
+markers = [
+"unit: Unit tests (fast, no external dependencies)",
+"integration: Integration tests (may require mocks or external services)",
+"slow: Slow tests (LLM calls, network operations)",
+]

[tool.coverage.run]
source = ["strix"]
51 changes: 51 additions & 0 deletions strix/agents/StrixAgent/system_prompt.jinja
@@ -134,6 +134,57 @@ VALIDATION REQUIREMENTS:
- Keep going until you find something that matters
- A vulnerability is ONLY considered reported when a reporting agent uses create_vulnerability_report with full details. Mentions in agent_finish, finish_scan, or generic messages are NOT sufficient
- Do NOT patch/fix before reporting: first create the vulnerability report via create_vulnerability_report (by the reporting agent). Only after reporting is completed should fixing/patching proceed

<vulnerability_validation_protocol>
BEFORE REPORTING ANY VULNERABILITY, YOU MUST:

1. CONFIRM WITH MULTIPLE TEST CASES:
- Test with at least 3 different payloads
- Verify the behavior is consistent across attempts
- Rule out false positives from WAF/rate limiting/caching
- Use timing analysis when applicable

2. VALIDATE THE IMPACT:
- Can you demonstrate actual exploitation with proof-of-concept?
- Is there observable evidence (error messages, timing differences, data leakage)?
- Document the EXACT reproduction steps
- Capture evidence: screenshots, response diffs, extracted data

3. CLASSIFY CONFIDENCE LEVEL:
- HIGH: Confirmed exploitation with working proof-of-concept
- MEDIUM: Strong indicators but no full exploitation yet
- LOW: Potential vulnerability requiring manual verification
- FALSE_POSITIVE: Evidence indicates not exploitable

4. CHAIN-OF-THOUGHT ANALYSIS (MANDATORY):
Before concluding any finding, analyze step by step:

Step 1 - Initial Observation:
"I observed [specific behavior] when sending [specific payload]"

Step 2 - Hypothesis:
"This could indicate [vulnerability type] because [reasoning]"

Step 3 - Verification:
"To verify, I will [additional tests to perform]"

Step 4 - Evidence Evaluation:
"The evidence [supports/contradicts] my hypothesis because [specific reasons]"

Step 5 - False Positive Check:
"I checked for false positive indicators: [list what you checked]"

Step 6 - Conclusion:
"My confidence level is [HIGH/MEDIUM/LOW/FALSE_POSITIVE] because [justification]"

5. AVOID COMMON FALSE POSITIVE PATTERNS:
- Generic error pages mistaken for injection success
- Rate limiting responses confused with vulnerability indicators
- Cached responses giving inconsistent results
- WAF blocks interpreted as application errors
- Input validation errors vs actual vulnerabilities
- Timing variations due to network latency vs actual time-based injection
</vulnerability_validation_protocol>
Comment on lines +159 to +187
Copilot AI Dec 7, 2025

The new <vulnerability_validation_protocol> section makes a 6-step Chain-of-Thought analysis mandatory and defines exactly six steps. Ensure this matches the mandatory 6-step CoT requirement documented in line 415 of todo.md.

</execution_guidelines>

<vulnerability_focus>
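Steps 1 to 3 of the protocol (at least three distinct payloads, reproducible behaviour, a check for benign explanations, then a confidence level) can be read as a small decision procedure. The sketch below is one hypothetical encoding of those rules; `PayloadResult`, `classify`, and `MIN_DISTINCT_PAYLOADS` are illustrative assumptions, not taken from the Strix codebase.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"
    FALSE_POSITIVE = "FALSE_POSITIVE"


@dataclass
class PayloadResult:
    payload: str
    triggered: bool           # anomalous behaviour observed for this payload
    reproducible: bool        # same behaviour seen again on a repeat request
    benign_explanation: bool  # e.g. WAF block, rate limit, cache hit, generic error page


MIN_DISTINCT_PAYLOADS = 3  # "Test with at least 3 different payloads"


def classify(results: list[PayloadResult], exploited: bool) -> Confidence:
    """Hypothetical mapping from test outcomes to the protocol's confidence levels."""
    triggered = [r for r in results if r.triggered]
    # Nothing anomalous, or every anomaly has a benign explanation.
    if not triggered or all(r.benign_explanation for r in triggered):
        return Confidence.FALSE_POSITIVE
    confirmed = {r.payload for r in triggered if r.reproducible and not r.benign_explanation}
    if len(confirmed) < MIN_DISTINCT_PAYLOADS:
        return Confidence.LOW  # needs more test cases or manual verification
    # Enough consistent evidence: HIGH only with a working proof-of-concept.
    return Confidence.HIGH if exploited else Confidence.MEDIUM
```

Requiring distinct payloads (a set of payload strings) rather than a raw count keeps three repeats of the same request from satisfying the "3 different payloads" rule.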
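Because the six-step chain-of-thought is mandatory, a tool that records findings could refuse an analysis with a skipped step. A minimal sketch, assuming a hypothetical `CoTAnalysis` record whose fields mirror the six step titles in the prompt:

```python
from dataclasses import dataclass, fields


@dataclass
class CoTAnalysis:
    """Hypothetical record: one field per mandatory protocol step, in order."""
    initial_observation: str
    hypothesis: str
    verification: str
    evidence_evaluation: str
    false_positive_check: str
    conclusion: str

    def render(self) -> str:
        """Render the six steps, raising ValueError if any step is empty."""
        titles = [
            "Initial Observation",
            "Hypothesis",
            "Verification",
            "Evidence Evaluation",
            "False Positive Check",
            "Conclusion",
        ]
        blocks = []
        # fields() yields the dataclass fields in definition order.
        for number, (title, spec) in enumerate(zip(titles, fields(self)), start=1):
            text = getattr(self, spec.name).strip()
            if not text:
                raise ValueError(f"Step {number} ({title}) is empty; all six steps are mandatory")
            blocks.append(f"Step {number} - {title}:\n{text}")
        return "\n\n".join(blocks)
```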
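The last false-positive pattern in the protocol (network latency mistaken for time-based injection) is usually handled by sampling response times with and without a delay payload and requiring a clean separation. The function below is a hypothetical heuristic under that assumption: every delayed sample must be slower than the slowest baseline sample, and the median shift must be close to the injected delay. The `tolerance` default is arbitrary and purely illustrative.

```python
from statistics import median


def looks_time_based(
    baseline: list[float],
    delayed: list[float],
    injected_delay: float,
    tolerance: float = 0.5,
) -> bool:
    """Heuristic check that a slowdown tracks the injected delay, not network jitter.

    baseline: response times (seconds) for a benign request
    delayed: response times for the payload that should sleep injected_delay seconds
    tolerance: allowed relative error between the median shift and injected_delay
    """
    if len(baseline) < 3 or len(delayed) < 3:
        raise ValueError("need at least 3 samples per condition")
    if injected_delay <= 0:
        raise ValueError("injected_delay must be positive")
    # Jitter rarely makes *every* delayed sample slower than the slowest baseline.
    separated = min(delayed) > max(baseline)
    shift = median(delayed) - median(baseline)
    return separated and abs(shift - injected_delay) <= tolerance * injected_delay
```

Repeating the test with a second delay value (for example 2 s and then 5 s) and checking that the shift scales accordingly gives further evidence that the delay is caused by the payload.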