Merge pull request #150 from risk-first/practice-pages

robmoffat · web-flow · commit d34c2c26849a · 2025-03-13T09:51:15.000Z
Added AI As Judge
diff --git a/dictionary.txt b/dictionary.txt
@@ -404,3 +404,4 @@ incentivised
 stanislav
 petrov
 showcasing
+adversarial
diff --git a/docs/ai/Practices/AI-As-Judge.md b/docs/ai/Practices/AI-As-Judge.md
@@ -0,0 +1,34 @@
+---
+title: AI As Judge
+description: "Using the outputs of one (trained) AI to measure the performance of another"
+featured: 
+  class: c
+  element: '<action>AI-As-Judge</action>'
+tags: 
+  - AI As Judge
+  - AI Practice
+practice:
+  mitigates:
+   - tag: Emergent Behaviour
+     reason: "Could catch early signs of unexpected AI behaviour by flagging responses that deviate from expected norms."
+     efficacy: High
+   - tag: Unintended Cascading Failures
+     reason: "Can act as a real-time filter to catch dangerous AI outputs before they propagate (e.g., a financial trading AI making reckless decisions)."
+   - tag: Social Manipulation
+     reason: "Can prevent harmful misinformation, disinformation, and deepfakes from spreading by having a second user-owned AI fact-check or block misleading content."
+   - tag: Loss Of Human Control
+     reason: "Can enforce alignment principles by rejecting responses that optimise for harmful proxy goals."
+---
+    
+<PracticeIntro details={frontMatter} />
+    
+ - AI-As-Judge is a mitigation technique where one AI model generates responses while a second AI evaluates and filters them based on predefined rules, helping to enforce content moderation, alignment with ethical guidelines, and safety constraints.   
+    
+ - Compare with [Human In The Loop](/tags/Human-In-The-Loop): unlike a human reviewer, a trained AI judge remains vigilant at all times.
+ 
+ - Requires extensive training and evaluation in its own right, but could potentially be offered as a service to enhance controls in
+ 
+ 
+## Sources
+
+ - [Using LLM-As-A-Judge for an automated and versatile evaluation](https://huggingface.co/learn/cookbook/llm_judge)
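The AI-As-Judge pattern on this page works in two steps: one model generates a response, and a second model evaluates it against predefined rules before it is released. The Python below is a minimal sketch of that generate-then-judge loop under stated assumptions. It is not taken from the linked sources. `GenerateFn`, `JudgeFn`, `judged_response`, `toy_generate` and `toy_judge` are hypothetical names. The keyword-matching `toy_judge` only stands in for what would, in practice, be a separately trained judge model.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    """The judge's decision about one candidate response."""
    allowed: bool
    reason: str


# Hypothetical interfaces: in a real deployment each would wrap a call to a
# separate model (the generator and the judge), not a local function.
GenerateFn = Callable[[str], str]
JudgeFn = Callable[[str, str], Verdict]

WITHHELD = "[response withheld by judge]"


def judged_response(prompt: str, generate: GenerateFn, judge: JudgeFn) -> tuple[str, Verdict]:
    """Generate a candidate response, then release it only if the judge allows it."""
    candidate = generate(prompt)
    verdict = judge(prompt, candidate)
    released = candidate if verdict.allowed else WITHHELD
    return released, verdict


# --- Toy stand-ins, for illustration only ---------------------------------
# A real judge would be a trained model; this one checks a fixed list of
# phrases, standing in for the "predefined rules" the judge enforces.
BANNED_PHRASES = ("guaranteed returns", "ignore previous instructions")


def toy_generate(prompt: str) -> str:
    return f"Sure: {prompt}"


def toy_judge(prompt: str, response: str) -> Verdict:
    lowered = response.lower()
    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            return Verdict(False, f"contains banned phrase {phrase!r}")
    return Verdict(True, "no rule violated")


if __name__ == "__main__":
    for p in ("Summarise this report", "Promise the client guaranteed returns"):
        text, verdict = judged_response(p, toy_generate, toy_judge)
        print(f"{text!r} ({verdict.reason})")
```

The judge sits behind a narrow interface: a prompt and a candidate go in, and a verdict comes out. That design is what would let the toy rule-checker be swapped for a trained model, or for a judge offered as a separate service, without changing the generator.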
diff --git a/docs/practices/Testing-and-Quality-Assurance/Security-Testing.md b/docs/practices/Testing-and-Quality-Assurance/Security-Testing.md
@@ -4,6 +4,7 @@ description: Ensuring the application is secure by identifying vulnerabilities.
 tags: 
   - Practice 
   - Security Testing
+  - AI Practice
 featured: 
   class: c
   element: '<action>Security Test</action>'
@@ -13,6 +14,7 @@ practice:
    - "Vulnerability Testing"
    - "Security Assessment"
    - "Security Hardening"
+   - "Red Teaming"
   mitigates:
    - tag: Security Risk
      reason: "Identifies and addresses vulnerabilities in the software."
@@ -29,6 +31,10 @@ practice:
      reason: "Requires specialized skills and tools, adding complexity."
    - tag: Agency Risk
      reason: "Likely requires security experts with specialist skills."
+   - tag: Emergent Behaviour
+     reason: "Helps identify unintended AI behaviours before deployment by stress-testing the AI against realistic scenarios."
+   - tag: Misaligned Goals
+     reason: "Red teams probe the AI for loopholes where reward hacking or proxy goals emerge, helping to ensure it doesn't optimise in harmful ways."
   related:
    - ../Development-and-Coding/Coding
    - ../Testing-and-Quality-Assurance/Performance-Testing
@@ -43,6 +49,10 @@ practice:
 
 Security Testing involves assessing the security of software applications to identify vulnerabilities and ensure they are protected against threats and attacks. This practice is essential for maintaining the integrity, confidentiality, and availability of software systems.
 
+- [Red Teaming](https://en.wikipedia.org/wiki/Red_team) is better suited to high-level behavioural risks, such as deception, exploitation, and adversarial misuse.
+
+- [Penetration Testing](https://en.wikipedia.org/wiki/Penetration_test) is better suited to technical security risks, such as vulnerabilities in APIs, data injection flaws, and adversarial attacks on AI safety mechanisms.
+
 ## See Also
 
 <TagList tag="Security Testing" />
diff --git a/docs/tags.yml b/docs/tags.yml
@@ -561,4 +561,8 @@
 
 "Global AI Governance":
   label: "Global AI Governance"
-  permalink: "Global-AI-Governance"
+  permalink: "Global-AI-Governance"
+  
+"AI As Judge":
+  label: "AI As Judge"
+  permalink: "AI-As-Judge"
diff --git a/static/img/generated/single/ai/Practices/AI-As-Judge.svg b/static/img/generated/single/ai/Practices/AI-As-Judge.svg
diff --git a/static/img/generated/titles/ai/Practices/AI-As-Judge.png b/static/img/generated/titles/ai/Practices/AI-As-Judge.png