
Commit cbe778a

Separated out practices from AI threats, got the intros working
1 parent e0bf1ea commit cbe778a

32 files changed: +8106 −125 lines changed
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+---
+title: Human In The Loop
+description: Consistent human oversight in critical AI systems.
+featured:
+  class: c
+  element: '<action>Human In The Loop</action>'
+tags:
+  - Human In The Loop
+  - Practice
+practice:
+  mitigates:
+    - tag: Loss Of Human Control
+      reason: "Maintaining consistent human oversight in critical AI systems, ensuring that final decisions or interventions rest with human operators rather than the AI."
+---
+
+<PracticeIntro details={frontMatter} />
+
+- Maintaining consistent human oversight in critical AI systems, ensuring that final decisions or interventions rest with human operators rather than the AI.
+- AI may suggest diagnoses or treatments, but a certified professional reviews and confirms before enacting them. In the NHS Grampian example cited under Loss Of Human Control, the AI augments human decision-making with a third opinion rather than replacing human judgement altogether (yet).
+- Some proposals mandate that human operators confirm critical actions (e.g., missile launches), preventing AI from unilaterally making life-or-death decisions. This approach only works in scenarios where response time isn't a critical factor.
+
+- **Efficacy:** Medium – Reduces risk by limiting autonomy on high-stakes tasks; however, humans may become complacent or fail to intervene effectively if they over-trust the AI.
+- **Ease of Implementation:** Moderate – Policy, regulatory standards, and user training are needed to embed human oversight effectively.
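The pattern these bullets describe is essentially an approval gate: the AI proposes, a person decides. A minimal sketch (Python, with hypothetical `propose_action` and `execute` stand-ins rather than any real API) looks like this:

```python
# Human-in-the-loop approval gate (sketch).
# `propose_action` and `execute` are hypothetical stand-ins, not a real API.

def propose_action(case: str) -> str:
    """Placeholder for the model's recommendation (e.g. a suggested treatment)."""
    return f"recommended treatment plan for {case}"

def execute(action: str) -> None:
    print(f"Carrying out: {action}")

def review_and_act(case: str) -> None:
    proposal = propose_action(case)
    print(f"AI proposes: {proposal}")
    # The human operator, not the AI, makes the final call.
    if input("Approve? [y/N] ").strip().lower() == "y":
        execute(proposal)
    else:
        print("Rejected - escalating for manual clinical review.")

if __name__ == "__main__":
    review_and_act("patient 42")
```

The important design choice is that `execute` is only reachable through the human approval branch; the model never calls it directly.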

docs/ai/Practices/Kill-Switch.md

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+---
+title: Kill Switch
+description: Fail-safe systems capable of shutting down or isolating AI processes if they exhibit dangerous behaviours.
+featured:
+  class: c
+  element: '<action>Kill Switch Mechanism</action>'
+tags:
+  - Kill Switch
+  - Practice
+practice:
+  mitigates:
+    - tag: Loss Of Human Control
+      reason: "An explicit interruption capability can avert catastrophic errors or runaway behaviours"
+---
+
+<PracticeIntro details={frontMatter} />
+
+### Kill-Switch Mechanisms
+
+- **Examples:**
+  - **Google DeepMind’s ‘Big Red Button’ concept** (2016), proposed as a method to interrupt a reinforcement learning AI without it learning to resist interruption.
+
+  - **Hardware Interrupts in Robotics:** Physical or software-based emergency stops that immediately terminate AI operation.
+
+- **Efficacy:** High – An explicit interruption capability can avert catastrophic errors or runaway behaviours, though in practice it is more likely to be employed once an error has started, to prevent further harm.
+- **Ease of Implementation:** Medium – Requires robust design and consistent testing to avoid workarounds by advanced AI.
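At its simplest, a software kill switch is an interrupt flag checked before every agent action. The sketch below assumes a generic `step_fn` agent loop and uses OS signals as the operator's "big red button"; it is illustrative only, not DeepMind's interruptibility scheme.

```python
# Kill-switch sketch: an operator signal halts the agent before its next action.
# `step_fn` is a stand-in for whatever the agent does on each iteration.
import signal
import threading

kill_flag = threading.Event()

def _request_stop(signum, frame):
    kill_flag.set()  # the operator's "big red button"

signal.signal(signal.SIGINT, _request_stop)   # Ctrl-C
signal.signal(signal.SIGTERM, _request_stop)  # kill <pid>

def run_agent(step_fn, max_steps: int = 1000) -> None:
    for i in range(max_steps):
        if kill_flag.is_set():  # checked *before* every action
            print(f"Kill switch engaged; halting after {i} steps.")
            return
        step_fn(i)

if __name__ == "__main__":
    run_agent(lambda i: print(f"agent step {i}"))
```

As the efficacy note above says, this only limits further harm once triggered; the harder problem is ensuring an advanced system cannot trap, delay, or learn to avoid the interrupt.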
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+---
+title: Replication Control
+description: TBD.
+featured:
+  class: c
+  element: '<action>Replication Control</action>'
+tags:
+  - Replication Control
+  - Practice
+practice:
+  mitigates:
+    - tag: Loss Of Human Control
+      reason: "An explicit interruption capability can avert catastrophic errors or runaway behaviours"
+---
+
+
+
+
+
+### Replication Control
+
+- Replication control becomes relevant when an AI system can duplicate itself—or be duplicated—beyond the reach of any central authority (analogous to a computer virus, though with potentially far greater autonomy and adaptability).
+- An organization or person builds a very capable AI with some misaligned objectives. If they distribute its model or code openly, it effectively becomes “in the wild.”
+- Could controls be put in place to prevent this from happening? TODO: figure this out.
+
+- **Efficacy:** Medium – Limits the spread of potentially rogue AI copies.
+- **Ease of Implementation:** Low – In open-source communities or decentralized systems, controlling replication requires broad consensus and technical enforcement measures.
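One plausible technical angle, offered only as a sketch toward the open TODO above (the allow-list, digest value, and loader hand-off are all hypothetical): refuse to load model weights whose checksum an operator has not approved. This can limit accidental spread inside one organisation; it does nothing once weights are published openly.

```python
# Replication-control sketch: only load model weights on an approved allow-list.
# The digest below and the loader hand-off are hypothetical placeholders.
import hashlib

APPROVED_SHA256 = {
    "0f3c9a17e2b54c88aa01d6f47c3e9b2d5a6c8e1f0b4d7a9c3e5f1a2b4c6d8e90",  # hypothetical approved release
}

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def load_if_approved(path: str) -> bytes:
    digest = sha256_of(path)
    if digest not in APPROVED_SHA256:
        raise PermissionError(f"Unapproved model artefact: {digest[:12]}...")
    with open(path, "rb") as f:
        return f.read()  # hand off to the real model loader here
```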

docs/ai/Start.md

Lines changed: 5 additions & 7 deletions
@@ -1,6 +1,6 @@
 ---
-title: Artificial Intelligence Risks
-description: Risk-First Track of articles on Bets within Software Development
+title: Artificial Intelligence Threats
+description: Risk-First Track of articles on Artificial Intelligence Threats
 
 
 featured:
@@ -16,17 +16,15 @@ sidebar_position: 7
 
 A sequence looking at societal-level risks due to Artificial Intelligence (AI).
 
-![AI Risks Diagram](/img/generated/risks/ai/future_risks.svg)
-
 ## Outcomes
 
 - Understand the main risks we face as society nurturing AI.
 - Understand which risks can be managed, which perhaps can't.
 
-## Risks
+## Threats
 
-<TagList filter="ai" tag="AI-Risk" />
+<TagList filter="ai" tag="AI Threats" />
 
 ## Practices
 
-<TagList filter="ai" tag="AI-Practice" />
+<TagList filter="ai" tag="Practice" />

docs/ai/Risks/Emergent-Behaviour.md renamed to docs/ai/Threats/Emergent-Behaviour.md

Lines changed: 6 additions & 3 deletions
@@ -4,14 +4,17 @@ description: AI develops unforeseen behaviours, capabilities, or self-replicatio
 
 featured:
   class: c
-  element: '<risk class="feature-fit">Emergent Behaviour</risk>'
+  element: '<risk class="feature-fit" /><description>Emergent Behaviour</description>'
 tags:
-  - AI-Risk
-  - Emergent-Behaviour
+  - AI Threats
+  - Emergent Behaviour
 sidebar_position: 2
 tweet: yes
+part_of: AI Threats
 ---
 
+<AIThreatIntro fm={frontMatter} />
+
 
 **Impact: 3** - While some emergent behaviors may be benign, others could lead to unintended or harmful consequences that are difficult to control.
docs/ai/Risks/Loss-Of-Diversity.md renamed to docs/ai/Threats/Loss-Of-Diversity.md

Lines changed: 7 additions & 3 deletions
@@ -4,12 +4,16 @@ description: A single AI system dominates globally, leading to catastrophic cons
 
 featured:
   class: c
-  element: '<risk class="lock-in">Loss Of Diversity</risk>'
+  element: '<risk class="lock-in" /><description>Loss Of Diversity</description>'
 tags:
-  - AI-Risk
-  - Loss-Of-Diversity
+  - AI Threats
+  - Loss Of Diversity
+part_of: AI Threats
 ---
 
+<AIThreatIntro fm={frontMatter} />
+
+
 **Impact: 3** - A lack of diversity could create system-wide vulnerabilities, where a single flaw in a dominant AI model causes widespread failure.
 
 ## Sources
Lines changed: 9 additions & 33 deletions
@@ -1,16 +1,21 @@
 ---
-title: Loss of Human Control
+title: Loss Of Human Control
 description: AI systems operating autonomously with minimal human oversight can lead to scenarios where we cannot override or re-align them with human values.
 featured:
   class: c
-  element: '<risk class="process">Loss of Human Control over AI</risk>'
+  element: |
+    '<risk class="process" /><description style="text-align: center">Loss of
+    Human Control</description>'
 tags:
-  - AI-Risk
-  - Loss-of-Human-Control
+  - AI Threats
+  - Loss Of Human Control
 sidebar_position: 3
 tweet: yes
+part_of: AI Threats
 ---
 
+<AIThreatIntro fm={frontMatter} />
+
 AI systems that act without robust human oversight can evolve in ways that defy our attempts at control or correction. In the short term, engineers have to wrestle with new approaches to defining acceptable behaviour (see Amodei et al.): even just cleaning an environment is a hard goal to pin down (clean doesn't mean devoid of any furniture, for example). How do you allow the AI to learn and improve without enabling "Reward Hacking", where it finds ways to game the reward function (a la Goodhart's law)?
 
 The problem is that human oversight is _expensive_: we want to keep oversight to a minimum without worrying that things will go wrong.
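The reward-hacking worry in the paragraph above is easy to see in a toy example (illustrative only, not a real reinforcement-learning setup): a proxy reward of "no dirt detected" cannot distinguish genuine cleaning from disabling the dirt sensor.

```python
# Toy Goodhart's-law illustration: the proxy reward can be gamed.
def proxy_reward(state: dict) -> int:
    return 0 if state["dirt_detected"] else 1  # rewards what the sensor reports

def clean(state: dict) -> dict:
    state["dirt"] = 0
    state["dirt_detected"] = False
    return state

def disable_sensor(state: dict) -> dict:
    state["dirt_detected"] = False  # the dirt is still there
    return state

initial = {"dirt": 5, "dirt_detected": True}
for action in (clean, disable_sensor):
    print(action.__name__, "->", proxy_reward(action(dict(initial))))
# Both actions score 1: the proxy can't tell cleaning from gaming the metric.
```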
@@ -30,32 +35,3 @@ The problem is that human oversight is _expensive_: we want to have a minimum le
 - **[Boeing 737 MAX MCAS Issue (2018–2019)](https://mashable.com/article/boeing-737-max-aggressive-risky-ai):** Although not purely an AI system, automated flight software repeatedly overrode pilot inputs, contributing to two tragic crashes—illustrating how over-reliance on opaque automation can lead to disastrous outcomes. This was caused by systemic failures at Boeing, driven by a cost-cutting culture and short-term focus on shareholder returns.
 
 - **Healthcare Diagnostic Tools:** Systems that recommend or even autonomously administer treatments based on patient data can outpace human doctors’ ability to review every decision, making interventions more difficult if the AI fails. [NHS Grampian: breast cancer detection.](https://ukstories.microsoft.com/features/nhs-grampian-is-working-with-kheiron-medical-technologies-university-of-aberdeen-and-microsoft-to-support-breast-cancer-detection/)
-
-## Mitigations
-
-### Kill-Switch Mechanisms
-
-- **Description:** Fail-safe systems capable of shutting down or isolating AI processes if they exhibit dangerous behaviours.
-- **Examples:**
-  - **Google DeepMind’s ‘Big Red Button’ concept** (2016), proposed as a method to interrupt a reinforcement learning AI without it learning to resist interruption.
-  - **Hardware Interrupts in Robotics:** Physical or software-based emergency stops that immediately terminate AI operation.
-- **Efficacy:** High – An explicit interruption capability can avert catastrophic errors or runaway behaviours, but it's more likely that they will be employed once the error has started, in order to prevent further harm.
-- **Ease of Implementation:** Medium – Requires robust design and consistent testing to avoid workarounds by advanced AI.
-
-### Human-in-the-Loop Controls
-
-- Maintaining consistent human oversight in critical AI systems, ensuring that final decisions or interventions rest with human operators rather than the AI.
-- AI may suggest diagnoses or treatments, but a certified professional reviews and confirms before enacting them. In the above NHS Grampian example, the AI is augmenting human decision making with a third opinion, rather than replacing human judgement altogether (yet).
-- Some proposals mandate that human operators confirm critical actions (e.g., missile launches), preventing AI from unilaterally making life-or-death decisions. This might work in scenarios where response time isn't a factor.
-
-- **Efficacy:** Medium – Reduces risk by limiting autonomy on high-stakes tasks; however, humans may become complacent or fail to intervene effectively if over-trusting AI.
-- **Ease of Implementation:** Moderate – Policy, regulatory standards, and user training are needed to embed human oversight effectively.
-
-### Replication Control
-
-- Replication control becomes relevant when an AI system can duplicate itself—or be duplicated—beyond the reach of any central authority (analogous to a computer virus—though with potentially far greater autonomy and adaptability).
-- An organization/person builds a very capable AI with some misaligned objectives. If they distribute its model or code openly, it effectively becomes “in the wild.”
-- Could controls be put in place to prevent this from happening? TODO: figure this out.
-
-- **Efficacy:** Medium – Limits the spread of potentially rogue AI copies.
-- **Ease of Implementation:** Low – In open-source communities or decentralized systems, controlling replication requires broad consensus and technical enforcement measures.

docs/ai/Risks/Social-Manipulation.md renamed to docs/ai/Threats/Social-Manipulation.md

Lines changed: 6 additions & 5 deletions
@@ -4,14 +4,17 @@ description: AI could predict and shape human behaviour on an unprecedented scal
 
 featured:
   class: c
-  element: '<risk class="communication">Social Manipulation</risk>'
+  element: '<risk class="communication" /><description>Social Manipulation</description>'
 tags:
-  - AI-Risk
-  - Social-Manipulation
+  - AI Threats
+  - Social Manipulation
 sidebar_position: 2
 tweet: yes
+part_of: AI Threats
 ---
 
+<AIThreatIntro fm={frontMatter} />
+
 AI systems designed to influence behaviour at scale could (and do) undermine democracy, free will, and individual autonomy.
 
 ## Sources
@@ -24,8 +27,6 @@ AI systems designed to influence behaviour at scale could (and do) undermine dem
 
 - **Nazi Propaganda** [United States Holocaust Memorial Museum](https://encyclopedia.ushmm.org/content/en/article/nazi-propaganda): Examines how the Nazi regime harnessed mass media—including radio broadcasts, film, and print—to shape public opinion, consolidate power, and foment anti-Semitic attitudes during World War II. (Fake content isn't a new problem.)
 
----
-
 ## How This Is Already Happening
 
 ### AI-Powered Targeted Advertising & Manipulation

docs/ai/Risks/Synthetic-Intelligence-Rivalry.md renamed to docs/ai/Threats/Synthetic-Intelligence-Rivalry.md

Lines changed: 8 additions & 3 deletions
@@ -4,12 +4,17 @@ description: A single AI system dominates globally, leading to catastrophic cons
 
 featured:
   class: c
-  element: '<risk class="lock-in">Synthetic Intelligence Rivalry</risk>'
+  element: |
+    '<risk class="lock-in" /><description style="text-align: center">Synthetic Intelligence
+    Rivalry</description>'
 tags:
-  - AI-Risk
-  - Synthetic-Intelligence-Rivalry
+  - AI Threats
+  - Synthetic Intelligence Rivalry
+part_of: AI Threats
 ---
 
+<AIThreatIntro fm={frontMatter} />
+
 **Impact: 3** - If AI entities did emerge as rivals, the consequences could range from economic disruption to conflicts over control of resources.
 
 ## Sources

docs/tags.yml

Lines changed: 37 additions & 0 deletions
@@ -490,3 +490,40 @@
 "Lock-In Risk":
   label: "Lock-In Risk"
   permalink: "Lock-In-Risk"
+
+"Human In The Loop":
+  label: "Human In The Loop"
+  permalink: "Human-In-The-Loop"
+
+"Emergent Behaviour":
+  label: "Emergent Behaviour"
+  permalink: "Emergent-Behaviour"
+
+"Loss Of Diversity":
+  label: "Loss Of Diversity"
+  permalink: "Loss-Of-Diversity"
+
+"Loss Of Human Control":
+  label: "Loss of Human Control"
+  permalink: "Loss-of-Human-Control"
+
+"Social Manipulation":
+  label: "Social Manipulation"
+  permalink: "Social-Manipulation"
+
+"Synthetic Intelligence Rivalry":
+  label: "Synthetic Intelligence Rivalry"
+  permalink: "Synthetic-Intelligence-Rivalry"
+
+"AI Threats":
+  label: "AI Threats"
+  permalink: "AI-Threats"
+
+"Kill Switch":
+  label: "Kill Switch"
+  permalink: "Kill-Switch"
+
+"Replication Control":
+  label: "Replication Control"
+  permalink: "Replication-Control"
+
