diff --git a/1_1_vulns/InsecureOutputHandling.md b/1_1_vulns/InsecureOutputHandling.md
index fac1a6c9..f5428dc6 100644
--- a/1_1_vulns/InsecureOutputHandling.md
+++ b/1_1_vulns/InsecureOutputHandling.md
@@ -1,5 +1,8 @@
 ## LLM02: Insecure Output Handling
+
+### Description
+
 Insecure Output Handling refers specifically to insufficient validation, sanitization, and handling of the outputs generated by large language models before they are passed downstream to other components and systems. Since LLM-generated content can be controlled by prompt input, this behavior is similar to providing users indirect access to additional functionality.
 
 Insecure Output Handling differs from Overreliance in that it deals with LLM-generated outputs before they are passed downstream whereas Overreliance focuses on broader concerns around overdependence on the accuracy and appropriateness of LLM outputs.
 
@@ -9,7 +12,7 @@ Successful exploitation of an Insecure Output Handling vulnerability can result
 The following conditions can increase the impact of this vulnerability:
 * The application grants the LLM privileges beyond what is intended for end users, enabling escalation of privileges or remote code execution.
 * The application is vulnerable to indirect prompt injection attacks, which could allow an attacker to gain privileged access to a target user's environment.
-* 3rtd party plugins do not adequately validate inputs.
+* 3rd party plugins do not adequately validate inputs.
 
 ### Common Examples of Vulnerability
 
diff --git a/1_1_vulns/TrainingDataPoisoning.md b/1_1_vulns/TrainingDataPoisoning.md
index a6b69cbb..4dc0711c 100644
--- a/1_1_vulns/TrainingDataPoisoning.md
+++ b/1_1_vulns/TrainingDataPoisoning.md
@@ -37,7 +37,7 @@ Data poisoning is considered an integrity attack because tampering with the trai
 7. Testing and Detection, by measuring the loss during the training stage and analyzing trained models to detect signs of a poisoning attack by analyzing model behavior on specific test inputs.
    1. Monitoring and alerting on number of skewed responses exceeding a threshold.
    2. Use of a human loop to review responses and auditing.
-   3. Implement dedicated LLM's to benchmark against undesired consequences and train other LLM's using [reinforcement learning techniques](https://wandb.ai/ayush-thakur/Intro-RLAIF/reports/An-Introduction-to-Training-LLMs-Using-Reinforcement-Learning-From-Human-Feedback-RLHF---VmlldzozMzYyNjcy).
+   3. Implement dedicated LLMs to benchmark against undesired consequences and train other LLMs using [reinforcement learning techniques](https://wandb.ai/ayush-thakur/Intro-RLAIF/reports/An-Introduction-to-Training-LLMs-Using-Reinforcement-Learning-From-Human-Feedback-RLHF---VmlldzozMzYyNjcy).
    4. Perform LLM-based [red team exercises](https://www.anthropic.com/index/red-teaming-language-models-to-reduce-harms-methods-scaling-behaviors-and-lessons-learned) or [LLM vulnerability scanning](https://github.com/leondz/garak) into the testing phases of the LLM's lifecycle.
 
 ### Example Attack Scenarios
@@ -59,4 +59,4 @@ Data poisoning is considered an integrity attack because tampering with the trai
 8. [FedMLSecurity:arXiv:2306.04959](https://arxiv.org/abs/2306.04959): **Arxiv White Paper**
 9. [The poisoning of ChatGPT](https://softwarecrisis.dev/letters/the-poisoning-of-chatgpt/): **Software Crisis Blog**
 10. [Poisoning Web-Scale Training Datasets - Nicholas Carlini | Stanford MLSys #75](https://www.youtube.com/watch?v=h9jf1ikcGyk): **YouTube Video**
-11. [OWASP CycloneDX v1.5](https://cyclonedx.org/capabilities/mlbom/): **OWASP CycloneDX**
\ No newline at end of file
+11. [OWASP CycloneDX v1.5](https://cyclonedx.org/capabilities/mlbom/): **OWASP CycloneDX**