-
Notifications
You must be signed in to change notification settings - Fork 254
Description
Indirect prompt injection is an attack in which a command is included in a text that is used by someone else for processing by an LLM (e.g. to summarize the text). The command modifies or overrides the user's command. This is particularly problematic if the LLM can initiate actions (e.g. sending an email).
The attack is made possible by the practice that all input, i.e. the command of the user and the text which should be processed, is merged into one unified text that is given to the LLM, and the LLM cannot reliably recognise which part is the command and which is the text to be processed.
To mitigate indirect prompt injections, it is reasonable to enable the LLM to differentiate between the user's command and the text to be processed. Therefore, different (optional) channels should be defined for user input, e.g., command and text-to-process.
It was shown in the paper StruQ: Defending Against Prompt Injection with Structured Queries (https://www.usenix.org/conference/usenixsecurity25/presentation/chen-sizhe) that by fine-tuning an LLM to respect the channels, it is possible to make indirect prompt injections almost impossible for many use cases.
Furthermore, define additional user channels for sections of text coming from a vector database (for RAG), e.g., section-to-process, and for different media, e.g., image, and more.
The channel tag should include an optional parameter for the filename of the input data.
Frontends for LLMs that allow attaching files to a text in an input field (e.g., Open WebUI, https://github.com/open-webui/open-webui) could send the text from the input field to the command channel and the files to the text-to-process channel by default.