diff --git a/README.md b/README.md
index a7dd6a8..4ed1227 100644
--- a/README.md
+++ b/README.md
@@ -65,6 +65,7 @@ Contributions are always welcome. Please read the [Contribution Guidelines](CONT
 - "Many-shot Jailbreaking", 2024-04, [[paper]](https://www.themoonlight.io/paper/share/4db82652-210c-45cc-942b-032a34e03930)
 - "Rethinking How to Evaluate Language Model Jailbreak", 2024-04, [[paper]](https://www.themoonlight.io/paper/share/44eaf8b8-2f20-4d35-a438-1fada8e091fc) [[repo]](https://github.com/controllability/jailbreak-evaluation)
 - "Confidence Elicitation: A New Attack Vector for Large Language Models", 2025-02, ICLR(poster) 25 [[paper]](https://www.themoonlight.io/paper/share/156c1cb3-c9ea-443d-9cfc-3f494f711df5) [[repo]](https://github.com/Aniloid2/Confidence_Elicitation_Attacks)
+- "Jailbreaking LLMs' Safeguard with Universal Magic Words for Text Embedding Models", 2025-01, `embedding`, [[paper]](https://arxiv.org/abs/2501.18280)
 - "Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy", 2025-03, CVPR 25 [[paper]](https://arxiv.org/pdf/2503.20823) [[repo]](https://github.com/naver-ai/JOOD)

 ### Backdoor attack