1 change: 1 addition & 0 deletions README.md
@@ -66,6 +66,7 @@ Contributions are always welcome. Please read the [Contribution Guidelines](CONT
- "Rethinking How to Evaluate Language Model Jailbreak", 2024-04, [[paper]](https://www.themoonlight.io/paper/share/44eaf8b8-2f20-4d35-a438-1fada8e091fc) [[repo]](https://github.com/controllability/jailbreak-evaluation)
- "Confidence Elicitation: A New Attack Vector for Large Language Models", 2025-02, ICLR(poster) 25 [[paper]](https://www.themoonlight.io/paper/share/156c1cb3-c9ea-443d-9cfc-3f494f711df5) [[repo]](https://github.com/Aniloid2/Confidence_Elicitation_Attacks)
- "Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy", 2025-03, CVPR 25 [[paper]](https://arxiv.org/pdf/2503.20823) [[repo]](https://github.com/naver-ai/JOOD)
- "Jailbreaking LLMs' Safeguard with Universal Magic Words for Text Embedding Models", 2025-01, `embedding`, [[paper]](https://arxiv.org/abs/2501.18280)
⚠️ Potential issue | 🟡 Minor

Adjust the entry's position to preserve chronological ordering.

The new entry, dated 2025-01, is currently placed after the 2025-02 (line 67) and 2025-03 (line 68) entries. Since the recent entries (2024-2025) are largely in chronological order, moving this entry before line 67 (between lines 66 and 67) is recommended for consistency.

📝 Recommended position adjustment

Current:

- "Rethinking How to Evaluate Language Model Jailbreak", 2024-04, [[paper]](...)
- "Confidence Elicitation: A New Attack Vector for Large Language Models", 2025-02, ICLR(poster) 25 [[paper]](...)
- "Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy", 2025-03, CVPR 25 [[paper]](...)
- "Jailbreaking LLMs' Safeguard with Universal Magic Words for Text Embedding Models", 2025-01, `embedding`, [[paper]](...)

Recommended:

- "Rethinking How to Evaluate Language Model Jailbreak", 2024-04, [[paper]](...)
- "Jailbreaking LLMs' Safeguard with Universal Magic Words for Text Embedding Models", 2025-01, `embedding`, [[paper]](...)
- "Confidence Elicitation: A New Attack Vector for Large Language Models", 2025-02, ICLR(poster) 25 [[paper]](...)
- "Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy", 2025-03, CVPR 25 [[paper]](...)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` at line 69, the list is out of chronological order: locate the entry "Jailbreaking LLMs'
Safeguard with Universal Magic Words for Text Embedding Models", 2025-01,
`embedding`, [[paper]](...), cut it from its current position after the 2025-02 and 2025-03 entries, and
insert it after the 2024-04 entry, i.e. before the 2025-02 entry (between current lines 66 and 67), so the
list stays in chronological order; finding the title string and moving that line is sufficient.
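The recommended ordering above can be sanity-checked mechanically. A minimal sketch (the `entry_date` helper is hypothetical, not part of the repo, and the entries are abbreviated exactly as in the comment above):

```python
import re

def entry_date(line):
    """Extract the first YYYY-MM token from a README list entry."""
    m = re.search(r"\b(\d{4})-(\d{2})\b", line)
    return (int(m.group(1)), int(m.group(2))) if m else (0, 0)

# Recommended ordering from the review (links elided as in the comment).
entries = [
    '- "Rethinking How to Evaluate Language Model Jailbreak", 2024-04, [[paper]](...)',
    "- \"Jailbreaking LLMs' Safeguard with Universal Magic Words for Text Embedding Models\", 2025-01, `embedding`, [[paper]](...)",
    '- "Confidence Elicitation: A New Attack Vector for Large Language Models", 2025-02, ICLR(poster) 25 [[paper]](...)',
    '- "Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy", 2025-03, CVPR 25 [[paper]](...)',
]

dates = [entry_date(e) for e in entries]
# Ascending chronological order holds iff the date list is already sorted.
print(dates == sorted(dates))  # → True
```

Running the same check against the current ordering (2025-01 last) prints `False`, which is the inconsistency the review flags.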


### Backdoor attack
- "BITE: Textual Backdoor Attacks with Iterative Trigger Injection", 2022-05, ACL 23, `defense` [[paper]](https://www.themoonlight.io/paper/share/04ad5e28-6f64-46b0-8714-64a845cad49e)