English | 中文
这是一个有关llm-safety的宝藏仓库!🥰🥰🥰
🧑💻我们的工作: 我们精心挑选并罗列了有关大模型安全方面(llm-safety)最新😋、最全面😎、最有价值🤩的论文。不仅如此,我们还附上了有关的演讲、教程、会议、新闻以及文章。这个仓库将实时更新,保证第一手资料。
如果一份资料同时属于多个子分类,那么它将被同时放在这些子分类下。比如 “ Awesome-LLM-Safety”这个仓库将被放在每个子分类下。
✔️适合多数人:
- 对于希望了解llm-safety的初学者,这个仓库可以作为你把握框架,并了解细节的导航。我们在README中保留了比较经典或有影响力的论文,对初学者寻找感兴趣的方向十分友好;
- 对于资深的研究者,这个仓库可以作为你了解实况信息、查漏补缺的工具,在subtopic中,我们正在努力更新这个subtopic下的所有最新内容,并且将会不断补完之前的内容。全面的资料搜集以及用心地筛选可以帮助你节省时间;
🧭使用指南:
- 简略版:在README中,使用者可以找到按时间排列好的精选资讯,以及各种咨询的链接
- 详细版:如果对某一子话题特别感兴趣,可以点开“subtopic”文件夹,进一步了解。里面有对每篇文章或者资讯的简略介绍,可以帮助研究者快速锁定内容。
- 🛡️Awesome LLM-Safety🛡️
日期 | 机构 | 出版信息 | 论文&链接 |
---|---|---|---|
20.10 | Facebook AI Research | arxiv | Recipes for Safety in Open-domain Chatbots |
22.03 | OpenAI | NIPS2022 | Training language models to follow instructions with human feedback |
23.07 | UC Berkeley | NIPS2023 | Jailbroken: How Does LLM Safety Training Fail? |
23.12 | OpenAI | Open AI | Practices for Governing Agentic AI Systems |
日期 | 分类 | 标题 | 链接地址 |
---|---|---|---|
22.02 | 毒性检测API | Perspective API | 链接 [论文](https://arxiv.org/abs/2202.11176 |
23.07 | 仓库 | Awesome LLM Security | 链接 |
23.10 | 教程 | Awesome-LLM-Safety | 链接 |
👉Latest&Comprehensive Security Paper
日期 | 机构 | 出版信息 | 论文&链接 |
---|---|---|---|
19.12 | Microsoft | CCS2020 | Analyzing Information Leakage of Updates to Natural Language Models |
21.07 | Google Research | ACL2022 | Deduplicating Training Data Makes Language Models Better |
21.10 | Stanford | ICLR2022 | Large language models can be strong differentially private learners |
22.02 | Google Research | ICLR2023 | Quantifying Memorization Across Neural Language Models |
日期 | 分类 | 标题 | 链接地址 |
---|---|---|---|
23.10 | 教程 | Awesome-LLM-Safety | 链接 |
👉Latest&Comprehensive Privacy Paper
日期 | 机构 | 出版信息 | 论文&链接 |
---|---|---|---|
21.09 | University of Oxford | ACL2022 | TruthfulQA: Measuring How Models Mimic Human Falsehoods |
23.11 | Harbin Institute of Technology | arxiv | A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions |
23.11 | Arizona State University | arxiv | Can Knowledge Graphs Reduce Hallucinations in LLMs? : A Survey |
日期 | 分类 | 标题 | 链接地址 |
---|---|---|---|
23.07 | 仓库 | llm-hallucination-survey | 链接 |
23.10 | 仓库 | LLM-Factuality-Survey | 链接 |
23.10 | 教程 | Awesome-LLM-Safety | 链接 |
👉Latest&Comprehensive Truthfulness&Misinformation Paper
日期 | 机构 | 出版信息 | 论文&链接 |
---|---|---|---|
20.12 | USENIX Security 2021 | Extracting Training Data from Large Language Models | |
22.11 | AE Studio | NIPS2022(ML Safety Workshop) | Ignore Previous Prompt: Attack Techniques For Language Models |
23.06 | arxiv | Are aligned neural networks adversarially aligned? | |
23.07 | CMU | arxiv | Universal and Transferable Adversarial Attacks on Aligned Language Models |
23.10 | University of Pennsylvania | arxiv | Jailbreaking Black Box Large Language Models in Twenty Queries |
日期 | 分类 | 标题 | 链接地址 |
---|---|---|---|
23.01 | 社区 | Reddit/ChatGPTJailbrek | 链接 |
23.02 | 资源&教程 | Jailbreak Chat | 链接 |
23.10 | 教程 | Awesome-LLM-Safety | 链接 |
23.10 | 博客 | Adversarial Attacks on LLMs(Author: Lilian Weng) | 链接 |
23.11 | 视频 | [1hr Talk] Intro to Large Language Models From 45:45(Author: Andrej Karpathy) |
中字链接 |
👉Latest&Comprehensive JailBreak & Attacks Paper
日期 | 机构 | 出版信息 | 论文&链接 |
---|---|---|---|
21.07 | Google Research | ACL2022 | Deduplicating Training Data Makes Language Models Better |
22.04 | Anthropic | arxiv | Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback |
日期 | 分类 | 标题 | 链接地址 |
---|---|---|---|
23.10 | 教程 | Awesome-LLM-Safety | 链接 |
👉Latest&Comprehensive Defenses Paper
日期 | 机构 | 出版信息 | 论文&链接 |
---|---|---|---|
20.09 | University of Washington | EMNLP2020(findings) | RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models |
21.09 | University of Oxford | ACL2022 | TruthfulQA: Measuring How Models Mimic Human Falsehoods |
22.03 | MIT | ACL2022 | ToxiGen: A Large-Scale Machine-Generated datasets for Adversarial and Implicit Hate Speech Detection |
日期 | 分类 | 标题 | 链接地址 |
---|---|---|---|
23.10 | 教程 | Awesome-LLM-Safety | 链接 |
- Toxicity - RealToxicityPrompts datasets
- Truthfulness - TruthfulQA datasets
👉Latest&Comprehensive datasets & Benchmark Paper
🤗如果你有任何疑问欢迎咨询作者!🤗
✉️: ydyjya ➡️ [email protected]
💬: 交流LLM Safety