Commit cdb62d4

update 6.8
1 parent 512e922 commit cdb62d4

9 files changed: +332 −264 lines

subtopic/Datasets&Benchmark.md

Lines changed: 16 additions & 6 deletions

subtopic/Defense&Mitigation.md

Lines changed: 32 additions & 22 deletions

subtopic/Ehics&Bias&Fariness.md

Lines changed: 245 additions & 236 deletions

subtopic/Jailbreaks&Attack.md

Lines changed: 8 additions & 0 deletions
@@ -578,6 +578,14 @@
| 25.05 | Beijing University of Posts and Telecommunications | arxiv | [Adaptive Jailbreaking Strategies Based on the Semantic Understanding Capabilities of Large Language Models](https://arxiv.org/abs/2505.23404v1) | **LLM Jailbreaking**&**AI Security**&**Adaptive Jailbreaking Strategies** |
| 25.05 | Beihang University | ACL 2025 Findings | [Breaking the Ceiling: Exploring the Potential of Jailbreak Attacks through Expanding Strategy Space](https://arxiv.org/abs/2505.21277v2) | **Jailbreak Attack**&**Strategy Space Expansion**&**Genetic Optimization** |
| 25.05 | Huazhong University of Science and Technology; Lehigh University | ACL 2025 | [Merge Hijacking: Backdoor Attacks to Model Merging of Large Language Models](https://arxiv.org/abs/2505.23561v1) | **Model Merging**&**Backdoor Attack**&**LLM Security** |
+| 25.05 | Florida International University | arxiv | [System Prompt Extraction Attacks and Defenses in Large Language Models](https://arxiv.org/abs/2505.23817) | **System Prompt Extraction**&**LLM Security**&**Prompt Defense** |
+| 25.05 | Huazhong University of Science and Technology | arxiv | [Spa-VLM: Stealthy Poisoning Attacks on RAG-based VLM](https://arxiv.org/abs/2505.23828) | **Vision-Language Model**&**Retrieval-Augmented Generation**&**Data Poisoning Attack** |
+| 25.05 | Princeton University | arxiv | [GeneBreaker: Jailbreak Attacks against DNA Language Models with Pathogenicity Guidance](https://arxiv.org/abs/2505.23839) | **DNA Language Model**&**Jailbreak Attack**&**Biosecurity** |
+| 25.05 | University of Illinois Urbana-Champaign | arxiv | [From Hallucinations to Jailbreaks: Rethinking the Vulnerability of Large Foundation Models](https://arxiv.org/abs/2505.24232v1) | **Hallucination**&**Jailbreak**&**Foundation Models** |
+| 25.06 | Mohamed bin Zayed University of Artificial Intelligence | arxiv | [Con Instruction: Universal Jailbreaking of Multimodal Large Language Models via Non-Textual Modalities](https://arxiv.org/abs/2506.00548v1) | **Jailbreaking**&**Multimodal LLM**&**Adversarial Attack** |
+| 25.06 | Harbin Institute of Technology | arxiv | [Jailbreak-R1: Exploring the Jailbreak Capabilities of LLMs via Reinforcement Learning](https://arxiv.org/abs/2506.00782v1) | **Jailbreak**&**Reinforcement Learning**&**Automated Red Teaming** |
+| 25.06 | Hefei University of Technology | IEEE Transactions on Circuits and Systems for Video Technology | [Align is not Enough: Multimodal Universal Jailbreak Attack against Multimodal Large Language Models](https://arxiv.org/abs/2506.01307v1) | **Multimodal LLM**&**Jailbreak Attack**&**Universal Adversarial Attack** |
+| 25.06 | Sichuan University | arxiv | [Beyond the Protocol: Unveiling Attack Vectors in the Model Context Protocol Ecosystem](https://arxiv.org/abs/2506.02040v2) | **Model Context Protocol**&**LLM Agent Security**&**Prompt Injection** |

subtopic/Privacy.md

Lines changed: 5 additions & 0 deletions
@@ -263,6 +263,11 @@
| 25.05 | University of Science, VNU-HCM; Indiana University | arxiv | [Harry Potter is Still Here! Probing Knowledge Leakage in Targeted Unlearned Large Language Models via Automated Adversarial Prompting](https://arxiv.org/abs/2505.17160v1) | **Knowledge Unlearning**&**Adversarial Prompting**&**Knowledge Leakage** |
| 25.05 | Shanghai Jiao Tong University | arxiv | [Automated Privacy Information Annotation in Large Language Model Interactions](https://arxiv.org/abs/2505.20910) | **Privacy Detection**&**LLM Interaction**&**Automated Annotation** |
| 25.05 | Hong Kong University of Science and Technology | arxiv | [Privacy-preserving Prompt Personalization in Federated Learning for Multimodal Large Language Models](https://arxiv.org/abs/2505.22447) | **Federated Learning**&**Prompt Personalization**&**Privacy Protection** |
+| 25.05 | Carnegie Mellon University | arxiv | [Breaking the Gold Standard: Extracting Forgotten Data under Exact Unlearning in Large Language Models](https://arxiv.org/abs/2505.24379v1) | **Unlearning**&**Data Extraction**&**Privacy Attack** |
+| 25.06 | Michigan State University | arxiv | [Keeping an Eye on LLM Unlearning: The Hidden Risk and Remedy](https://arxiv.org/abs/2506.00359v1) | **LLM Unlearning**&**Stealthy Attack**&**Scope-aware Defense** |
+| 25.06 | Southern Illinois University | arxiv | [Evaluating Apple Intelligence's Writing Tools for Privacy Against Large Language Model-Based Inference Attacks: Insights from Early Datasets](https://arxiv.org/abs/2506.03870v1) | **Privacy Protection**&**LLM Inference Attack**&**Apple Intelligence** |
+| 25.06 | Hong Kong Polytechnic University | arxiv | [Privacy and Security Threat for OpenAI GPTs](https://arxiv.org/abs/2506.04036v1) | **Custom GPT**&**Instruction Leaking**&**Privacy Risk** |
+| 25.06 | University of Arizona | arxiv | [Learning to Diagnose Privately: DP-Powered LLMs for Radiology Report Classification](https://arxiv.org/abs/2506.04450v1) | **Privacy Preserving Large Language Models**&**Radiology Report Classification**&**Differential Privacy** |

## 💻Presentations & Talks

subtopic/Robustness.md

Lines changed: 1 addition & 0 deletions
@@ -61,6 +61,7 @@
| 25.05 | Ben Gurion University | ISIT 2025 | [Optimized Couplings for Watermarking Large Language Models](https://arxiv.org/abs/2505.08878) | **LLM Watermarking**&**Hypothesis Testing**&**Coupling Optimization** |
| 25.05 | Tufts University | arxiv | [Noise Injection Systemically Degrades Large Language Model Safety Guardrails](https://arxiv.org/abs/2505.13500) | **Safety Fine-Tuning**&**Activation Noise**&**LLM Robustness** |
| 25.05 | The Hong Kong Polytechnic University | ICML 2025 | [Does Low Rank Adaptation Lead to Lower Robustness against Training-Time Attacks?](https://arxiv.org/abs/2505.12871v1) | **LoRA**&**Training-Time Attacks**&**Robustness Analysis** |
+| 25.06 | Anhui University | arxiv | [Robustness of Prompting: Enhancing Robustness of Large Language Models Against Prompting Attacks](https://arxiv.org/abs/2506.03627v1) | **LLM Robustness**&**Prompting Attack**&**Error Correction** |

## 💻Presentations & Talks

subtopic/Security&Discussion.md

Lines changed: 7 additions & 0 deletions
@@ -260,6 +260,13 @@
| 25.05 | Amazon Web Services | arxiv | [From nuclear safety to LLM security: Applying non-probabilistic risk management strategies to build safe and secure LLM-powered systems](https://arxiv.org/abs/2505.17084v1) | **Risk Management**&**LLM Security**&**Non-Probabilistic Strategies** |
| 25.05 | Infinite Optimization AI Lab | arxiv | [Security Concerns for Large Language Models: A Survey](https://arxiv.org/abs/2505.18889v1) | **LLM Security**&**Prompt Injection**&**Autonomous Agents** |
| 25.05 | Nanyang Technological University | arxiv | [Understanding Refusal in Language Models with Sparse Autoencoders](https://arxiv.org/abs/2505.23556) | **Refusal**&**Sparse Autoencoder**&**LLM Safety** |
+| 25.05 | Seoul National University | arxiv | [Seven Security Challenges That Must be Solved in Cross-domain Multi-agent LLM Systems](https://arxiv.org/abs/2505.23847v2) | **Multi-agent LLM**&**Cross-domain Security**&**Threat Modeling** |
+| 25.05 | University of Washington | arxiv | [OMNIGUARD: An Efficient Approach for AI Safety Moderation Across Modalities](https://arxiv.org/abs/2505.23856v1) | **AI Safety**&**Multimodal Moderation**&**Universal Representation** |
+| 25.06 | Tsinghua University | arxiv | [The Security Threat of Compressed Projectors in Large Vision-Language Models](https://arxiv.org/abs/2506.00534v1) | **Vision-Language Model**&**Compressed Projector**&**Adversarial Attack** |
+| 25.06 | Michigan State University | arxiv | [Comprehensive Vulnerability Analysis is Necessary for Trustworthy LLM-MAS](https://arxiv.org/abs/2506.01245v1) | **LLM-MAS**&**Vulnerability Analysis**&**Trustworthy AI** |
+| 25.06 | Singapore Management University | arxiv | [Which Factors Make Code LLMs More Vulnerable to Backdoor Attacks? A Systematic Study](https://arxiv.org/abs/2506.01825v1) | **Code LLM**&**Backdoor Attack**&**Adversarial Robustness** |
+| 25.06 | University of Science and Technology of China | arxiv | [SECNEURON: Reliable and Flexible Abuse Control in Local LLMs via Hybrid Neuron Encryption](https://arxiv.org/abs/2506.05242v1) | **Local LLM**&**Abuse Control**&**Neuron Encryption** |
+| 25.06 | Dartmouth College | arxiv | [Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets](https://arxiv.org/abs/2506.05346v1) | **LLM Safety**&**Alignment Robustness**&**Representation Similarity** |

## 💻Presentations & Talks

subtopic/Toxicity.md

Lines changed: 4 additions & 0 deletions
@@ -83,6 +83,10 @@
| 25.05 | TU Munich | arxiv | [Can Prompting LLMs Unlock Hate Speech Detection across Languages? A Zero-shot and Few-shot Study](https://arxiv.org/abs/2505.06149v1) | **Hate Speech Detection**&**Multilingual LLMs**&**Prompting Strategies** |
| 25.05 | University of Illinois Urbana-Champaign | arxiv | [Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders](https://arxiv.org/abs/2505.14536) | **Detoxification**&**Sparse Autoencoders**&**Causal Steering** |
| 25.05 | Universität Hamburg | arxiv | [Chinese Toxic Language Mitigation via Sentiment Polarity Consistent Rewrites](https://arxiv.org/abs/2505.15297v1) | **Detoxification**&**Sentiment Polarity**&**Chinese LLMs** |
+| 25.05 | Shanghai Jiao Tong University | ACL 2025 Findings | [Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings](https://arxiv.org/abs/2505.24341v1) | **Toxic Chinese Detection**&**Multimodal Perturbation**&**LLM Robustness** |
+| 25.06 | University of Washington | arxiv | [Detoxification of Large Language Models through Output-layer Fusion with a Calibration Model](https://arxiv.org/abs/2506.01266v1) | **LLM Detoxification**&**Calibration Model**&**Output-layer Fusion** |
+| 25.06 | TU Dresden | arxiv | [LLM in the Loop: Creating the PARADEHATE Dataset for Hate Speech Detoxification](https://arxiv.org/abs/2506.01484v1) | **Hate Speech Detoxification**&**LLM Annotation**&**Parallel Dataset** |
+| 25.06 | Penn State University | arxiv | [Something Just Like TRuST : Toxicity Recognition of Span and Target](https://arxiv.org/abs/2506.02326v1) | **Toxicity Detection**&**Target Social Group**&**Toxic Span Extraction** |

## 💻Presentations & Talks

subtopic/Truthfulness&Misinformation.md

Lines changed: 14 additions & 0 deletions
@@ -490,6 +490,20 @@
| 25.05 | The Hong Kong University of Science and Technology (Guangzhou) | ACL 2025 | [How does Misinformation Affect Large Language Model Behaviors and Preferences?](https://arxiv.org/abs/2505.21608v1) | **Misinformation**&**LLM Behavior**&**Benchmark** |
| 25.05 | The Hong Kong Polytechnic University | ACL 2025 | [Removal of Hallucination on Hallucination: Debate-Augmented RAG](https://arxiv.org/abs/2505.18581v1) | **Hallucination Mitigation**&**Retrieval-Augmented Generation**&**Multi-Agent Debate** |
| 25.05 | Central South University | ACL 2025 | [CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal Hallucinations Detection in Large Language Models](https://arxiv.org/abs/2505.19108v1) | **Cross-lingual Hallucination**&**Cross-modal Hallucination**&**Benchmark** |
+| 25.05 | Hong Kong University of Science and Technology (Guangzhou) | arxiv | [Evaluation Hallucination in Multi-Round Incomplete Information Lateral-Driven Reasoning Tasks](https://arxiv.org/abs/2505.23843) | **Lateral Thinking**&**Multi-Round Reasoning**&**Evaluation Benchmark** |
+| 25.05 | University of Illinois Urbana-Champaign | arxiv | [From Hallucinations to Jailbreaks: Rethinking the Vulnerability of Large Foundation Models](https://arxiv.org/abs/2505.24232v1) | **Hallucination**&**Jailbreak**&**Foundation Models** |
+| 25.05 | National University of Singapore | arxiv | [The Hallucination Dilemma: Factuality-Aware Reinforcement Learning for Large Reasoning Models](https://arxiv.org/abs/2505.24630v1) | **Hallucination**&**Reinforcement Learning**&**Reasoning Model** |
+| 25.05 | University of Arkansas | arxiv | [BIMA: Bijective Maximum Likelihood Learning Approach to Hallucination Prediction and Mitigation in Large Vision-Language Models](https://arxiv.org/abs/2505.24649v1) | **Vision-Language Model**&**Hallucination Mitigation**&**Normalizing Flow** |
+| 25.06 | MBZUAI | arxiv | [HD-NDEs: Neural Differential Equations for Hallucination Detection in LLMs](https://arxiv.org/abs/2506.00088v1) | **Hallucination Detection**&**Neural Differential Equations**&**LLM Internal States** |
+| 25.06 | Université Côte d’Azur | arxiv | [MMD-Flagger: Leveraging Maximum Mean Discrepancy to Detect Hallucinations](https://arxiv.org/abs/2506.01367v1) | **Hallucination Detection**&**Maximum Mean Discrepancy**&**Machine Translation** |
+| 25.06 | Nanyang Technological University | arxiv | [Benford's Curse: Tracing Digit Bias to Numerical Hallucination in LLMs](https://arxiv.org/abs/2506.01734v1) | **Numerical Hallucination**&**Digit Bias**&**Benford’s Law** |
+| 25.06 | University of Technology Sydney | arxiv | [Shaking to Reveal: Perturbation-Based Detection of LLM Hallucinations](https://arxiv.org/abs/2506.02696v1) | **Hallucination Detection**&**Perturbation**&**Intermediate Representation** |
+| 25.06 | Fundación Centro Tecnolóxico de Telecomunicacións de Galicia | arxiv | [Ask a Local: Detecting Hallucinations With Specialized Model Divergence](https://arxiv.org/abs/2506.03357v1) | **Hallucination Detection**&**Specialized Model Divergence**&**Multilingual LLM** |
+| 25.06 | Soochow University | arxiv | [Mitigating Hallucinations in Large Vision-Language Models via Entity-Centric Multimodal Preference Optimization](https://arxiv.org/abs/2506.04039v1) | **Vision-Language Model**&**Hallucination Mitigation**&**Preference Optimization** |
+| 25.06 | Tsinghua University | arxiv | [Joint Evaluation of Answer and Reasoning Consistency for Hallucination Detection in Large Reasoning Models](https://arxiv.org/abs/2506.04832v1) | **Hallucination Detection**&**Large Reasoning Model**&**Reasoning Consistency** |
+| 25.06 | Peking University | arxiv | [When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models](https://arxiv.org/abs/2506.04909v1) | **LLM Deception**&**Chain-of-Thought Reasoning**&**Representation Engineering** |
+| 25.06 | Institute of Automation, Chinese Academy of Sciences | ACL 2025 | [Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis](https://arxiv.org/abs/2506.04142v1) | **Trustworthy Evaluation**&**Shortcut Neuron**&**Data Contamination** |
+| 25.06 | Mohamed bin Zayed University of AI | arxiv | [DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillation](https://arxiv.org/abs/2506.01954v1) | **RAG Distillation**&**Small Language Models**&**Hallucination Mitigation** |

## 💻Presentations & Talks
