Commit cdb62d4

update 6.8
1 parent 512e922 commit cdb62d4

9 files changed: +332 −264 lines

subtopic/Datasets&Benchmark.md

Lines changed: 16 additions & 6 deletions

subtopic/Defense&Mitigation.md

Lines changed: 32 additions & 22 deletions

subtopic/Ehics&Bias&Fariness.md

Lines changed: 245 additions & 236 deletions

subtopic/Jailbreaks&Attack.md

Lines changed: 8 additions & 0 deletions
@@ -578,6 +578,14 @@
| 25.05 | Beijing University of Posts and Telecommunications | arxiv | [Adaptive Jailbreaking Strategies Based on the Semantic Understanding Capabilities of Large Language Models](https://arxiv.org/abs/2505.23404v1) | **LLM Jailbreaking**&**AI Security**&**Adaptive Jailbreaking Strategies** |
| 25.05 | Beihang University | ACL 2025 Findings | [Breaking the Ceiling: Exploring the Potential of Jailbreak Attacks through Expanding Strategy Space](https://arxiv.org/abs/2505.21277v2) | **Jailbreak Attack**&**Strategy Space Expansion**&**Genetic Optimization** |
| 25.05 | Huazhong University of Science and Technology; Lehigh University | ACL 2025 | [Merge Hijacking: Backdoor Attacks to Model Merging of Large Language Models](https://arxiv.org/abs/2505.23561v1) | **Model Merging**&**Backdoor Attack**&**LLM Security** |
+| 25.05 | Florida International University | arxiv | [System Prompt Extraction Attacks and Defenses in Large Language Models](https://arxiv.org/abs/2505.23817) | **System Prompt Extraction**&**LLM Security**&**Prompt Defense** |
+| 25.05 | Huazhong University of Science and Technology | arxiv | [Spa-VLM: Stealthy Poisoning Attacks on RAG-based VLM](https://arxiv.org/abs/2505.23828) | **Vision-Language Model**&**Retrieval-Augmented Generation**&**Data Poisoning Attack** |
+| 25.05 | Princeton University | arxiv | [GeneBreaker: Jailbreak Attacks against DNA Language Models with Pathogenicity Guidance](https://arxiv.org/abs/2505.23839) | **DNA Language Model**&**Jailbreak Attack**&**Biosecurity** |
+| 25.05 | University of Illinois Urbana-Champaign | arxiv | [From Hallucinations to Jailbreaks: Rethinking the Vulnerability of Large Foundation Models](https://arxiv.org/abs/2505.24232v1) | **Hallucination**&**Jailbreak**&**Foundation Models** |
+| 25.06 | Mohamed bin Zayed University of Artificial Intelligence | arxiv | [Con Instruction: Universal Jailbreaking of Multimodal Large Language Models via Non-Textual Modalities](https://arxiv.org/abs/2506.00548v1) | **Jailbreaking**&**Multimodal LLM**&**Adversarial Attack** |
+| 25.06 | Harbin Institute of Technology | arxiv | [Jailbreak-R1: Exploring the Jailbreak Capabilities of LLMs via Reinforcement Learning](https://arxiv.org/abs/2506.00782v1) | **Jailbreak**&**Reinforcement Learning**&**Automated Red Teaming** |
+| 25.06 | Hefei University of Technology | IEEE Transactions on Circuits and Systems for Video Technology | [Align is not Enough: Multimodal Universal Jailbreak Attack against Multimodal Large Language Models](https://arxiv.org/abs/2506.01307v1) | **Multimodal LLM**&**Jailbreak Attack**&**Universal Adversarial Attack** |
+| 25.06 | Sichuan University | arxiv | [Beyond the Protocol: Unveiling Attack Vectors in the Model Context Protocol Ecosystem](https://arxiv.org/abs/2506.02040v2) | **Model Context Protocol**&**LLM Agent Security**&**Prompt Injection** |

subtopic/Privacy.md

Lines changed: 5 additions & 0 deletions
@@ -263,6 +263,11 @@
| 25.05 | University of Science, VNU-HCM; Indiana University | arxiv | [Harry Potter is Still Here! Probing Knowledge Leakage in Targeted Unlearned Large Language Models via Automated Adversarial Prompting](https://arxiv.org/abs/2505.17160v1) | **Knowledge Unlearning**&**Adversarial Prompting**&**Knowledge Leakage** |
| 25.05 | Shanghai Jiao Tong University | arxiv | [Automated Privacy Information Annotation in Large Language Model Interactions](https://arxiv.org/abs/2505.20910) | **Privacy Detection**&**LLM Interaction**&**Automated Annotation** |
| 25.05 | Hong Kong University of Science and Technology | arxiv | [Privacy-preserving Prompt Personalization in Federated Learning for Multimodal Large Language Models](https://arxiv.org/abs/2505.22447) | **Federated Learning**&**Prompt Personalization**&**Privacy Protection** |
+| 25.05 | Carnegie Mellon University | arxiv | [Breaking the Gold Standard: Extracting Forgotten Data under Exact Unlearning in Large Language Models](https://arxiv.org/abs/2505.24379v1) | **Unlearning**&**Data Extraction**&**Privacy Attack** |
+| 25.06 | Michigan State University | arxiv | [Keeping an Eye on LLM Unlearning: The Hidden Risk and Remedy](https://arxiv.org/abs/2506.00359v1) | **LLM Unlearning**&**Stealthy Attack**&**Scope-aware Defense** |
+| 25.06 | Southern Illinois University | arxiv | [Evaluating Apple Intelligence's Writing Tools for Privacy Against Large Language Model-Based Inference Attacks: Insights from Early Datasets](https://arxiv.org/abs/2506.03870v1) | **Privacy Protection**&**LLM Inference Attack**&**Apple Intelligence** |
+| 25.06 | Hong Kong Polytechnic University | arxiv | [Privacy and Security Threat for OpenAI GPTs](https://arxiv.org/abs/2506.04036v1) | **Custom GPT**&**Instruction Leaking**&**Privacy Risk** |
+| 25.06 | University of Arizona | arxiv | [Learning to Diagnose Privately: DP-Powered LLMs for Radiology Report Classification](https://arxiv.org/abs/2506.04450v1) | **Privacy Preserving Large Language Models**&**Radiology Report Classification**&**Differential Privacy** |

## 💻Presentations & Talks

subtopic/Robustness.md

Lines changed: 1 addition & 0 deletions
@@ -61,6 +61,7 @@
| 25.05 | Ben Gurion University | ISIT 2025 | [Optimized Couplings for Watermarking Large Language Models](https://arxiv.org/abs/2505.08878) | **LLM Watermarking**&**Hypothesis Testing**&**Coupling Optimization** |
| 25.05 | Tufts University | arxiv | [Noise Injection Systemically Degrades Large Language Model Safety Guardrails](https://arxiv.org/abs/2505.13500) | **Safety Fine-Tuning**&**Activation Noise**&**LLM Robustness** |
| 25.05 | The Hong Kong Polytechnic University | ICML 2025 | [Does Low Rank Adaptation Lead to Lower Robustness against Training-Time Attacks?](https://arxiv.org/abs/2505.12871v1) | **LoRA**&**Training-Time Attacks**&**Robustness Analysis** |
+| 25.06 | Anhui University | arxiv | [Robustness of Prompting: Enhancing Robustness of Large Language Models Against Prompting Attacks](https://arxiv.org/abs/2506.03627v1) | **LLM Robustness**&**Prompting Attack**&**Error Correction** |

## 💻Presentations & Talks

subtopic/Security&Discussion.md

Lines changed: 7 additions & 0 deletions
@@ -260,6 +260,13 @@
| 25.05 | Amazon Web Services | arxiv | [From nuclear safety to LLM security: Applying non-probabilistic risk management strategies to build safe and secure LLM-powered systems](https://arxiv.org/abs/2505.17084v1) | **Risk Management**&**LLM Security**&**Non-Probabilistic Strategies** |
| 25.05 | Infinite Optimization AI Lab | arxiv | [Security Concerns for Large Language Models: A Survey](https://arxiv.org/abs/2505.18889v1) | **LLM Security**&**Prompt Injection**&**Autonomous Agents** |
| 25.05 | Nanyang Technological University | arxiv | [Understanding Refusal in Language Models with Sparse Autoencoders](https://arxiv.org/abs/2505.23556) | **Refusal**&**Sparse Autoencoder**&**LLM Safety** |
+| 25.05 | Seoul National University | arxiv | [Seven Security Challenges That Must be Solved in Cross-domain Multi-agent LLM Systems](https://arxiv.org/abs/2505.23847v2) | **Multi-agent LLM**&**Cross-domain Security**&**Threat Modeling** |
+| 25.05 | University of Washington | arxiv | [OMNIGUARD: An Efficient Approach for AI Safety Moderation Across Modalities](https://arxiv.org/abs/2505.23856v1) | **AI Safety**&**Multimodal Moderation**&**Universal Representation** |
+| 25.06 | Tsinghua University | arxiv | [The Security Threat of Compressed Projectors in Large Vision-Language Models](https://arxiv.org/abs/2506.00534v1) | **Vision-Language Model**&**Compressed Projector**&**Adversarial Attack** |
+| 25.06 | Michigan State University | arxiv | [Comprehensive Vulnerability Analysis is Necessary for Trustworthy LLM-MAS](https://arxiv.org/abs/2506.01245v1) | **LLM-MAS**&**Vulnerability Analysis**&**Trustworthy AI** |
+| 25.06 | Singapore Management University | arxiv | [Which Factors Make Code LLMs More Vulnerable to Backdoor Attacks? A Systematic Study](https://arxiv.org/abs/2506.01825v1) | **Code LLM**&**Backdoor Attack**&**Adversarial Robustness** |
+| 25.06 | University of Science and Technology of China | arxiv | [SECNEURON: Reliable and Flexible Abuse Control in Local LLMs via Hybrid Neuron Encryption](https://arxiv.org/abs/2506.05242v1) | **Local LLM**&**Abuse Control**&**Neuron Encryption** |
+| 25.06 | Dartmouth College | arxiv | [Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets](https://arxiv.org/abs/2506.05346v1) | **LLM Safety**&**Alignment Robustness**&**Representation Similarity** |

## 💻Presentations & Talks

subtopic/Toxicity.md

Lines changed: 4 additions & 0 deletions
@@ -83,6 +83,10 @@
| 25.05 | TU Munich | arxiv | [Can Prompting LLMs Unlock Hate Speech Detection across Languages? A Zero-shot and Few-shot Study](https://arxiv.org/abs/2505.06149v1) | **Hate Speech Detection**&**Multilingual LLMs**&**Prompting Strategies** |
| 25.05 | University of Illinois Urbana-Champaign | arxiv | [Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders](https://arxiv.org/abs/2505.14536) | **Detoxification**&**Sparse Autoencoders**&**Causal Steering** |
| 25.05 | Universität Hamburg | arxiv | [Chinese Toxic Language Mitigation via Sentiment Polarity Consistent Rewrites](https://arxiv.org/abs/2505.15297v1) | **Detoxification**&**Sentiment Polarity**&**Chinese LLMs** |
+| 25.05 | Shanghai Jiao Tong University | ACL 2025 Findings | [Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings](https://arxiv.org/abs/2505.24341v1) | **Toxic Chinese Detection**&**Multimodal Perturbation**&**LLM Robustness** |
+| 25.06 | University of Washington | arxiv | [Detoxification of Large Language Models through Output-layer Fusion with a Calibration Model](https://arxiv.org/abs/2506.01266v1) | **LLM Detoxification**&**Calibration Model**&**Output-layer Fusion** |
+| 25.06 | TU Dresden | arxiv | [LLM in the Loop: Creating the PARADEHATE Dataset for Hate Speech Detoxification](https://arxiv.org/abs/2506.01484v1) | **Hate Speech Detoxification**&**LLM Annotation**&**Parallel Dataset** |
+| 25.06 | Penn State University | arxiv | [Something Just Like TRuST : Toxicity Recognition of Span and Target](https://arxiv.org/abs/2506.02326v1) | **Toxicity Detection**&**Target Social Group**&**Toxic Span Extraction** |

## 💻Presentations & Talks

subtopic/Truthfulness&Misinformation.md

Lines changed: 14 additions & 0 deletions
@@ -490,6 +490,20 @@
| 25.05 | The Hong Kong University of Science and Technology (Guangzhou) | ACL 2025 | [How does Misinformation Affect Large Language Model Behaviors and Preferences?](https://arxiv.org/abs/2505.21608v1) | **Misinformation**&**LLM Behavior**&**Benchmark** |
| 25.05 | The Hong Kong Polytechnic University | ACL 2025 | [Removal of Hallucination on Hallucination: Debate-Augmented RAG](https://arxiv.org/abs/2505.18581v1) | **Hallucination Mitigation**&**Retrieval-Augmented Generation**&**Multi-Agent Debate** |
| 25.05 | Central South University | ACL 2025 | [CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal Hallucinations Detection in Large Language Models](https://arxiv.org/abs/2505.19108v1) | **Cross-lingual Hallucination**&**Cross-modal Hallucination**&**Benchmark** |
+| 25.05 | Hong Kong University of Science and Technology (Guangzhou) | arxiv | [Evaluation Hallucination in Multi-Round Incomplete Information Lateral-Driven Reasoning Tasks](https://arxiv.org/abs/2505.23843) | **Lateral Thinking**&**Multi-Round Reasoning**&**Evaluation Benchmark** |
+| 25.05 | University of Illinois Urbana-Champaign | arxiv | [From Hallucinations to Jailbreaks: Rethinking the Vulnerability of Large Foundation Models](https://arxiv.org/abs/2505.24232v1) | **Hallucination**&**Jailbreak**&**Foundation Models** |
+| 25.05 | National University of Singapore | arxiv | [The Hallucination Dilemma: Factuality-Aware Reinforcement Learning for Large Reasoning Models](https://arxiv.org/abs/2505.24630v1) | **Hallucination**&**Reinforcement Learning**&**Reasoning Model** |
+| 25.05 | University of Arkansas | arxiv | [BIMA: Bijective Maximum Likelihood Learning Approach to Hallucination Prediction and Mitigation in Large Vision-Language Models](https://arxiv.org/abs/2505.24649v1) | **Vision-Language Model**&**Hallucination Mitigation**&**Normalizing Flow** |
+| 25.06 | MBZUAI | arxiv | [HD-NDEs: Neural Differential Equations for Hallucination Detection in LLMs](https://arxiv.org/abs/2506.00088v1) | **Hallucination Detection**&**Neural Differential Equations**&**LLM Internal States** |
+| 25.06 | Université Côte d’Azur | arxiv | [MMD-Flagger: Leveraging Maximum Mean Discrepancy to Detect Hallucinations](https://arxiv.org/abs/2506.01367v1) | **Hallucination Detection**&**Maximum Mean Discrepancy**&**Machine Translation** |
+| 25.06 | Nanyang Technological University | arxiv | [Benford's Curse: Tracing Digit Bias to Numerical Hallucination in LLMs](https://arxiv.org/abs/2506.01734v1) | **Numerical Hallucination**&**Digit Bias**&**Benford’s Law** |
+| 25.06 | University of Technology Sydney | arxiv | [Shaking to Reveal: Perturbation-Based Detection of LLM Hallucinations](https://arxiv.org/abs/2506.02696v1) | **Hallucination Detection**&**Perturbation**&**Intermediate Representation** |
+| 25.06 | Fundación Centro Tecnolóxico de Telecomunicacións de Galicia | arxiv | [Ask a Local: Detecting Hallucinations With Specialized Model Divergence](https://arxiv.org/abs/2506.03357v1) | **Hallucination Detection**&**Specialized Model Divergence**&**Multilingual LLM** |
+| 25.06 | Soochow University | arxiv | [Mitigating Hallucinations in Large Vision-Language Models via Entity-Centric Multimodal Preference Optimization](https://arxiv.org/abs/2506.04039v1) | **Vision-Language Model**&**Hallucination Mitigation**&**Preference Optimization** |
+| 25.06 | Tsinghua University | arxiv | [Joint Evaluation of Answer and Reasoning Consistency for Hallucination Detection in Large Reasoning Models](https://arxiv.org/abs/2506.04832v1) | **Hallucination Detection**&**Large Reasoning Model**&**Reasoning Consistency** |
+| 25.06 | Peking University | arxiv | [When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models](https://arxiv.org/abs/2506.04909v1) | **LLM Deception**&**Chain-of-Thought Reasoning**&**Representation Engineering** |
+| 25.06 | Institute of Automation, Chinese Academy of Sciences | ACL 2025 | [Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis](https://arxiv.org/abs/2506.04142v1) | **Trustworthy Evaluation**&**Shortcut Neuron**&**Data Contamination** |
+| 25.06 | Mohamed bin Zayed University of AI | arxiv | [DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillation](https://arxiv.org/abs/2506.01954v1) | **RAG Distillation**&**Small Language Models**&**Hallucination Mitigation** |

## 💻Presentations & Talks
