
Some related work #1

Closed
DYR1 opened this issue Oct 10, 2024 · 3 comments
Labels: good first issue


DYR1 commented Oct 10, 2024

We recently proposed MoGU, a framework that can be used as a post-alignment strategy to improve the safety of LLMs; this work has been accepted at NeurIPS 2024. (MoGU)

In addition, we recently released "Towards Secure Tuning: Mitigating Security Risks Arising from Benign Instruction Fine-Tuning", which focuses on protecting the safety of LLMs during fine-tuning.

I believe that adding these works to your list will be of help to the field of LLM safety!

Best Regards


huangtiansheng commented Oct 10, 2024

Thanks for the suggestion. I will update our list after reading these works. Let's keep doing great research in this field!

Thank you!

huangtiansheng added the good first issue label on Oct 10, 2024
huangtiansheng pinned this issue on Oct 10, 2024
huangtiansheng commented

Hi,

We have updated our list.

The high-level idea of your ML-LR solution is to identify the safety-critical parameters and assign them a small learning rate during the fine-tuning stage. We have a post-fine-tuning defense, Antidote, that uses a very similar high-level idea: identify the safety-critical parameters and remove them (sparsify them to 0) after fine-tuning. Feel free to check out our Antidote paper (https://arxiv.org/abs/2408.09600) if you are interested.
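To make the comparison concrete, here is a minimal PyTorch-style sketch of both high-level ideas. The magnitude-based importance score, the fraction thresholds, and the function names are hypothetical placeholders for illustration; they are not the actual selection criteria from the ML-LR or Antidote papers.

```python
import torch
import torch.nn as nn


def mllr_style_grad_scaling(model: nn.Module, top_frac: float = 0.10,
                            scale: float = 0.01) -> None:
    """ML-LR-style idea (sketch): shrink the effective learning rate of
    entries flagged as safety-critical by scaling down their gradients.
    Weight magnitude is only a hypothetical stand-in for the real score."""
    for p in model.parameters():
        flat = p.detach().abs().flatten()
        k = max(1, int(top_frac * flat.numel()))
        # k-th largest magnitude = (numel - k + 1)-th smallest
        thresh = flat.kthvalue(flat.numel() - k + 1).values
        mask = (p.detach().abs() >= thresh).to(p.dtype)
        factor = 1.0 - mask * (1.0 - scale)  # flagged entries get `scale`
        p.register_hook(lambda g, f=factor: g * f)


@torch.no_grad()
def antidote_style_sparsify(model: nn.Module, top_frac: float = 0.001) -> None:
    """Antidote-style idea (sketch): after fine-tuning, remove (sparsify
    to 0) the small fraction of entries flagged as safety-critical."""
    for p in model.parameters():
        flat = p.abs().flatten()
        k = max(1, int(top_frac * flat.numel()))
        thresh = flat.kthvalue(flat.numel() - k + 1).values
        p.mul_((p.abs() < thresh).to(p.dtype))


# Usage on a toy model: scale gradients before/while fine-tuning,
# or sparsify after fine-tuning.
model = nn.Linear(16, 16)
mllr_style_grad_scaling(model)
antidote_style_sparsify(model)
```

The sketch highlights the key design difference: the ML-LR-style defense intervenes during fine-tuning (flagged entries barely move), while the Antidote-style defense intervenes after fine-tuning (flagged entries are zeroed out).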

Thanks again,
Tiansheng Huang

@DYR1
Copy link
Author

DYR1 commented Oct 11, 2024

Thank you for adding our work to the list! We are glad that we are focused on the same problem and working to solve it. We will read the Antidote paper carefully and hope to have the opportunity to exchange ideas in the future.
