We recently proposed the MoGU framework, which can be used as a post-alignment strategy to improve the security of LLMs. This work has been accepted at NeurIPS 2024. (MoGU)
The high-level idea of your ML-LR solution is to identify the safety-critical parameters and assign them a small learning rate during the fine-tuning stage. We have a post-fine-tuning-stage defense, Antidote, which uses a very similar high-level idea: identify the safety-critical parameters and remove them (sparsify them to 0) after fine-tuning. Feel free to check out our Antidote paper (https://arxiv.org/abs/2408.09600) if you are interested.
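For readers skimming this thread, here is a rough sketch of the two high-level ideas described above. This is not the actual method from either paper; it assumes PyTorch, and `safety_critical_names` is a hypothetical set of parameter names that some identification procedure has already flagged as safety-critical.

```python
# Minimal illustration of the two high-level ideas, not the papers' actual methods.
# Assumption: PyTorch; `safety_critical_names` is a hypothetical, precomputed set
# of parameter names flagged as safety-critical by some importance-scoring step.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))  # toy stand-in for an LLM
safety_critical_names = {"0.weight"}  # hypothetical placeholder

# ML-LR-style idea: fine-tune safety-critical parameters with a much smaller learning rate.
critical, regular = [], []
for name, p in model.named_parameters():
    (critical if name in safety_critical_names else regular).append(p)

optimizer = torch.optim.AdamW(
    [
        {"params": regular, "lr": 2e-5},   # normal fine-tuning learning rate
        {"params": critical, "lr": 2e-7},  # reduced learning rate for safety-critical parameters
    ]
)
# ... run fine-tuning with `optimizer` as usual ...

# Antidote-style idea: after fine-tuning, sparsify (zero out) the safety-critical parameters.
with torch.no_grad():
    for name, p in model.named_parameters():
        if name in safety_critical_names:
            p.zero_()  # remove these parameters' contribution entirely
```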
Thank you for adding our work to the list! We are glad that we are focused on the same problem and working to solve it. We will read this paper carefully and hope to have the opportunity to communicate in the future.
Besides, we recently released "Towards Secure Tuning: Mitigating Security Risks Arising from Benign Instruction Fine-Tuning", which focuses on protecting the security of LLMs during fine-tuning.
I believe that adding these works to your list will help the field of LLM security!
Best Regards