
Some related work #1

Closed
DYR1 opened this issue Oct 10, 2024 · 3 comments
Labels: good first issue


DYR1 commented Oct 10, 2024

We recently proposed MoGU, a framework that can be used as a post-alignment strategy to improve the safety of LLMs; this work has been accepted at NeurIPS 2024. (MoGU)

In addition, we recently released "Towards Secure Tuning: Mitigating Security Risks Arising from Benign Instruction Fine-Tuning", which focuses on protecting the safety of LLMs during fine-tuning.

I believe that adding these works to your list will be of help to the field of LLM safety!

Best Regards


huangtiansheng commented Oct 10, 2024

Thanks for the suggestion. I will update our list after reading these works. Let's keep doing great research in this field!

Thank you!

huangtiansheng added the good first issue label on Oct 10, 2024
huangtiansheng pinned this issue on Oct 10, 2024
huangtiansheng commented

Hi,

We have updated our list.

The high-level idea of your ML-LR solution is to identify the safety-critical parameters and assign them a small learning rate during the fine-tuning stage. We have a post-fine-tuning defense, Antidote, that uses a very similar high-level idea: identify the safety-critical parameters and remove them (sparsify them to 0) after fine-tuning. Feel free to check out our Antidote paper (https://arxiv.org/abs/2408.09600) if you are interested.
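To make the comparison concrete, here is a minimal PyTorch-style sketch of both high-level ideas. The magnitude-based importance score, the fraction thresholds, and the function names are hypothetical placeholders for illustration; they are not the actual selection criteria from the ML-LR or Antidote papers.

```python
import torch
import torch.nn as nn


def mllr_style_grad_scaling(model: nn.Module, top_frac: float = 0.10,
                            scale: float = 0.01) -> None:
    """ML-LR-style idea (sketch): shrink the effective learning rate of
    entries flagged as safety-critical by scaling down their gradients.
    Weight magnitude is only a hypothetical stand-in for the real score."""
    for p in model.parameters():
        flat = p.detach().abs().flatten()
        k = max(1, int(top_frac * flat.numel()))
        # k-th largest magnitude = (numel - k + 1)-th smallest
        thresh = flat.kthvalue(flat.numel() - k + 1).values
        mask = (p.detach().abs() >= thresh).to(p.dtype)
        factor = 1.0 - mask * (1.0 - scale)  # flagged entries get `scale`
        p.register_hook(lambda g, f=factor: g * f)


@torch.no_grad()
def antidote_style_sparsify(model: nn.Module, top_frac: float = 0.001) -> None:
    """Antidote-style idea (sketch): after fine-tuning, remove (sparsify
    to 0) the small fraction of entries flagged as safety-critical."""
    for p in model.parameters():
        flat = p.abs().flatten()
        k = max(1, int(top_frac * flat.numel()))
        thresh = flat.kthvalue(flat.numel() - k + 1).values
        p.mul_((p.abs() < thresh).to(p.dtype))


# Usage on a toy model: scale gradients before/while fine-tuning,
# or sparsify after fine-tuning.
model = nn.Linear(16, 16)
mllr_style_grad_scaling(model)
antidote_style_sparsify(model)
```

The sketch highlights the key design difference: the ML-LR-style defense intervenes during fine-tuning (flagged entries barely move), while the Antidote-style defense intervenes after fine-tuning (flagged entries are zeroed out).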

Thanks again,
Tiansheng Huang

@DYR1
Copy link
Author

DYR1 commented Oct 11, 2024

Thank you for adding our work to the list! We are glad that we are focused on the same problem and working to solve it. We will read the Antidote paper carefully and hope to have the opportunity to exchange ideas in the future.
