chore: add additional prevention methods 205 (#234)
GangGreenTemperTatum authored Oct 30, 2023
1 parent 4a68839 commit 7c163b8
Showing 1 changed file with 5 additions and 2 deletions.
7 changes: 5 additions & 2 deletions 1_1_vulns/TrainingDataPoisoning.md
@@ -31,10 +31,13 @@ Data poisoning is considered an integrity attack because tampering with the trai
3. Verify your use-case for the LLM and the application it will integrate with. Craft different models via separate training data or fine-tuning for different use-cases to create more granular and accurate generative AI output for each defined use-case.
4. Ensure sufficient sandboxing through network controls is in place to prevent the model from scraping unintended data sources, which could otherwise hinder the machine learning output (an egress-allowlist sketch follows the diff).
5. Use strict vetting or input filters for specific training data or categories of data sources to control the volume of falsified data. Apply data sanitization, with techniques such as statistical outlier detection and anomaly detection, to detect and remove adversarial data before it is fed into the fine-tuning process (see the outlier-filtering sketch after the diff).
-6. Adversarial robustness techniques such as federated learning and constraints to minimize the effect of outliers or adversarial training to be vigorous against worst-case perturbations of the training data.
+6. Establish control questions around the source and ownership of datasets to ensure that the model has not been poisoned, and adopt this culture into the "MLSecOps" cycle. Refer to available resources such as [The Foundation Model Transparency Index](https://crfm.stanford.edu/fmti/) or the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), for example.
+7. Use DVC ([Data Version Control](https://dvc.org/doc/user-guide/analytics)) to tightly identify and track any part of a dataset that may have been manipulated, deleted or added and has led to poisoning (a DVC workflow sketch follows the diff).
+8. Use a vector database to hold user-supplied information, which helps protect other users from poisoning and even allows fixes in production without having to re-train a new model (a toy retrieval-store sketch follows the diff).
+9. Use adversarial robustness techniques such as federated learning and constraints to minimize the effect of outliers, or adversarial training, to be robust against worst-case perturbations of the training data.
1. An "MLSecOps" approach could be to include adversarial robustness in the training lifecycle with the auto-poisoning technique.
2. An example repository for this is [AutoPoison](https://github.com/azshue/AutoPoison) testing, covering attacks such as Content Injection Attacks (“attempting to promote a brand name in model responses”) and Refusal Attacks (“always making the model refuse to respond”) that can be accomplished with this approach (a toy screening sketch follows the diff).
-7. Testing and Detection, by measuring the loss during the training stage and analyzing trained models to detect signs of a poisoning attack by analyzing model behavior on specific test inputs.
+10. Testing and Detection: measure the loss during the training stage and analyze trained models for signs of a poisoning attack by examining model behavior on specific test inputs.
1. Monitoring and alerting when the number of skewed responses exceeds a threshold (a minimal monitoring sketch follows the diff).
2. Use of a human-in-the-loop to review responses, combined with auditing.
3. Implement dedicated LLMs to benchmark against undesired consequences and train other LLMs using [reinforcement learning techniques](https://wandb.ai/ayush-thakur/Intro-RLAIF/reports/An-Introduction-to-Training-LLMs-Using-Reinforcement-Learning-From-Human-Feedback-RLHF---VmlldzozMzYyNjcy).
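
The sketches below illustrate several of the items above; every helper name, path, host and threshold in them is illustrative rather than taken from the project. For item 4, a minimal sketch of an application-level egress allowlist for a data-collection step, assuming a hypothetical approved-host set; in practice the same control would usually also be enforced at the network layer (firewall or proxy rules).

```python
from urllib.parse import urlparse
import urllib.request

# Hypothetical allowlist of sources approved for training-data collection.
ALLOWED_HOSTS = {"docs.example.com", "datasets.example.org"}

def fetch_training_page(url: str, timeout: float = 10.0) -> bytes:
    """Fetch a page only if its host is on the approved-source allowlist."""
    host = urlparse(url).hostname or ""
    if host not in ALLOWED_HOSTS:
        raise PermissionError(f"blocked scrape of unapproved source: {host!r}")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()
```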
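For item 5, a minimal sketch of statistical outlier filtering over embedded training examples, assuming an `embed` callable (e.g. a sentence-embedding model) that is not part of the original text; `IsolationForest` is just one of several anomaly detectors that could fill this role.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def filter_outliers(texts, embed, contamination=0.02):
    """Drop training examples whose embeddings look anomalous.

    `embed` is an assumed callable mapping a list of strings to a
    2-D numpy array of embeddings.
    """
    X = np.asarray(embed(texts))
    detector = IsolationForest(contamination=contamination, random_state=0)
    labels = detector.fit_predict(X)  # 1 = inlier, -1 = outlier
    kept = [t for t, lab in zip(texts, labels) if lab == 1]
    flagged = [t for t, lab in zip(texts, labels) if lab == -1]
    return kept, flagged  # review `flagged` by hand before discarding
```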
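For item 7, a sketch of one possible DVC workflow driven from Python; the dataset path is illustrative and the commands assume `git` and `dvc` are already initialised in the repository.

```python
import subprocess

# Track the vetted training set with DVC so later tampering shows up as a diff.
# (data/train.jsonl is an illustrative path.)
subprocess.run(["dvc", "add", "data/train.jsonl"], check=True)
subprocess.run(["git", "add", "data/train.jsonl.dvc", "data/.gitignore"], check=True)
subprocess.run(["git", "commit", "-m", "snapshot vetted training data"], check=True)

# Later: list tracked data files added, deleted or modified since that snapshot.
diff = subprocess.run(["dvc", "diff", "HEAD~1"],
                      capture_output=True, text=True, check=True)
print(diff.stdout)
```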
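For item 8, a toy in-memory retrieval store showing the design intent: user-supplied information lives outside the model weights, so a poisoned entry can be deleted in production without retraining. A real deployment would use an actual vector database and embedding model, both of which are assumed here.

```python
import numpy as np

class UserInfoStore:
    """Toy vector store: user-supplied facts live here, not in model weights."""

    def __init__(self, embed):
        self.embed = embed   # assumed callable: str -> 1-D numpy array
        self.entries = {}    # entry_id -> (vector, text)

    def add(self, entry_id, text):
        self.entries[entry_id] = (np.asarray(self.embed(text), dtype=float), text)

    def remove(self, entry_id):
        # e.g. after a poisoning report: no retraining needed.
        self.entries.pop(entry_id, None)

    def search(self, query, k=3):
        q = np.asarray(self.embed(query), dtype=float)

        def score(vec):
            return float(q @ vec / (np.linalg.norm(q) * np.linalg.norm(vec) + 1e-9))

        ranked = sorted(self.entries.values(), key=lambda e: score(e[0]), reverse=True)
        return [text for _, text in ranked[:k]]
```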
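For item 9, a toy screen for the two attack patterns named above, content injection and refusal; this is a simple frequency heuristic for illustration, not the AutoPoison tooling itself, and the watched terms, markers and thresholds are made up.

```python
# Illustrative watched terms and refusal markers; tune per dataset in practice.
WATCHED_TERMS = ("examplebrand",)
REFUSAL_MARKERS = ("i'm sorry", "i cannot", "as an ai")

def screen_finetune_data(records, inject_threshold=0.01, refusal_threshold=0.3):
    """Flag a fine-tuning set whose responses over-mention a watched term
    (content injection) or refuse too often (refusal attack).

    `records` is a list of {"instruction": ..., "response": ...} dicts.
    """
    responses = [r["response"] for r in records]
    total = max(len(responses), 1)
    findings = {}

    for term in WATCHED_TERMS:
        rate = sum(term in resp.lower() for resp in responses) / total
        if rate > inject_threshold:
            findings[f"injection:{term}"] = rate

    refusal_rate = sum(
        resp.strip().lower().startswith(REFUSAL_MARKERS) for resp in responses
    ) / total
    if refusal_rate > refusal_threshold:
        findings["refusal_rate"] = refusal_rate

    return findings  # non-empty result => review the dataset by hand
```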
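For item 10.1, a minimal monitoring sketch that counts flagged responses in a rolling window and alerts once a threshold is crossed; the `is_skewed` classifier is assumed (it could be a benchmark LLM or a rule-based check), and the print statement stands in for a real alerting hook.

```python
from collections import deque

class SkewMonitor:
    """Alert when too many recent model responses are flagged as skewed."""

    def __init__(self, is_skewed, window=500, threshold=25):
        self.is_skewed = is_skewed          # assumed callable: response text -> bool
        self.recent = deque(maxlen=window)  # rolling window of flags
        self.threshold = threshold

    def observe(self, response: str) -> bool:
        """Record one response; return True when an alert should fire."""
        self.recent.append(bool(self.is_skewed(response)))
        flagged = sum(self.recent)
        if flagged > self.threshold:
            # Stand-in for a pager, ticket, or human-review queue.
            print(f"ALERT: {flagged} skewed responses in the last {len(self.recent)}")
            return True
        return False
```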