Ads llm 03 data poisoning v2 renaming 251 (#270)
* feat: kickoff v2 0 dir and files

* docs: init v2 training data llm03 naming change
GangGreenTemperTatum authored Feb 20, 2024
1 parent 0e22276 commit ced8198
Showing 1 changed file with 2 additions and 2 deletions.
@@ -1,10 +1,10 @@
-## LLM03: Training Data Poisoning
+## LLM03: Data and Model Poisoning

### Description

The starting point of any machine learning approach is training data, simply “raw text”. To be highly capable (e.g., have linguistic and world knowledge), this text should span a broad range of domains, genres and languages. A large language model uses deep neural networks to generate outputs based on patterns learned from training data.

-Training data poisoning refers to manipulation of pre-training data or data involved within the fine-tuning or embedding processes to introduce vulnerabilities (which all have unique and sometimes shared attack vectors), backdoors or biases that could compromise the model’s security, effectiveness or ethical behavior. Poisoned information may be surfaced to users or create other risks like performance degradation, downstream software exploitation and reputational damage. Even if users distrust the problematic AI output, the risks remain, including impaired model capabilities and potential harm to brand reputation.
+Data and Model Poisoning refers to manipulation of pre-training data or data involved within the fine-tuning or embedding processes to introduce vulnerabilities (which all have unique and sometimes shared attack vectors), backdoors or biases that could compromise the model’s security, effectiveness or ethical behavior. Poisoned information may be surfaced to users or create other risks like performance degradation, downstream software exploitation and reputational damage. Even if users distrust the problematic AI output, the risks remain, including impaired model capabilities and potential harm to brand reputation.

- Pre-training data refers to the process of training a model based on a task or dataset.
- Fine-tuning involves taking an existing model that has already been trained and adapting it to a narrower subject or a more focused goal by training it using a curated dataset. This dataset typically includes examples of inputs and corresponding desired outputs.
