Skip to content

Latest commit

 

History

History
22 lines (16 loc) · 1.78 KB

File metadata and controls

22 lines (16 loc) · 1.78 KB

LLM10:2023 - Training Data Poisoning

Description:
Training data poisoning occurs when an attacker manipulates the training data or fine-tuning procedures of an LLM to introduce vulnerabilities, backdoors, or biases that could compromise the model's security, effectiveness, or ethical behavior.

Common Training Data Poisoning Issues:

  • Introducing backdoors or vulnerabilities into the LLM through maliciously manipulated training data.
  • Injecting biases into the LLM, causing it to produce biased or inappropriate responses.
  • Exploiting the fine-tuning process to compromise the LLM's security or effectiveness.

How to Prevent:

  • Ensure the integrity of the training data by obtaining it from trusted sources and validating its quality.
  • Implement robust data sanitization and preprocessing techniques to remove potential vulnerabilities or biases from the training data.
  • Regularly review and audit the LLM's training data and fine-tuning procedures to detect potential issues or malicious manipulations.
  • Utilize monitoring and alerting mechanisms to detect unusual behavior or performance issues in the LLM, potentially indicating training data poisoning.

Example Attack Scenarios: Scenario #1: An attacker infiltrates the training data pipeline and injects malicious data, causing the LLM to produce harmful or inappropriate responses.

Scenario #2: A malicious insider compromises the fine-tuning process, introducing vulnerabilities or backdoors into the LLM that can be exploited at a later stage.

By ensuring the integrity of the training data, implementing robust data sanitization techniques, and regularly auditing the LLM's training and fine-tuning processes, developers can minimize the risk of training data poisoning and protect their LLMs from potential vulnerabilities.