Milestone 1: Working pipeline on a small novel use case
Past due by 5 months
95% complete
Dataset
- Forget set: Biographical author details across all authors
- Retain set: All non-biographical questions & answers
- Train set: All synthetic authors data
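The three splits above can be sketched as a simple partition of the synthetic-author QA records. The `is_biographical` keyword heuristic below is a hypothetical stand-in for however biographical questions are actually labeled, not part of TOFU itself:

```python
# Sketch of the dataset split, assuming each record is a dict with
# "question" and "answer" fields. The keyword heuristic is hypothetical.
BIO_KEYWORDS = ("born", "birthplace", "parents", "childhood", "grew up")

def is_biographical(record):
    """Hypothetical heuristic: flag questions about an author's life details."""
    q = record["question"].lower()
    return any(kw in q for kw in BIO_KEYWORDS)

def split_dataset(records):
    """Train set = everything; forget = biographical QA; retain = the rest."""
    forget = [r for r in records if is_biographical(r)]
    retain = [r for r in records if not is_biographical(r)]
    return records, forget, retain

# Toy usage with two synthetic-author QA pairs
records = [
    {"question": "Where was the author born?", "answer": "In Lisbon."},
    {"question": "What themes does the novel explore?", "answer": "Memory."},
]
train, forget, retain = split_dataset(records)
```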
Fine-tuning
- Fine-tune a small model (e.g. GPT2) on train set
- Fine-tune a small model (e.g. GPT2) on retain set
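Both fine-tuning runs are the same next-token training loop over different data. A minimal sketch, using a tiny toy model as a stand-in for GPT2 (in practice you would load GPT2 via Hugging Face transformers and run the same loop over tokenized TOFU text):

```python
# Minimal next-token fine-tuning loop; TinyLM is a toy stand-in for GPT2.
import torch
import torch.nn as nn

torch.manual_seed(0)
VOCAB = 16

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, 32)
        self.head = nn.Linear(32, VOCAB)

    def forward(self, tokens):          # tokens: (batch, seq)
        return self.head(self.emb(tokens))

def fine_tune(model, batch, steps=50, lr=0.1):
    """Standard fine-tuning: minimize next-token cross-entropy on the set."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    inputs, targets = batch[:, :-1], batch[:, 1:]
    losses = []
    for _ in range(steps):
        logits = model(inputs)
        loss = nn.functional.cross_entropy(
            logits.reshape(-1, VOCAB), targets.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses

# One run on a stand-in "train set" batch; the retain-set run is identical
# but over retain-set tokens only.
train_batch = torch.randint(0, VOCAB, (4, 8))
losses = fine_tune(TinyLM(), train_batch)
```
<imports>
</imports>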
Forgetting
- Unlearn the forget set using gradient ascent [& another?]
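Gradient-ascent unlearning is the same loop as fine-tuning, except the optimizer steps to *maximize* the loss on the forget set (backward on the negated loss). A sketch with the same toy stand-in model:

```python
# Gradient-ascent unlearning sketch; TinyLM stands in for the fine-tuned GPT2.
import torch
import torch.nn as nn

torch.manual_seed(0)
VOCAB = 16

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, 32)
        self.head = nn.Linear(32, VOCAB)

    def forward(self, tokens):
        return self.head(self.emb(tokens))

def unlearn_gradient_ascent(model, forget_batch, steps=20, lr=0.05):
    """Ascend the loss on the forget set so those completions become unlikely."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    inputs, targets = forget_batch[:, :-1], forget_batch[:, 1:]
    losses = []
    for _ in range(steps):
        logits = model(inputs)
        loss = nn.functional.cross_entropy(
            logits.reshape(-1, VOCAB), targets.reshape(-1))
        opt.zero_grad()
        (-loss).backward()          # gradient ascent = descent on -loss
        opt.step()
        losses.append(loss.item())
    return losses

forget_batch = torch.randint(0, VOCAB, (4, 8))
losses = unlearn_gradient_ascent(TinyLM(), forget_batch)
```

Unbounded ascent degrades the model quickly, which is why a second method is worth comparing; most alternatives add a retain-set term to keep utility.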
Evaluating
- Run forget-quality and model-utility evaluation.
- Initial sanity check (debug fine-tuning): GPT2 base model vs. GPT2 fine-tuned on the full TOFU set
- Then: unlearned GPT2 model vs. GPT2 fine-tuned on the retain set (gold standard)
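The second comparison scores forgetting by how close the unlearned model's behavior on the forget set is to the retain-only gold standard; TOFU does this with a KS test over per-example statistics (truth ratios). A simplified sketch of the two-sample KS statistic, over hypothetical per-example scores (the numbers below are illustrative only):

```python
# Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs.
# A small gap means the unlearned model is statistically close to the
# retain-only gold standard on the forget set.
def ks_statistic(a, b):
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    cdf = lambda xs, x: sum(v <= x for v in xs) / len(xs)
    return max(abs(cdf(a, x) - cdf(b, x)) for x in points)

# Hypothetical per-example forget-set scores from the two models
unlearned_scores = [0.40, 0.55, 0.61, 0.70, 0.52]
gold_scores      = [0.42, 0.57, 0.60, 0.68, 0.50]
gap = ks_statistic(unlearned_scores, gold_scores)
```

In practice `scipy.stats.ks_2samp` gives the same statistic plus a p-value; this inline version just makes the comparison explicit.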