diff --git a/README.md b/README.md
index 86c4eccf3..e29f45440 100644
--- a/README.md
+++ b/README.md
@@ -287,6 +287,11 @@ For example for LLaMA model the modules look like:
 You can specify attention or linear layers. With the CLI, you can specify layers with `--target_modules "q_proj" "v_proj" "k_proj" "o_proj"` or `--target_modules "all-linear"`.
+#### Recommended target modules per model architecture
+As per [LoRA paper](https://arxiv.org/pdf/2106.09685), section 4.2 , by using the query and value projection matrices, we can achieve reasonable quality with efficient GPU utilization. Hence, while thinking about what LoRA adapters to specify, we recommend starting with query and value matrices. You could also refer to the defaults specified by PEFT library for popular model architectures in section [TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING](https://github.com/huggingface/peft/blob/7b1c08d2b5e13d3c99b7d6ee83eab90e1216d4ba/src/peft/utils/constants.py#L70) as a good starting point.
+
+_________________________
+
 ### Prompt Tuning:
 Specify `peft_method` to `'pt'` . You can additionally pass any arguments from [PromptTuningConfig](https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/tuning/config/peft_config.py#L63).
@@ -446,4 +451,4 @@ The above runs several tasks with `hendrycksTest-*` being MMLU.
 [Prompt Tuning on Twitter Complaints](examples/prompt_tuning_twitter_complaints/README.md)
-A good simple example can be found [here](examples/kfto-kueue-sft-trainer.yaml) which launches a Kubernetes-native `PyTorchJob` using the [Kubeflow Training Operator](https://github.com/kubeflow/training-operator/) with [Kueue](https://github.com/kubernetes-sigs/kueue) for the queue management of tuning jobs.
\ No newline at end of file
+A good simple example can be found [here](examples/kfto-kueue-sft-trainer.yaml) which launches a Kubernetes-native `PyTorchJob` using the [Kubeflow Training Operator](https://github.com/kubeflow/training-operator/) with [Kueue](https://github.com/kubernetes-sigs/kueue) for the queue management of tuning jobs.
diff --git a/examples/kfto-kueue-sft-trainer.yaml b/examples/kfto-kueue-sft-trainer.yaml
index 146e9d27e..a8af49763 100644
--- a/examples/kfto-kueue-sft-trainer.yaml
+++ b/examples/kfto-kueue-sft-trainer.yaml
@@ -15,7 +15,8 @@ data:
 "gradient_accumulation_steps": 4,
 "learning_rate": 1e-05,
 "response_template": "\n### Label:",
- "dataset_text_field": "output"
+ "dataset_text_field": "output",
+ "use_flash_attn": false
 }
 ---
 apiVersion: "kubeflow.org/v1"