From aca503338add63110792c2314f468f6643847131 Mon Sep 17 00:00:00 2001
From: Anh-Uong
Date: Mon, 10 Jun 2024 10:20:14 -0600
Subject: [PATCH 1/2] bloom model can't run with flash-attn

Signed-off-by: Anh-Uong
---
 examples/kfto-kueue-sft-trainer.yaml | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/examples/kfto-kueue-sft-trainer.yaml b/examples/kfto-kueue-sft-trainer.yaml
index 146e9d27e..a8af49763 100644
--- a/examples/kfto-kueue-sft-trainer.yaml
+++ b/examples/kfto-kueue-sft-trainer.yaml
@@ -15,7 +15,8 @@ data:
 "gradient_accumulation_steps": 4,
 "learning_rate": 1e-05,
 "response_template": "\n### Label:",
- "dataset_text_field": "output"
+ "dataset_text_field": "output",
+ "use_flash_attn": false
 }
 ---
 apiVersion: "kubeflow.org/v1"

From fe43108cfb2563abb8a4380dc570c9d0ecea483c Mon Sep 17 00:00:00 2001
From: Sukriti Sharma
Date: Mon, 10 Jun 2024 17:02:36 -0600
Subject: [PATCH 2/2] Update README.md for Lora modules (#174)

Signed-off-by: Sukriti Sharma
---
 README.md | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 86c4eccf3..e29f45440 100644
--- a/README.md
+++ b/README.md
@@ -287,6 +287,11 @@ For example for LLaMA model the modules look like:
 
 You can specify attention or linear layers. With the CLI, you can specify layers with `--target_modules "q_proj" "v_proj" "k_proj" "o_proj"` or `--target_modules "all-linear"`.
 
+#### Recommended target modules per model architecture
+As per [LoRA paper](https://arxiv.org/pdf/2106.09685), section 4.2 , by using the query and value projection matrices, we can achieve reasonable quality with efficient GPU utilization. Hence, while thinking about what LoRA adapters to specify, we recommend starting with query and value matrices. You could also refer to the defaults specified by PEFT library for popular model architectures in section [TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING](https://github.com/huggingface/peft/blob/7b1c08d2b5e13d3c99b7d6ee83eab90e1216d4ba/src/peft/utils/constants.py#L70) as a good starting point.
+
+_________________________
+
 ### Prompt Tuning:
 
 Specify `peft_method` to `'pt'` . You can additionally pass any arguments from [PromptTuningConfig](https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/tuning/config/peft_config.py#L63).
@@ -446,4 +451,4 @@ The above runs several tasks with `hendrycksTest-*` being MMLU.
 
 [Prompt Tuning on Twitter Complaints](examples/prompt_tuning_twitter_complaints/README.md)
 
-A good simple example can be found [here](examples/kfto-kueue-sft-trainer.yaml) which launches a Kubernetes-native `PyTorchJob` using the [Kubeflow Training Operator](https://github.com/kubeflow/training-operator/) with [Kueue](https://github.com/kubernetes-sigs/kueue) for the queue management of tuning jobs.
\ No newline at end of file
+A good simple example can be found [here](examples/kfto-kueue-sft-trainer.yaml) which launches a Kubernetes-native `PyTorchJob` using the [Kubeflow Training Operator](https://github.com/kubeflow/training-operator/) with [Kueue](https://github.com/kubernetes-sigs/kueue) for the queue management of tuning jobs.