diff --git a/src/routes/blogs/olive-quant-ft/+page.svx b/src/routes/blogs/olive-quant-ft/+page.svx
index 7f4487685c6e4..c9c687a10d198 100644
--- a/src/routes/blogs/olive-quant-ft/+page.svx
+++ b/src/routes/blogs/olive-quant-ft/+page.svx
@@ -41,11 +41,20 @@ Also, as part of answering the question of when to quantize we'll show how the f
 
 To answer our question on the right sequencing of quantization and fine-tuning we leveraged Olive (ONNX Live) - an advanced model optimization toolkit designed to streamline the process of optimizing AI models for deployment with the ONNX runtime.
 
+> **Note**: Both quantization and fine-tuning need to run on an NVIDIA A10 or A100 GPU machine.
+
 ### 1. 💾 Install Olive
 
 We installed the [Olive CLI](../blogs/olive-cli) using `pip`:
 
-<pre><code>pip install olive-ai[quantize,finetuning]</code></pre>
+<pre><code>pip install olive-ai[finetune]
+pip install autoawq
+pip install auto-gptq</code></pre>
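+
+Before running the workflows, it's worth confirming that the GPU is visible and that the three packages landed in the active environment. A quick sanity check (`nvidia-smi` ships with the NVIDIA driver, and `pip show` is standard pip tooling):
+
+<pre><code>nvidia-smi
+pip show olive-ai autoawq auto-gptq</code></pre>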
 
 ### 2. 🗜️ Quantize
 
@@ -71,7 +80,17 @@ olive quantize \
 ### 3. 🎚️ Fine-tune
 
-We fine-tune *the quantized models* using the following Olive commands:
+We fine-tune *the quantized models* using the [tiny codes](https://huggingface.co/datasets/nampdn-ai/tiny-codes) dataset from Hugging Face. This is a gated dataset,
+so you'll need to [request access](https://huggingface.co/docs/hub/main/datasets-gated). Once access has been granted, log in to Hugging Face with
+your [access token](https://huggingface.co/docs/hub/security-tokens):
+
+<pre><code>huggingface-cli login --token TOKEN</code></pre>
+
+Olive can then fine-tune using the following commands:
 <pre><code># Finetune AWQ model
 olive finetune \
@@ -108,8 +127,8 @@ We ran a [perplexity metrics](https://huggingface.co/docs/transformers/perplexit
 
 
input_model:
   type: HfModel
-  model_path: models/phi-awq-pt/model
-  adapter_path: models/phi-awq-pt/adapter
+  model_path: models/phi-awq-ft/model
+  adapter_path: models/phi-awq-ft/adapter
 systems:
   local_system:
     type: LocalSystem