update readme
Tlntin committed Oct 24, 2024
1 parent fb8640d commit 1a39899
Showing 3 changed files with 16 additions and 2 deletions.
14 changes: 14 additions & 0 deletions README.md
@@ -1,6 +1,7 @@
### Notes
- This project is based on the [ascend-llm](https://gitee.com/yinghuo302/ascend-llm) project.
- It has only been tested on the Ascend 310B1 and 910A; in theory it is also compatible with other Ascend chips.
- PyTorch or ONNX inference can also run on a pure CPU (see the ONNX sketch after the validation command below).
- Only the qwen1.5-0.5b-chat and qwen2-1.5b-instruct models have been tested; in theory all chat/instruct models of the qwen1.5/qwen2 series are supported.
- For setting up the CANN environment, see [this tutorial](https://www.hiascend.com/forum/thread-0286155882998311250-1-1.html); CANN 8.0RC2 or later is recommended.
- If you do not have an Ascend NPU device but still want to try this out, you can use the free cloud platform below: registration grants 50 credits, enough for 25 hours on an Ascend 910.
@@ -53,6 +54,19 @@
# openai_function_call.py tests function_call; when this feature is enabled, it is recommended to increase max_input_length and kv_cache_length.
```

### (Optional) Validate the model structure
- After the PyTorch model structure has been modified, verify that the model still behaves correctly.
- Check that chat works in a PyTorch CPU environment. This step mainly verifies that the overall model structure is OK; try several demo prompts, for example `背诵《出师表》` ("recite the Chu Shi Biao").
```bash
python3 ./cli_chat.py \
--session_type="pytorch" \
--hf_model_dir="./download/Qwen2-1.5B-Instruct" \
--device_str="cpu" \
--dtype="float32" \
--torch_dtype="float32" \
--max_input_length=1024 \
--max_output_length=2048
```
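The Notes above mention that the ONNX path can also run on a pure CPU. Below is a minimal sketch of the equivalent check; it assumes that `onnx` is an accepted `--session_type` value and that an exported ONNX model is already configured per the project's build steps (its path option is not shown here). `--cpu_thread` is the option visible in the `cli_chat.py` diff further down.
```bash
# Hedged sketch: ONNX chat check on pure CPU.
# Assumes "onnx" is a valid --session_type value and that the exported
# ONNX model is already in place; only flags that appear elsewhere in
# this repo are used here.
python3 ./cli_chat.py \
  --session_type="onnx" \
  --hf_model_dir="./download/Qwen2-1.5B-Instruct" \
  --device_str="cpu" \
  --cpu_thread=4 \
  --max_input_length=1024 \
  --max_output_length=2048
```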

### Detailed run steps
##### Step 1: Compile the model (using Qwen2-1.5B-Instruct as an example).
2 changes: 1 addition & 1 deletion cli_chat.py
@@ -44,7 +44,7 @@ def parser_args():
default="cpu",
)
parser.add_argument(
"--cpu_thread" ,
"--cpu_thread",
type=int,
help="num of cpu thread when run onnx sesstion",
default=4,
2 changes: 1 addition & 1 deletion utils/session.py
@@ -70,7 +70,7 @@ def run(self, input_ids:np.ndarray, show_progress=False):


class PyTorchSession(Session):
def __init__(self, config:InferenceConfig) -> None:
def __init__(self, config: InferenceConfig) -> None:
super().__init__(config)
self.kv_cache = create_kv_cache(config)
from export.modeling_qwen2 import Qwen2ForCausalLM
