
Commit

Migrate to web-rwkv v0.2
cryscan committed Aug 27, 2023
1 parent f92cafb commit a5ef311
Showing 6 changed files with 212 additions and 137 deletions.
100 changes: 68 additions & 32 deletions Cargo.lock

Some generated files are not rendered by default.

4 changes: 2 additions & 2 deletions Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "ai00_server"
version = "0.1.11"
version = "0.1.12"
edition = "2021"
authors = ["Gu ZhenNiu <[email protected]>", "Zhang Zhenyuan <[email protected]>"]
license = "MIT OR Apache-2.0"
@@ -18,7 +18,7 @@ axum = { git = "https://github.com/cryscan/axum", branch = "sse-leading-space" }
tower = { version = "0.4", features = ["util"] }
tower-http = { version = "0.4", features = ["full"] }
tokio = { version = "1", features = ["full"] }
web-rwkv = "0.1.18"
web-rwkv = "0.2.0"
memmap = "0.7"
regex = "1.8"
dialoguer = "0.10"
6 changes: 3 additions & 3 deletions README.md
@@ -97,14 +97,14 @@ QQ Group for communication: 30920262
* `--tokenizer`: Tokenizer path
* `--port`: Running port
* `--quant`: Specify the number of quantization layers
-* `--adaptor`: Adapter (GPU and backend) selection options
+* `--adapter`: Adapter (GPU and backend) selection options

### Example

-The server listens on port 3000, loads the full-layer quantized (32 > 24) 0.4B model, and selects adapter 0 (to get the specific adapter number, you can first not add this parameter, and the program will enter the adapter selection page).
+The server listens on port 3000, loads the full-layer quantized (32 > 24) 0.4B model, and automatically selects the high-performance adapter.

```bash
-$ cargo run --release -- --model assets/models/RWKV-4-World-0.4B-v1-20230529-ctx4096.st --port 3000 --quant 32 --adaptor 0
+$ cargo run --release -- --model assets/models/RWKV-4-World-0.4B-v1-20230529-ctx4096.st --port 3000 --quant 32 --adapter auto
```

## 📙Currently Available APIs
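The README change above only says that `--adapter auto` "selects the high-performance adapter." As a rough illustration of what that distinction usually means one level down, here is a minimal sketch in terms of the wgpu API that web-rwkv builds on: `auto` maps naturally to requesting an adapter with a high-performance power preference, while a numeric value maps to picking an entry from the enumerated adapter list. The `AdapterChoice` type, the `select_adapter` function, and the use of `pollster` are illustrative assumptions, not code from this repository.

```rust
// Illustrative sketch only (wgpu 0.17-era API); not taken from ai00_server or web-rwkv.
use wgpu::{
    Adapter, Backends, Instance, InstanceDescriptor, PowerPreference, RequestAdapterOptions,
};

/// Hypothetical parsed form of the `--adapter` flag.
enum AdapterChoice {
    Auto,         // `--adapter auto`: let wgpu pick a high-performance adapter
    Index(usize), // `--adapter 0`: pick the n-th enumerated adapter
}

fn select_adapter(choice: AdapterChoice) -> Option<Adapter> {
    let instance = Instance::new(InstanceDescriptor::default());
    match choice {
        // Ask wgpu for whichever adapter it considers high-performance
        // (typically the discrete GPU when one is present).
        AdapterChoice::Auto => pollster::block_on(instance.request_adapter(&RequestAdapterOptions {
            power_preference: PowerPreference::HighPerformance,
            force_fallback_adapter: false,
            compatible_surface: None,
        })),
        // Enumerate every adapter across all backends, print the list so the
        // user can see which index maps to which GPU, then pick by index.
        AdapterChoice::Index(n) => {
            let adapters: Vec<Adapter> =
                instance.enumerate_adapters(Backends::all()).into_iter().collect();
            for (i, adapter) in adapters.iter().enumerate() {
                println!("{i}: {:?}", adapter.get_info());
            }
            adapters.into_iter().nth(n)
        }
    }
}
```

Selecting by power preference avoids hard-coding an adapter index that can change from machine to machine, which is presumably why the examples switch to `auto`.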
6 changes: 3 additions & 3 deletions README_jp.md
@@ -95,14 +95,14 @@ OpenAIのChatGPT APIインターフェースと互換性があります。
* `--tokenizer`: トークナイザーのパス
* `--port`: 実行ポート
* `--quant`: 量子化レイヤーの数を指定
-* `--adaptor`: アダプター(GPUおよびバックエンド)の選択オプション
+* `--adapter`: アダプター(GPUおよびバックエンド)の選択オプション

### 例

-サーバーはポート3000でリッスンし、全レイヤー量子化(32 > 24)の0.4Bモデルをロードし、アダプター0を選択します(特定のアダプター番号を取得するには、最初にこのパラメーターを追加せず、プログラムがアダプター選択ページに入るまで待ちます)
+サーバーはポート3000でリッスンし、全レイヤー量子化(32 > 24)の0.4Bモデルをロードし、高性能アダプターを自動的に選択します

```bash
-$ cargo run --release -- --model assets/models/RWKV-4-World-0.4B-v1-20230529-ctx4096.st --port 3000 --quant 32 --adaptor 0
+$ cargo run --release -- --model assets/models/RWKV-4-World-0.4B-v1-20230529-ctx4096.st --port 3000 --quant 32 --adapter auto
```

## 📙現在利用可能なAPI
6 changes: 3 additions & 3 deletions README_zh.md
@@ -104,13 +104,13 @@
- `--tokenizer`: 词表路径
- `--port`: 运行端口
- `--quant`: 指定量化层数
-- `--adaptor`: 适配器(GPU和后端)选择项
+- `--adapter`: 适配器(GPU和后端)选择项

### 示例

-服务器监听3000端口,加载全部层量化(32 > 24)的0.4B模型,选择0号适配器(要查看具体适配器编号可以先不加该参数,程序会先进入选择页面)
+服务器监听3000端口,加载全部层量化(32 > 24)的0.4B模型,自动选择高性能适配器
```bash
-$ cargo run --release -- --model assets/models/RWKV-4-World-0.4B-v1-20230529-ctx4096.st --port 3000 --quant 32 --adaptor 0
+$ cargo run --release -- --model assets/models/RWKV-4-World-0.4B-v1-20230529-ctx4096.st --port 3000 --quant 32 --adapter auto
```


