wasm: feat: support wasm_nn with llm application #148

Merged
merged 1 commit into from
Aug 9, 2024
3 changes: 3 additions & 0 deletions .github/workflows/ci.yml
@@ -18,6 +18,9 @@ jobs:
          - directories: wasm
            features: --features=wasmedge
            wasmEdge: 0.13.5
          - directories: wasm
            features: --features="wasmedge, wasmedge_wasi_nn"
            wasmEdge: 0.13.5
          - directories: wasm
            features: --features=wasmtime
    runs-on: ubuntu-latest
138 changes: 138 additions & 0 deletions docs/wasm/How-to-run-Llama-3-8B-with-Kubernetes.md
@@ -0,0 +1,138 @@
# How to run a Llama-3-8B inference application in Kubernetes?

## What is LlamaEdge?

The [LlamaEdge](https://github.com/LlamaEdge/LlamaEdge) project makes it easy to run LLM inference apps and
create OpenAI-compatible API services for the Llama3 series of LLMs locally.

With WasmEdge, you can create and deploy fast, lightweight LLM inference applications. For details, see
https://www.secondstate.io/articles/wasm-runtime-agi/.

## How to run an LLM inference application in Kuasar?

Since Kuasar v0.8.0, the Kuasar wasm-sandboxer built with the `wasmedge` and `wasmedge_wasi_nn`
features lets your WasmEdge application use the WASI-NN API for
machine learning inference: https://github.com/WebAssembly/wasi-nn.

This article is inspired by [Getting Started with Llama-3-8B](https://www.secondstate.io/articles/llama-3-8b/),
which shows how to create an OpenAI-compatible API service for Llama-3-8B.

### Prerequisites

+ Install WasmEdge and plugins:
`curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- -v 0.13.5 --plugins wasi_logging wasi_nn-ggml`


### 1. Build docker image

An example Docker image is already available on Docker Hub: `docker.io/kuasario/llama-api-server:v1`.
Follow these steps if you want to build your own Docker image containing the LLM application, the model, and other requirements.

+ Download the Llama-3-8B model GGUF file. The model is 5.73 GB, so the download can take a while.
`curl -LO https://huggingface.co/second-state/Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q5_K_M.gguf`

+ Get your LlamaEdge app. Taking the api-server as an example, download it with
`curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-api-server.wasm`.
It is a web server that provides an OpenAI-compatible API service, as well as an optional web UI, for Llama3 models.

+ Download the chatbot web UI to interact with the model:
```bash
curl -LO https://github.com/LlamaEdge/chatbot-ui/releases/latest/download/chatbot-ui.tar.gz
tar xzf chatbot-ui.tar.gz
rm chatbot-ui.tar.gz
```

+ Build it! Here is an example Dockerfile:
```dockerfile
FROM scratch
COPY . /
CMD ["llama-api-server.wasm", "--prompt-template", "llama-3-chat", "--ctx-size", "4096", "--model-name", "Llama-3-8B", "--log-all"]
```
Build it with `docker build -t docker.io/kuasario/llama-api-server:v1 .`

### 2. Build and run Kuasar Wasm Sandboxer

```bash
git clone https://github.com/kuasar-io/kuasar.git
cd kuasar/wasm
cargo run --features="wasmedge, wasmedge_wasi_nn" -- --listen /run/wasm-sandboxer.sock --dir /run/kuasar-wasm
```

### 3. Configure containerd
Add the following sandboxer config to the containerd config file `/etc/containerd/config.toml`:
```toml
[proxy_plugins]
[proxy_plugins.wasm]
type = "sandbox"
address = "/run/wasm-sandboxer.sock"

[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.kuasar-wasm]
runtime_type = "io.containerd.kuasar-wasm.v1"
sandboxer = "wasm"
```

### 4. Create Kuasar wasm runtime

Suppose we are in a Kubernetes cluster where all workloads are managed by Kubernetes. How does the container
engine (containerd) know which runtime a workload should run in?

[Container Runtimes](https://kubernetes.io/docs/setup/production-environment/container-runtimes/) launch and
run containers in Kubernetes, and a RuntimeClass selects which one a Pod uses. Create a RuntimeClass for Kuasar with `kubectl apply -f kuasar-wasm-runtimeclass.yaml`:
```yaml
apiVersion: node.k8s.io/v1
handler: kuasar-wasm
kind: RuntimeClass
metadata:
  name: kuasar-wasm
```

OK, the cluster should now know what the `kuasar-wasm` runtime is.

### 5. Deploy your LLM workload

The last step is to deploy the LLM workload. You can use the Docker image from step 1.

Run `kubectl apply -f llama-deploy.yaml`.

Here is an example `llama-deploy.yaml`:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama
  labels:
    app: llama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llama
  template:
    metadata:
      labels:
        app: llama
    spec:
      containers:
        - command:
            - llama-api-server.wasm
          args: ["--prompt-template", "llama-3-chat", "--ctx-size", "4096", "--model-name", "Llama-3-8B"]
          env:
            - name: io.kuasar.wasm.nn_preload
              value: default:GGML:AUTO:Meta-Llama-3-8B-Instruct-Q5_K_M.gguf
          image: docker.io/kuasario/llama-api-server:v1
          name: llama-api-server
      runtimeClassName: kuasar-wasm
```
Make sure `runtimeClassName` matches the RuntimeClass created in step 4.

Note the `io.kuasar.wasm.nn_preload` environment variable: it tells Kuasar what to preload into the `wasi_nn`
plugin. Its value has four colon-separated fields: the model alias, the inference backend, the execution target, and the model file, whose path is resolved against the container rootfs.
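
As a rough illustration of that four-field format, here is a minimal sketch that splits a preload value with the Rust standard library only. The `PreloadSpec` struct and `parse_preload` function are hypothetical names used for this example. The sandboxer itself parses the backend and target into WasmEdge SDK types, which this sketch keeps as plain strings.

```rust
/// The four fields of an `nn_preload` value, kept as plain strings.
/// (Hypothetical type for illustration, not part of Kuasar.)
#[derive(Debug, PartialEq)]
struct PreloadSpec {
    alias: String,
    backend: String,
    target: String,
    model_file: String,
}

/// Splits `alias:backend:target:path` into its four fields.
/// Any other number of fields is rejected.
fn parse_preload(value: &str) -> Result<PreloadSpec, String> {
    let parts: Vec<&str> = value.split(':').collect();
    if parts.len() != 4 {
        return Err(format!(
            "invalid preload string {:?}, expected 'alias:backend:target:path'",
            value
        ));
    }
    Ok(PreloadSpec {
        alias: parts[0].to_string(),
        backend: parts[1].to_string(),
        target: parts[2].to_string(),
        model_file: parts[3].to_string(),
    })
}
```

Calling `parse_preload("default:GGML:AUTO:Meta-Llama-3-8B-Instruct-Q5_K_M.gguf")` yields the alias `default`, the backend `GGML`, the target `AUTO`, and the model file name. Because the value is split on every `:`, a model path that itself contains a colon would be rejected.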

## Extension: Try with Kubernetes Service

In Kubernetes, a [Service](https://kubernetes.io/docs/concepts/services-networking/service/) is a method for exposing a
network application that is running as one or more Pods in your cluster.

You can create a ClusterIP Service, a LoadBalancer Service, or any other type you need to reach the LLM service, for example from outside the cluster with a LoadBalancer.

We do not provide examples here, since Services are independent of Kuasar.
85 changes: 76 additions & 9 deletions wasm/src/wasmedge.rs
@@ -87,14 +87,6 @@ impl Default for WasmEdgeContainerFactory {
        PluginManager::load(None).unwrap();
        let mut host_options = HostRegistrationConfigOptions::default();
        host_options = host_options.wasi(true);
        #[cfg(all(
            target_os = "linux",
            feature = "wasmedge_wasi_nn",
            target_arch = "x86_64"
        ))]
        {
            host_options = host_options.wasi_nn(true);
        }
        let config = ConfigBuilder::new(CommonConfigOptions::default())
            .with_host_registration_config(host_options)
            .build()
@@ -161,7 +153,10 @@ impl ContainerFactory<WasmEdgeContainer> for WasmEdgeContainerFactory {
impl ProcessLifecycle<InitProcess> for WasmEdgeInitLifecycle {
    async fn start(&self, p: &mut InitProcess) -> containerd_shim::Result<()> {
        let spec = &p.lifecycle.spec;
        let vm = p.lifecycle.prototype_vm.clone();
        // Allow vm to be mutable since we change it in wasmedge_wasi_nn feature
        #[allow(unused_mut)]
        #[allow(unused_assignments)]
        let mut vm = p.lifecycle.prototype_vm.clone();
        let args = get_args(spec);
        let envs = get_envs(spec);
        let rootfs = get_rootfs(spec).ok_or_else(|| {
@@ -198,6 +193,45 @@ impl ProcessLifecycle<InitProcess> for WasmEdgeInitLifecycle {
                format!("failed to add task to cgroup: {}", cgroup_path)
            ))?;
        }
        // Only create new VM instance on wasmedge_wasi_nn feature
        #[cfg(all(
            target_os = "linux",
            feature = "wasmedge_wasi_nn",
            target_arch = "x86_64"
        ))]
        {
            const NN_PRELOAD_KEY: &str = "io.kuasar.wasm.nn_preload";
            if let Some(process) = p.lifecycle.spec.process() {
                if let Some(env) = process.env() {
                    if let Some(v) =
                        env.iter().find(|k| k.contains(&NN_PRELOAD_KEY.to_string()))
                    {
                        if let Some(nn_preload) =
                            v.strip_prefix::<&str>(format!("{}=", NN_PRELOAD_KEY).as_ref())
                        {
                            log::info!("found nn_pre_load: {}", nn_preload);
                            if let Some(rootfs) = spec.root().as_ref() {
                                pre_load_with_new_rootfs(nn_preload, rootfs.path())
                                    .unwrap();
                            }
                        }
                    }
                }
            }

            let host_options = HostRegistrationConfigOptions::default().wasi(true);
            let config = ConfigBuilder::new(CommonConfigOptions::default())
                .with_host_registration_config(host_options)
                .build()
                .map_err(other_error!(e, "generate default wasmedge config"))?;

            vm = VmBuilder::new()
                .with_config(config)
                .with_plugin_wasi_nn()
                .with_plugin("wasi_logging", None)
                .build()
                .unwrap();
        }
        match run_wasi_func(vm, args, envs, preopens, p) {
            Ok(_) => exit(0),
            // TODO add a pipe? to return detailed error message
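
The hunk above picks the first environment entry that contains the key anywhere in the string and only then strips the `key=` prefix, so an unrelated entry that mentions the key earlier in the list would be picked first and then rejected, and the preload would be skipped. One possible alternative, sketched here with the standard library only, is to split each `KEY=value` entry once and compare the key exactly. The `env_value` helper is a hypothetical name and is not part of this PR.

```rust
const NN_PRELOAD_KEY: &str = "io.kuasar.wasm.nn_preload";

/// Returns the value of the first `KEY=value` entry whose key equals `key`.
/// Entries without an `=` are ignored.
/// (Hypothetical helper for illustration, not part of this PR.)
fn env_value<'a>(env: &'a [String], key: &str) -> Option<&'a str> {
    env.iter()
        // Split at the first '=' so values may themselves contain '='.
        .filter_map(|entry| entry.split_once('='))
        .find(|(k, _)| *k == key)
        .map(|(_, v)| v)
}
```

With this shape, `env_value(&env, NN_PRELOAD_KEY)` returns the preload string directly, and an entry such as `OTHER=io.kuasar.wasm.nn_preload` no longer shadows the real one.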
@@ -461,3 +495,36 @@ pub async fn process_exits<F>(task: &TaskService<F, WasmEdgeContainer>) {
        }
    });
}

#[cfg(all(
    target_os = "linux",
    feature = "wasmedge_wasi_nn",
    target_arch = "x86_64"
))]
fn pre_load_with_new_rootfs(
    preload: &str,
    rootfs: &std::path::PathBuf,
) -> Result<(), WasmEdgeError> {
    use wasmedge_sdk::plugin::{ExecutionTarget, GraphEncoding};
    let nn_preload: Vec<&str> = preload.split(':').collect();
    if nn_preload.len() != 4 {
        return Err(WasmEdgeError::Operation(format!(
            "Failed to convert to NNPreload value. Invalid preload string: {}. The correct format is: 'alias:backend:target:path'",
            preload
        )));
    }
    let (alias, backend, target, path) = (
        nn_preload[0].to_string(),
        nn_preload[1]
            .parse::<GraphEncoding>()
            .map_err(|err| WasmEdgeError::Operation(err.to_string()))?,
        nn_preload[2]
            .parse::<ExecutionTarget>()
            .map_err(|err| WasmEdgeError::Operation(err.to_string()))?,
        std::path::Path::new(rootfs).join(nn_preload[3]),
    );
    PluginManager::nn_preload(vec![wasmedge_sdk::plugin::NNPreload::new(
        alias, backend, target, path,
    )]);
    Ok(())
}
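
One detail of `pre_load_with_new_rootfs` worth noting: `Path::join` replaces the base path when its argument is absolute, so a preload value whose model path starts with `/` would resolve outside the container rootfs. A minimal sketch of one way to keep such paths under the rootfs follows. The `model_path_in_rootfs` helper is a hypothetical name, and the sketch assumes Unix-style paths. It does not handle `..` components.

```rust
use std::path::{Path, PathBuf};

/// Resolves `model` under `rootfs`, treating a leading `/` as relative
/// to the rootfs instead of to the host filesystem root.
/// (Hypothetical helper for illustration, not part of this PR.)
fn model_path_in_rootfs(rootfs: &Path, model: &str) -> PathBuf {
    // Stripping leading slashes keeps `join` from discarding `rootfs`.
    rootfs.join(model.trim_start_matches('/'))
}
```

Relative model paths, such as the bare file name used in the example Deployment, resolve the same way with or without this change.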