add llamafile 🦙📁 #871

Open
not-lain opened this issue Aug 27, 2024 · 2 comments · May be fixed by #1088

Comments

@not-lain
Contributor

llamafile is a local app (built on llama.cpp) that lets you distribute and run LLMs as a single file

it can run both .gguf and .llamafile files

repo : https://github.com/Mozilla-Ocho/llamafile

snippets

linux and mac

wget https://huggingface.co/Mozilla/Meta-Llama-3.1-8B-llamafile/resolve/main/Meta-Llama-3.1-8B.Q6_K.llamafile
chmod +x Meta-Llama-3.1-8B.Q6_K.llamafile
./Meta-Llama-3.1-8B.Q6_K.llamafile -p 'four score and seven'

windows
(download the file and rename it with a .exe extension)

curl -L -o Meta-Llama-3.1-8B.Q6_K.exe https://huggingface.co/Mozilla/Meta-Llama-3.1-8B-llamafile/resolve/main/Meta-Llama-3.1-8B.Q6_K.llamafile
.\Meta-Llama-3.1-8B.Q6_K.exe -p 'four score and seven'

gguf

wget https://github.com/Mozilla-Ocho/llamafile/releases/download/0.8.13/llamafile-0.8.13
wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q6_K.gguf
chmod +x llamafile-0.8.13
./llamafile-0.8.13 -m tinyllama-1.1b-chat-v1.0.Q6_K.gguf -p 'four score and'

notes

  • You might want to switch the Windows example to TinyLlama, in case the 8B model exceeds the 4GB size limit Windows places on .exe files. It's also possible to run .\llamafile-0.8.13 -m foo.llamafile to get around the limit (similar to the GGUF snippet).
  • They support multimodal models too, e.g. LLaVA for image processing: https://huggingface.co/Mozilla/llava-v1.5-7b-llamafile. That's their flagship model. If you just run ./llava-v1.5-7b-q4.llamafile, it launches an HTTP server and opens a tab in your desktop browser, where you can chat with the model, upload an image file, ask it to analyze what it sees, etc.
  • The binary has a different name for each release (e.g. llamafile-0.8.13), so the GGUF snippet has to change with every release.
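
The size caveat in the first note could be folded into snippet generation. Below is a minimal sketch, assuming the file size in bytes is known ahead of time. `windowsSnippet`, its parameters, and the `WINDOWS_EXE_LIMIT_BYTES` constant are hypothetical names chosen for illustration; they are not part of llamafile or any existing library, and treating the limit as exactly 4 GiB is an assumption.

```javascript
// Hypothetical helper: choose a Windows snippet based on the llamafile's size.
// Assumption: the .exe limit is treated as 4 GiB; adjust if the real cutoff differs.
const WINDOWS_EXE_LIMIT_BYTES = 4 * 1024 ** 3;

function windowsSnippet(modelUrl, sizeBytes, runner = "llamafile-0.8.13") {
  // Last path segment of the URL, e.g. "foo.llamafile".
  const fileName = modelUrl.split("/").pop();

  if (sizeBytes < WINDOWS_EXE_LIMIT_BYTES) {
    // Small enough to run directly once renamed to .exe.
    const exeName = fileName.replace(/\.llamafile$/, ".exe");
    return [
      `curl -L -o ${exeName} ${modelUrl}`,
      `.\\${exeName} -p 'four score and seven'`,
    ].join("\n");
  }

  // Too large: load it with -m through the standalone binary,
  // which is assumed to be downloaded already.
  return [
    `curl -L -o ${fileName} ${modelUrl}`,
    `.\\${runner} -m ${fileName} -p 'four score and seven'`,
  ].join("\n");
}
```

With this shape, a model under the threshold gets the rename-to-.exe snippet and a larger one falls back to the `-m` form from the note above.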
@not-lain
Contributor Author

I found something interesting

If you go to https://api.github.com/repos/Mozilla-Ocho/llamafile/releases/latest and check the ["assets"][0] you will find the ["name"] as well as the ["browser_download_url"] there, which can be used to automatically update the snippets

@not-lain
Contributor Author

not-lain commented Sep 28, 2024

made the following script to extract the llamafile download URL from the latest release's assets

async function getLatestLlamafileRelease() {
  const url =
    "https://api.github.com/repos/Mozilla-Ocho/llamafile/releases/latest";

  try {
    const response = await fetch(url);

    if (!response.ok) {
      throw new Error(`HTTP error! status: ${response.status}`);
    }

    const data = await response.json();
    const tag = data["tag_name"];
    const assets = data["assets"];
    assets.forEach((asset) => {
      if (asset["name"] === `llamafile-${tag}`) {
        console.log(
          // the download url is in asset["browser_download_url"]
          `Download URL: ${asset["browser_download_url"]}`
        );
      }
    });
  } catch (error) {
    console.error("There was a problem fetching the data:", error);
  }
}
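
To connect this back to the snippets, here is a rough sketch of how the GGUF snippet might be regenerated from the release data. `buildGgufSnippet` and `exampleRelease` are hypothetical names, and the object below is a hand-written stand-in that mirrors only the fields used above (`tag_name`, `assets[].name`, `assets[].browser_download_url`), not a real API response.

```javascript
// Hypothetical helper: build the GGUF snippet from a GitHub release object.
function buildGgufSnippet(release, ggufUrl) {
  // Same naming rule as the script above: the binary is "llamafile-<tag>".
  const binaryName = `llamafile-${release.tag_name}`;
  const asset = release.assets.find((a) => a.name === binaryName);
  if (!asset) {
    throw new Error(`No asset named ${binaryName} in release`);
  }

  // Last path segment of the URL is the local .gguf file name.
  const ggufName = ggufUrl.split("/").pop();
  return [
    `wget ${asset.browser_download_url}`,
    `wget ${ggufUrl}`,
    `chmod +x ${binaryName}`,
    `./${binaryName} -m ${ggufName} -p 'four score and'`,
  ].join("\n");
}

// Stand-in data for illustration only.
const exampleRelease = {
  tag_name: "0.8.13",
  assets: [
    {
      name: "llamafile-0.8.13",
      browser_download_url:
        "https://github.com/Mozilla-Ocho/llamafile/releases/download/0.8.13/llamafile-0.8.13",
    },
  ],
};

console.log(
  buildGgufSnippet(
    exampleRelease,
    "https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q6_K.gguf"
  )
);
```

Matching the asset by name rather than taking `assets[0]` keeps the result independent of the order in which assets are listed.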

@not-lain not-lain linked a pull request Jan 5, 2025 that will close this issue