llamafile v0.8.14
llamafile lets you distribute and run LLMs with a single file
llamafile is a local LLM inference tool introduced by Mozilla Ocho in November 2023. It offers superior performance and binary portability: a single file runs on the stock installs of six OSes without needing to be installed. It combines the best of llama.cpp and Cosmopolitan Libc while aiming to stay ahead of the curve by including the most cutting-edge performance and accuracy enhancements. What llamafile gives you is a fun web GUI chatbot, a turnkey OpenAI API compatible server, and a shell-scriptable CLI interface, which together put you in control of artificial intelligence.
v0.8.14 changes
This release introduces our new CLI chatbot interface. It supports
multi-line input using triple quotes and will syntax-highlight Python,
C, C++, Java, and JavaScript code.
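For example, a multi-line prompt can be wrapped in triple quotes. Here
is a sketch of a session, assuming the ollama-style `>>>` prompt (the
exact glyphs may differ, and the model file is a placeholder):

```
$ llamafile -m model.gguf
>>> """
... Explain this function:
... def fib(n):
...     return n if n < 2 else fib(n - 1) + fib(n - 2)
... """
This is a recursive implementation of the Fibonacci sequence ...
```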
This chatbot is now the default mode of operation. When you launch
llamafile without any special arguments, the chatbot is launched in the
foreground and the server is launched in the background. If you only
want one of them, you can pass the --chat or --server flag to select it
explicitly.
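For example (model.gguf is a placeholder for your weights file):

```sh
llamafile -m model.gguf            # default: chatbot in front, server in back
llamafile -m model.gguf --chat     # chatbot only
llamafile -m model.gguf --server   # server only
```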
- a384fd7 Create ollama inspired cli chatbot
- 63205ee Add syntax highlighting to chatbot
- 7b395be Introduce new --chat flag for chatbot
- 28e98b6 Show prompt loading progress in chatbot
- 4199dae Make chat+server hybrid the new default mode
The whisperfile server now lets you upload mp3/ogg/flac (see the
request sketch after this list).
- 74dfd21 Rewrite audio file loader code
- 7517a5f whisperfile server: convert files without ffmpeg (#568)
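To exercise the new upload support, a request like the following should
work, assuming whisperfile keeps whisper.cpp's server API (a multipart
POST to /inference on the default port 8080; recording.mp3 is a
placeholder):

```sh
curl http://127.0.0.1:8080/inference \
  -F file=@recording.mp3 \
  -F response_format=json
```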
Other improvements have been made.
- d617c0b Added vision support to api_like_OAI (#524) (example request after this list)
- 726f6e8 Enable gpu support in llamafile-bench (#581)
- c7c4d65 Speed up KV in llamafile-bench
- 2c940da Make replace_all() have linear complexity
- fa4c4e7 Use bf16 kv cache when it's faster
- 20fe696 Upgrade to Cosmopolitan 3.9.4
- c44664b Always favor fp16 arithmetic in tinyBLAS
- 98eff09 Quantize TriLM models using Q2_K_S (#552)
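As an illustration of the vision support added in d617c0b, a request in
the OpenAI chat completions style might look like the sketch below. The
port, model name, and image data are placeholders, and the exact schema
accepted by api_like_OAI may differ:

```sh
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llava",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url",
         "image_url": {"url": "data:image/jpeg;base64,<BASE64-DATA>"}}
      ]
    }]
  }'
```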