Releases · xorbitsai/inference
v0.7.0
What's new in 0.7.0 (2023-12-08)
These are the changes in inference v0.7.0.
Enhancements
- ENH: upgrade insecure requests when necessary by @waltcow in #712
- ENH: [UI] Using tab in running models by @ChengjieLi28 in #714
- ENH: [UI] supports launching rerank models by @ChengjieLi28 in #711
- ENH: [UI] Error can be shown on web UI directly via Snackbar by @ChengjieLi28 in #721
- ENH: [UI] Supports `n_gpu` config when launching LLM models on web UI by @ChengjieLi28 in #730 (see the sketch after this list)
- ENH: [UI] `n_gpu` default value `auto` by @ChengjieLi28 in #738
- ENH: [UI] Support unregistering custom model on web UI by @ChengjieLi28 in #735
- ENH: Auto recover model actor by @codingl2k1 in #694
- ENH: Allow rerank models to run with LLM models on the same device by @aresnow1 in #741
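The `n_gpu` option surfaced in the web UI (#730, #738) mirrors a launch parameter that can also be passed through the Python client. A minimal sketch, assuming a local endpoint at `http://localhost:9997` and that `launch_model` forwards `n_gpu` (`"auto"` or an explicit GPU count); parameter handling may differ by version:

```python
from xinference.client import Client

client = Client("http://localhost:9997")  # assumed local Xinference endpoint

# Assumption: n_gpu="auto" lets Xinference decide how many GPUs to allocate;
# an integer would pin an explicit count instead.
model_uid = client.launch_model(
    model_name="llama-2-chat",
    model_size_in_billions=7,
    model_format="pytorch",
    quantization="none",
    n_gpu="auto",
)
print(model_uid)
```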
Bug fixes
- BUG: Auto patch trust remote code for embedding model by @codingl2k1 in #710
- BUG: Fix vLLM backend by @codingl2k1 in #728
Others
- Update builtin model list by @onesuper in #709
- Revert "ENH: upgrade insecure requests when necessary" by @qinxuye in #716
- CHORE: Format js file and check js code style by @ChengjieLi28 in #727
Full Changelog: v0.6.5...v0.7.0
v0.6.5
What's new in 0.6.5 (2023-12-01)
These are the changes in inference v0.6.5.
New features
- FEAT: Support jina embedding models by @aresnow1 in #704
- FEAT: Support Yi-chat by @aresnow1 in #700
- FEAT: Support qwen 72b by @aresnow1 in #705
- FEAT: ChatGLM3 tool calls by @codingl2k1 in #701
Enhancements
- ENH: Specify actor pool port for distributed deployment by @ChengjieLi28 in #688
- ENH: Remove `xorbits` dependency by @ChengjieLi28 in #699
- ENH: User can just specify a string for prompt style when registering custom LLM models by @ChengjieLi28 in #682
- ENH: Add more models supported by vllm by @aresnow1 in #706
Bug fixes
- BUG: Fix xinference startup failure when an invalid custom model is found by @codingl2k1 in #690
Documentation
- Doc: Fix some incorrect links in documentation by @aresnow1 in #684
- Doc: Update readme by @aresnow1 in #687
- DOC: documentation for docker and k8s by @lynnleelhl in #661
New Contributors
- @lynnleelhl made their first contribution in #661
Full Changelog: v0.6.4...v0.6.5
v0.6.4
What's new in 0.6.4 (2023-11-24)
These are the changes in inference v0.6.4.
New features
- FEAT: Support registering custom embedding model by @ChengjieLi28 in #667
- FEAT: Supports `qwen.cpp` for `qwen-chat` with `ggml` format by @ChengjieLi28 in #675
- FEAT: Xverse by @fengsxy in #678
- FEAT: Support rerank models by @aresnow1 in #672
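With rerank support (#672), a rerank model can be launched and queried through the Python client as well. A rough sketch, assuming a running endpoint at `http://localhost:9997`, the built-in `bge-reranker-base` model, and a `rerank(documents, query)` method on the returned handle; the exact handle methods may vary by version:

```python
from xinference.client import Client

client = Client("http://localhost:9997")  # assumed local Xinference endpoint

# Launch a built-in rerank model (bge-reranker-base is assumed to be available).
model_uid = client.launch_model(model_name="bge-reranker-base", model_type="rerank")
model = client.get_model(model_uid)

# Assumption: the rerank handle scores each document against the query and
# returns them ordered by relevance.
result = model.rerank(
    documents=[
        "Rerank models reorder candidate passages by relevance.",
        "Paris is the capital of France.",
    ],
    query="What does a rerank model do?",
)
print(result)
```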
Enhancements
- ENH: Add `generate` interface for `chatglm` with `ggml` format by @ChengjieLi28 in #671
Bug fixes
- BUG: Fix custom model missing config json by @codingl2k1 in #674
- BUG: Fix HTTP error not being raised by @codingl2k1 in #657
- BUG: Fix `pip install xinference[all]` by @codingl2k1 in #679
Documentation
- DOC: update pot files by @UranusSeven in #638
- DOC: Add a more detailed beginner's guide covering first-time usage for new users by @onesuper in #651
- DOC: documentation for using xinference by @fengsxy in #677
- DOC: Register custom embedding model by @ChengjieLi28 in #683
Others
- Add a "Why Xinference" section to the README comparing pivotal features with other tools by @onesuper in #652
- Fix README.md by @aresnow1 in #669
Full Changelog: v0.6.3...v0.6.4
v0.6.3
What's new in 0.6.3 (2023-11-16)
These are the changes in inference v0.6.3.
New features
- FEAT: qwen-chat-14b by @UranusSeven in #494
- FEAT: Support gptq quantization by @codingl2k1 in #645
Bug fixes
- BUG: Fix slow RESTful API serialization by @codingl2k1 in #648
Tests
- TST: disable test_is_self_hosted by @UranusSeven in #641
Documentation
- DOC: About Logging in Xinference by @ChengjieLi28 in #631
- DOC: Init for Chinese doc by @ChengjieLi28 in #565
Full Changelog: v0.6.2...v0.6.3
v0.6.2
What's new in 0.6.2 (2023-11-09)
These are the changes in inference v0.6.2.
New features
- FEAT: Support Yi Model by @ChengjieLi28 in #629
Enhancements
- ENH: cache status by @UranusSeven in #616
- ENH: Supports request limits for the model by @ChengjieLi28 in #596
- ENH: running model location & accelerators by @UranusSeven in #626
- ENH: Create completion restful api compatibility by @codingl2k1 in #622
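The request-limit enhancement above (#596) caps how many requests a launched model serves concurrently. A sketch of how this might be passed at launch time, assuming a local endpoint at `http://localhost:9997` and that `launch_model` forwards a `request_limits` keyword; both are assumptions, not a confirmed API contract:

```python
from xinference.client import Client

client = Client("http://localhost:9997")  # assumed local Xinference endpoint

# Assumption: request_limits caps concurrent requests for this model replica;
# requests beyond the limit are rejected rather than queued indefinitely.
model_uid = client.launch_model(
    model_name="chatglm3",
    model_size_in_billions=6,
    model_format="pytorch",
    request_limits=10,
)
print(model_uid)
```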
Bug fixes
- BUG: Compatible with openai 1.1 by @codingl2k1 in #619
- BUG: fix spec decoding by @UranusSeven in #628
- BUG: `No slot available` error for embedding and LLM model on one card by @ChengjieLi28 in #611
- BUG: Rotating log does not create a new file when recreating the xinference cluster by @ChengjieLi28 in #618
Full Changelog: v0.6.1...v0.6.2
v0.6.1
What's new in 0.6.1 (2023-11-06)
These are the changes in inference v0.6.1.
Enhancements
- ENH: add command xinference-local by @UranusSeven in #610
- ENH: Don't check dead nodes by @aresnow1 in #614
Full Changelog: v0.6.0...v0.6.1
v0.6.0
What's new in 0.6.0 (2023-11-03)
These are the changes in inference v0.6.0.
New features
- FEAT: Zephyr by @UranusSeven in #597
- FEAT: stable diffusion with controlnet by @codingl2k1 in #575
Enhancements
- ENH: increase heartbeat interval by @UranusSeven in #604
- ENH: Support downloading more models from ModelScope by @aresnow1 in #595
- ENH: Supports rotating file log by @ChengjieLi28 in #590
- ENH: stateless supervisor and worker by @UranusSeven in #546
Bug fixes
- BUG: Fix chat system messages by @codingl2k1 in #594
- BUG: fix transformers compatibility by @UranusSeven in #600
Tests
- TST: Compatible with `llama-cpp-python` 0.2.12 by @ChengjieLi28 in #603
Documentation
- DOC: Download model from ModelScope by @ChengjieLi28 in #553
- DOC: Stable Diffusion with ControlNet example by @codingl2k1 in #605
Full Changelog: v0.5.6...v0.6.0
v0.5.6
What's new in 0.5.6 (2023-10-30)
These are the changes in inference v0.5.6.
New features
- FEAT: launch embedding models by @Minamiyama in #582
- FEAT: chatglm3 by @UranusSeven in #587
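Launching embedding models (#582) follows the same client flow as LLMs, just with `model_type="embedding"`. A minimal sketch, assuming a local endpoint, the built-in `bge-large-zh` model, and an OpenAI-style embedding payload in the response; all of these are assumptions about the client API:

```python
from xinference.client import Client

client = Client("http://localhost:9997")  # assumed local Xinference endpoint

# Launch a built-in embedding model and request a single embedding.
model_uid = client.launch_model(model_name="bge-large-zh", model_type="embedding")
model = client.get_model(model_uid)

response = model.create_embedding("Xinference makes model serving straightforward.")
print(len(response["data"][0]["embedding"]))  # vector dimensionality
```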
Documentation
- DOC: update hot topics and fix docs by @UranusSeven in #584
Others
- CHORE: install setuptools in release actions by @aresnow1 in #588
- CHORE: Use python3.10 to build and release by @aresnow1 in #589
Full Changelog: v0.5.5...v0.5.6
v0.5.5
What's new in 0.5.5 (2023-10-26)
These are the changes in inference v0.5.5.
Enhancements
- ENH: display language tags by @Minamiyama in #558
- ENH: filter models by type by @Minamiyama in #559
- ENH: disable create embeddings using LLMs by @UranusSeven in #570
- ENH: benchmark latency by @UranusSeven in #576
- ENH: configurable `XINFERENCE_HOME` env by @ChengjieLi28 in #566
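The `XINFERENCE_HOME` variable (#566) is assumed to redirect where Xinference stores downloaded models, cache, and logs (by default under the user's home directory), and it must be set in the environment of the Xinference process. A sketch of overriding it before starting a local instance; the `xinference-local` command comes from a later release (#610), and the flags shown are assumptions:

```python
import os
import subprocess

# Assumption: XINFERENCE_HOME must be present in the environment of the
# Xinference process itself, so it is set before the server starts.
env = dict(os.environ, XINFERENCE_HOME="/data/xinference")

# Hypothetical invocation: run a local Xinference instance in the foreground
# with the overridden home directory (command name and flags may vary by version).
subprocess.run(["xinference-local", "--host", "0.0.0.0", "--port", "9997"], env=env)
```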
Bug fixes
- BUG: Fix `bge-base-zh` and `bge-large-zh` from ModelScope by @ChengjieLi28 in #571
- BUG: When changing the model revision, xinference still uses the previous model by @ChengjieLi28 in #573
- BUG: incorrect vLLM config by @UranusSeven in #579
- BUG: fix llama-2 stop words by @UranusSeven in #580
Documentation
- DOC: Incompatibility Between NVIDIA Driver and PyTorch Version by @onesuper in #551
- DOC: Examples and resources page by @onesuper in #561
Full Changelog: v0.5.4...v0.5.5
v0.5.4
What's new in 0.5.4 (2023-10-20)
These are the changes in inference v0.5.4.
New features
- FEAT: wizardcoder python by @UranusSeven in #539
- FEAT: Support grammar-based sampling for ggml models by @aresnow1 in #525
- FEAT: speculative decoding by @UranusSeven in #509
Enhancements
- ENH: Download embedding models from ModelScope by @ChengjieLi28 in #532
- ENH: lock transformers version by @UranusSeven in #549
- ENH: Support downloading code-llama family models from ModelScope by @ChengjieLi28 in #557
- ENH: Add gguf format of codellama-instruct by @aresnow1 in #567
Bug fixes
- BUG: Fix streaming not compatible with OpenAI by @codingl2k1 in #524
- BUG: set trust_remote_code to true by default by @richzw in #555
- BUG: add quantization to valid file name by @richzw in #562
- BUG: remove "generate" ability from Baichuan-2-chat json config by @Minamiyama in #556
Documentation
- DOC: update pot files by @UranusSeven in #538
- DOC: Add Client API reference by @codingl2k1 in #543
- DOC: Add client doc to the user guide by @codingl2k1 in #547
New Contributors
- @richzw made their first contribution in #555
- @Minamiyama made their first contribution in #556
Full Changelog: v0.5.3...v0.5.4