Releases: xorbitsai/inference
v0.14.1
What's new in 0.14.1 (2024-08-09)
These are the changes in inference v0.14.1.
New features
- FEAT: support SenseVoice audio-to-text model by @qinxuye in #2008 (usage sketch after this list)
- FEAT: support flux.1-schnell & flux.1-dev by @qinxuye in #2007
- FEAT: support kolors image model by @qinxuye in #2028
- FEAT: Add support for llama-3.1-instruct 405B model by @frostyplanet in #2025
- FEAT: Support CogVideoX video model by @codingl2k1 in #2049
- FEAT: Support MiniCPM-v-2_6 by @Minamiyama in #2031
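A minimal sketch of exercising the new SenseVoice support above from the Python client; the endpoint URL, the registered model name `SenseVoiceSmall`, and the sample file are assumptions for illustration, not taken from this changelog:

```python
from xinference.client import Client

client = Client("http://localhost:9997")  # assumed local Xinference endpoint

# "SenseVoiceSmall" is an assumed registered model name; check your local registrations.
model_uid = client.launch_model(model_name="SenseVoiceSmall", model_type="audio")
model = client.get_model(model_uid)

with open("sample.wav", "rb") as f:       # any short speech clip
    result = model.transcriptions(f.read())

# The response is assumed to follow the OpenAI-style transcription format with a "text" field.
print(result["text"])
```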
Enhancements
- ENH: Improve internal server error by @codingl2k1 in #2009
- ENH: Add `stream` option in Benchmark by @Dawnfz-Lenfeng in #2038
- ENH: optimize availability of vLLM by @qinxuye in #2046
- ENH: [worker] Allow lazy init of supervisor_ref by @frostyplanet in #1958
- ENH: optimize performance of sglang by @qinxuye in #2050
- REF: Mark `prompt`, `system_prompt` and `chat_history` parameters as deprecated in the `chat` client interface by @ChengjieLi28 in #2043 (migration sketch after this list)
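For the deprecation noted above, a rough migration sketch; the `messages` keyword and the response shape follow the OpenAI-compatible convention assumed here, so verify the exact transitional signature against the client reference for your version:

```python
from xinference.client import Client

client = Client("http://localhost:9997")      # assumed local endpoint
model = client.get_model("my-llm")            # hypothetical model uid

# Deprecated style (still accepted but marked for removal):
#   model.chat(prompt="Hi", system_prompt="Be brief.", chat_history=[])
# Preferred style: pass an OpenAI-compatible message list instead.
response = model.chat(
    messages=[
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Summarize this release in one sentence."},
    ],
    generate_config={"max_tokens": 128},
)
print(response["choices"][0]["message"]["content"])
```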
Bug fixes
- BUG: fix flexible model register in worker by @frostyplanet in #2011
- BUG: [UI] Fix the 'model_path' bug by @yiboyasss in #2015
- BUG: fix custom embedding launch error by @amumu96 in #2016
Tests
- TST: Fix some dependency version issues by @ChengjieLi28 in #2042
Documentation
- DOC: Directly launch custom model by `model_path` by @ChengjieLi28 in #2047
- DOC: fix typo in README by @ArtificialZeng in #2048
Others
- CHORE: Increased frequency of issue processing by @ChengjieLi28 in #2024
New Contributors
- @ArtificialZeng made their first contribution in #2048
- @Dawnfz-Lenfeng made their first contribution in #2038
Full Changelog: v0.14.0...v0.14.1
v0.14.0.post1
What's new in 0.14.0.post1 (2024-08-05)
These are the changes in inference v0.14.0.post1.
Enhancements
- ENH: Improve internal server error by @codingl2k1 in #2009
Bug fixes
- BUG: fix flexible model register in worker by @frostyplanet in #2011
- BUG: [UI] Fix the 'model_path' bug by @yiboyasss in #2015
- BUG: fix custom embedding launch error by @amumu96 in #2016
Full Changelog: v0.14.0...v0.14.0.post1
v0.14.0
What's new in 0.14.0 (2024-08-02)
These are the changes in inference v0.14.0.
New features
- FEAT: Supports `model_path` input when launching models by @Valdanitooooo in #1918 (see the sketch after this list)
- FEAT: Support gte-Qwen2-7B-instruct and multi gpu deploy by @amumu96 in #1994
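To illustrate the `model_path` feature above, a minimal launch sketch; the checkpoint path, engine, format, and size are assumptions chosen for the example:

```python
from xinference.client import Client

client = Client("http://localhost:9997")  # assumed local endpoint

# Point `model_path` at a checkpoint already on disk instead of letting
# Xinference download it; the path below is a placeholder.
model_uid = client.launch_model(
    model_name="qwen2-instruct",
    model_type="LLM",
    model_engine="transformers",
    model_format="pytorch",
    model_size_in_billions=7,
    model_path="/data/models/Qwen2-7B-Instruct",
)
print(model_uid)
```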
Enhancements
- ENH: Add sglang support for llama 3 and qwen 2 by @luweizheng in #1947
- ENH: add cache_limit_gb option for MLX by @qinxuye in #1954
- ENH: [benchmark] Add api-key support by @frostyplanet in #1961
- ENH: Support for Gemma 2 and Llama 3.1 Models for vllm & sglang by @vikrantrathore in #1929
- ENH: [K8s] worker log dir name by @ChengjieLi28 in #1997
- ENH: support `image_to_image` by @qinxuye in #1986 (client sketch after this list)
- REF: enable sglang by default by @qinxuye in #1953
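A sketch of the new `image_to_image` client call mentioned above; the model uid, the input file, and any parameter beyond `image` and `prompt` are assumptions to be checked against the client reference:

```python
from xinference.client import Client

client = Client("http://localhost:9997")        # assumed local endpoint
model = client.get_model("my-image-model")      # hypothetical uid of a launched image model

with open("input.png", "rb") as f:
    result = model.image_to_image(
        image=f.read(),
        prompt="turn the sketch into a watercolor painting",
    )

# The response is assumed to follow the OpenAI images format
# (a "data" list whose entries carry a "url" or "b64_json").
print(result["data"][0])
```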
Bug fixes
- BUG: Fix GLM chat by @codingl2k1 in #1966
- BUG: fix match for transformers from model registered by @qinxuye in #1955
- BUG: Load llama.so failed in docker image by @ChengjieLi28 in #1974
- BUG: [UI] Modifying 'model format' again resulted in an error message by @yiboyasss in #1990
- BUG: fix loading multiple gguf parts by @qinxuye in #1987
Documentation
- DOC: ascend support by @qinxuye in #1978
- DOC: add CosyVoice doc by @qinxuye in #1980
- DOC: Documents for K8s by @ChengjieLi28 in #2004
New Contributors
- @vikrantrathore made their first contribution in #1929
- @Valdanitooooo made their first contribution in #1918
Full Changelog: v0.13.3...v0.14.0
v0.13.3
What's new in 0.13.3 (2024-07-26)
These are the changes in inference v0.13.3.
New features
- FEAT: GLM4 supports streaming tool calls by @codingl2k1 in #1876
- FEAT: support csg-wukong-chat-v0.1 by @qinxuye in #1916
- FEAT: [UI] Add configuration for image and audio models by @yiboyasss in #1922
- FEAT: support mistral-nemo-instruct by @qinxuye in #1936
- FEAT: CosyVoice speech by @codingl2k1 in #1881 (usage sketch after this list)
- FEAT: add llama-3.1, llama-3.1-instruct by @Weaxs in #1932
- FEAT: support mistral-large-instruct by @qinxuye in #1944
- FEAT: support for llama 3.1 for vllm by @Phoenix500526 in #1935
- FEAT: add rembg flexible model to remove background of image by @qinxuye in #1917
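A minimal sketch for the CosyVoice speech feature above; the registered model name, the voice label, and the output format are assumptions for illustration:

```python
from xinference.client import Client

client = Client("http://localhost:9997")  # assumed local endpoint

# "CosyVoice-300M-SFT" is assumed to be the registered name of an SFT variant.
model_uid = client.launch_model(model_name="CosyVoice-300M-SFT", model_type="audio")
model = client.get_model(model_uid)

# Built-in voice labels depend on the CosyVoice variant; "中文女" is illustrative only.
audio_bytes = model.speech("Hello from Xinference!", voice="中文女")
with open("output.mp3", "wb") as f:
    f.write(audio_bytes)
```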
Enhancements
- ENH: added MLX for llama-3-instruct, codestral, Yi-1.5-chat, internlm2.5-chat by @qinxuye in #1908
- ENH: add gptq for llama-3-instruct by @Phoenix500526 in #1934
New Contributors
- @Phoenix500526 made their first contribution in #1934
Full Changelog: v0.13.2...v0.13.3
v0.13.2
What's new in 0.13.2 (2024-07-19)
These are the changes in inference v0.13.2.
New features
- FEAT: support sd inpainting models by @qinxuye in #1879
- FEAT: Stream ChatTTS by @codingl2k1 in #1812
- FEAT: support codegeex4 by @qinxuye in #1888
- FEAT: support internlm2.5-chat & internlm2.5-chat-1m by @qinxuye in #1887
Bug fixes
- BUG: Fix stream unicode issue for chinese characters when using vllm backend by @ChengjieLi28 in #1865
- BUG: sglang stream error when stream_option is not set by @wxiwnd in #1901
- BUG: fix client import by @amumu96 in #1905
Full Changelog: v0.13.1...v0.13.2
v0.13.1
What's new in 0.13.1 (2024-07-12)
These are the changes in inference v0.13.1.
New features
- FEAT: support choosing download hub by @amumu96 in #1841 (launch sketch after this list)
- FEAT: [UI] Specify download hub by @yiboyasss in #1840
- FEAT: Add support for Flexible Model by @shellc in #1671
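To illustrate choosing a download hub as described above, a hedged launch sketch; the `download_hub` value and the other launch parameters are assumptions picked for the example:

```python
from xinference.client import Client

client = Client("http://localhost:9997")  # assumed local endpoint

# `download_hub` selects where model weights are pulled from; the accepted
# values (e.g. "huggingface", "modelscope") should be confirmed in the docs.
model_uid = client.launch_model(
    model_name="qwen2-instruct",
    model_type="LLM",
    model_engine="transformers",
    model_format="pytorch",
    model_size_in_billions=7,
    download_hub="modelscope",
)
print(model_uid)
```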
Enhancements
- ENH: Update ChatTTS by @codingl2k1 in #1776
- ENH: Added the parameter 'worker_ip' to the 'register' model by @hainaweiben in #1773
- REF: Remove `chatglm-cpp` and fix latest `llama-cpp-python` issue by @ChengjieLi28 in #1844
Others
- FIX: [UI] Historical parameter echo bugs by @yiboyasss in #1810
- FIX: [UI] Fix download_hub bugs by @yiboyasss in #1846
- CHORE: Close issue when it is stale by @ChengjieLi28 in #1827
- CHORE: Update issue template by @ChengjieLi28 in #1833
Full Changelog: v0.13.0...v0.13.1
v0.13.0
What's new in 0.13.0 (2024-07-05)
These are the changes in inference v0.13.0.
Enhancements
- ENH: added gguf files for qwen2 by @qinxuye in #1745
- ENH: Add more log modules by @ChengjieLi28 in #1771
- ENH: Continuous batching supports `vision` model ability by @ChengjieLi28 in #1724
- ENH: Add guard for model launching by @frostyplanet in #1680
- BLD: Supports Aliyun docker image by @ChengjieLi28 in #1753
- BLD: GPU docker uses `vllm` image as base by @ChengjieLi28 in #1759
- BLD: Pin `llama-cpp-python` to `v0.2.77` in Docker for stability by @ChengjieLi28 in #1767
Bug fixes
- BUG: Fix glm4 tool call by @codingl2k1 in #1747
- BUG: [UI] Fix authentication mode related bugs by @yiboyasss in #1772
- BUG: Fix python client returns documents for rerank task by default by @ChengjieLi28 in #1780
- BUG: Fix LLM-based reranker that may raise a TypeError by @codingl2k1 in #1794
- BUG: fix deepseek-vl-chat by @qinxuye in #1795
Tests
- TST: Fix `llama-cpp-python` issue in CI by @ChengjieLi28 in #1763
Documentation
- DOC: Update continuous batching and docker usage by @ChengjieLi28 in #1785
Full Changelog: v0.12.3...v0.13.0
v0.12.3
What's new in 0.12.3 (2024-06-28)
These are the changes in inference v0.12.3.
New features
- FEAT: [UI] Add favorite function by @yiboyasss in #1714
- FEAT: add SD3 support by @qinxuye in #1723
- FEAT: [UI] Add the function of automatically obtaining the last configuration information by @yiboyasss in #1730
- FEAT: support jina-rerank-v2 by @qinxuye in #1733
- FEAT: `tensorizer` integration by @Zihann73 in #1579
- FEAT: Delete cluster by @hainaweiben in #1719
Enhancements
- ENH: Set the CSG Hub endpoint as an environment variable by @hainaweiben in #1666
- BLD: pin `chatglm-cpp` version `v0.3.x` by @ChengjieLi28 in #1692
Bug fixes
- BUG: [UI] Fix deleting prompt_style when Model Family is other by @yiboyasss in #1707
- BUG: GGUF models cannot use GPU in docker by @ChengjieLi28 in #1710
- BUG: Fix tool call observation by @codingl2k1 in #1648
- BUG: [UI] Fix favorite bug by @yiboyasss in #1728
- BUG: curl with stream returns unicode chars rather than Chinese characters by @ChengjieLi28 in #1732
- BUG: Cluster info can be accessed without authorization in the auth mode by @ChengjieLi28 in #1731
Full Changelog: v0.12.2...v0.12.3
v0.12.2.post1
What's new in 0.12.2.post1 (2024-06-22)
These are the changes in inference v0.12.2.post1.
Enhancements
- BLD: pin `chatglm-cpp` version `v0.3.x` by @ChengjieLi28 in #1692
Full Changelog: v0.12.2...v0.12.2.post1
v0.12.2
What's new in 0.12.2 (2024-06-21)
These are the changes in inference v0.12.2.
New features
- FEAT: Add Tools Support for Qwen Series MOE Models by @zhanghx0905 in #1642
- FEAT: [UI] Modify the deletion function of a custom model by @yiboyasss in #1656
- FEAT: [UI] Custom model presents JSON data and modifies it by @yiboyasss in #1670
- FEAT: Add Rerank model token input/output usage by @wxiwnd in #1657
Enhancements
- ENH: Continuous batching supports all the models with `transformers` backend by @ChengjieLi28 in #1659
Bug fixes
- BUG: show error when user launches a quantized model without a supported device by @Minamiyama in #1645
- BUG: Fix default rerank type by @codingl2k1 in #1649
- BUG: chat_completion gives no response when errors occur more than 100 times by @liuzhenghua in #1663
Tests
- TST: Fix CI due to `tenacity` by @ChengjieLi28 in #1660
Others
- CHORE: [pre-commit] Add exclude thirdparty rules by @frostyplanet in #1678
Full Changelog: v0.12.1...v0.12.2