Skip to content

Commit

Permalink
Feature(MInference): prepare for release
Browse files Browse the repository at this point in the history
  • Loading branch information
iofu728 committed Jul 2, 2024
1 parent ceba21b commit a9583e0
Show file tree
Hide file tree
Showing 3 changed files with 24 additions and 24 deletions.
40 changes: 20 additions & 20 deletions .github/workflows/release.yml
Original file line number Diff line number Diff line change
Expand Up @@ -107,23 +107,23 @@ jobs:
name: ${{ env.asset_name }}
path: ./dist/${{ env.wheel_name }}

publish-to-pypi:
name: >-
Publish Python 🐍 distribution 📦 to PyPI
if: startsWith(github.ref, 'refs/tags/') # only publish to PyPI on tag pushes
needs: wheel
runs-on: ubuntu-latest
permissions:
id-token: write # IMPORTANT: mandatory for trusted publishing

steps:
- name: Download all the dists
uses: actions/download-artifact@v4
with:
path: dist/
- name: Pick the whl files
run: for file in dist/*;do mv $file ${file}1; done && cp dist/*/*.whl dist/ && rm -rf dist/*.whl1
- name: Display structure of downloaded files
run: ls -R dist/
- name: Publish distribution 📦 to PyPI
uses: pypa/gh-action-pypi-publish@release/v1
# publish-to-pypi:
# name: >-
# Publish Python 🐍 distribution 📦 to PyPI
# if: startsWith(github.ref, 'refs/tags/') # only publish to PyPI on tag pushes
# needs: wheel
# runs-on: ubuntu-latest
# permissions:
# id-token: write # IMPORTANT: mandatory for trusted publishing

# steps:
# - name: Download all the dists
# uses: actions/download-artifact@v4
# with:
# path: dist/
# - name: Pick the whl files
# run: for file in dist/*;do mv $file ${file}1; done && cp dist/*/*.whl dist/ && rm -rf dist/*.whl1 && rm -rf dist/*+cu*
# - name: Display structure of downloaded files
# run: ls -R dist/
# - name: Publish distribution 📦 to PyPI
# uses: pypa/gh-action-pypi-publish@release/v1
6 changes: 3 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,14 +1,14 @@
<p align="center">
<picture>
<img alt="vLLM" src="images/MInference_logo.png" width=70%>
<img alt="MInference" src="images/MInference_logo.png" width=70%>
</picture>
</p>

<h2 align="center">MInference: Million-Tokens Prompt Inference for Long-context LLMs</h2>

<p align="center">
| <a href="https://aka.ms/MInference"><b>Project Page</b></a> |
<a href="https://arxiv.org/abs/2406."><b>Paper</b></a> |
<a href="https://arxiv.org/abs/2407."><b>Paper</b></a> |
<a href="https://huggingface.co/spaces/microsoft/MInference"><b>HF Demo</b></a> |
</p>

Expand All @@ -21,7 +21,7 @@ https://github.com/microsoft/MInference/assets/30883354/52613efc-738f-4081-8367-

**MInference 1.0** leverages the dynamic sparse nature of LLMs' attention, which exhibits some static patterns, to speed up the pre-filling for long-context LLMs. It first determines offline which sparse pattern each head belongs to, then approximates the sparse index online and dynamically computes attention with the optimal custom kernels. This approach achieves up to a **10x speedup** for pre-filling on an A100 while maintaining accuracy.

- [MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention](https://arxiv.org/abs/2406.) (Under Review, ES-FoMo @ ICML'24)<br>
- [MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention](https://arxiv.org/abs/2407.) (Under Review, ES-FoMo @ ICML'24)<br>
_Huiqiang Jiang†, Yucheng Li†, Chengruidong Zhang†, Qianhui Wu, Xufang Luo, Surin Ahn, Zhenhua Han, Amir H. Abdi, Dongsheng Li, Chin-Yew Lin, Yuqing Yang and Lili Qiu_


Expand Down
2 changes: 1 addition & 1 deletion minference/version.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
_PATCH = "0"
# This is mainly for nightly builds which have the suffix ".dev$DATE". See
# https://semver.org/#is-v123-a-semantic-version for the semantics.
_SUFFIX = "alpha.1"
_SUFFIX = ""

VERSION_SHORT = "{0}.{1}".format(_MAJOR, _MINOR)
VERSION = "{0}.{1}.{2}{3}".format(_MAJOR, _MINOR, _PATCH, _SUFFIX)

0 comments on commit a9583e0

Please sign in to comment.