PreRelease: v0.1.0 #2

Merged
merged 41 commits into from
Jul 2, 2024
f841592
Feature(MInference): build the basic MInference framework
iofu728 Jun 5, 2024
37cd987
Feature(MInference): support InfiniteBench
iofu728 Jun 5, 2024
f5e3595
Feature(MInference): add Needle in A Haystack script
iofu728 Jun 5, 2024
847776d
Feature(MInference): add ppl script
iofu728 Jun 5, 2024
5347d8c
Feature(MInference): add RULER
iofu728 Jun 5, 2024
adc194f
Feature(MInference): add benchmark e2e
iofu728 Jun 5, 2024
e48038e
Feature(MInference): support vLLM and add examples
iofu728 Jun 6, 2024
e28a73c
Feature(MInference): Use CUDAExtention to build indexing kernel inste…
iofu728 Jun 6, 2024
e3569ed
Feature(MInference): add benchmark experiments
iofu728 Jun 7, 2024
ce133f3
Feature(MInference): add onepage
iofu728 Jun 8, 2024
db8998e
experiments documents
liyucheng09 Jun 13, 2024
c354d98
Feature(MInference): add search scripts
iofu728 Jun 13, 2024
7022371
Feature(MInference): add streaming example
iofu728 Jun 14, 2024
a7ac35d
Feature(MInference): update logo and demo
iofu728 Jun 14, 2024
ed7a439
experiment commands test passed - warning disabled
liyucheng09 Jun 15, 2024
6053545
Feature(MInference): add FAQ
iofu728 Jun 16, 2024
834c6ec
Feature(MInference): fix the bibtex
iofu728 Jun 16, 2024
538e476
Feature(MInference): add license
iofu728 Jun 16, 2024
c754284
Feature(MInference): support GLM-4 and Qwen2
iofu728 Jun 23, 2024
70f05f9
Feature(MInference): fix the kv cache cpu bias issue
iofu728 Jun 24, 2024
a206fb2
Feature(MInference): support kv cache cpu device
iofu728 Jun 24, 2024
abc5aba
Fix(MInference): fix config get issue
iofu728 Jun 25, 2024
010b708
Feature(MInference): update experiments details
iofu728 Jun 26, 2024
a4cf850
add GLM-4 to RULER
liyucheng09 Jun 26, 2024
3cf46d4
Feature(MInference): update logo and T5 sparsity
iofu728 Jun 26, 2024
ce1c243
Feature(MInference): update the logo
iofu728 Jun 26, 2024
8341e4f
Feature(MInference): update the logo
iofu728 Jun 26, 2024
264ea01
Feature(MInference): update the title
iofu728 Jun 26, 2024
6b401a8
bug fixed - GLM-4
liyucheng09 Jun 27, 2024
be92526
bug fix - ruler with StreamingLLM
liyucheng09 Jun 28, 2024
3743bbe
patch GLM-4 with InfLLM
liyucheng09 Jun 29, 2024
3581688
Feature(MInference): fix the KV retrieval and math find evaluation
iofu728 Jun 29, 2024
9c4b960
Feature(MInference): update FAQ
iofu728 Jun 29, 2024
db22985
Feature(MInference): update logo
iofu728 Jun 30, 2024
0aff9c3
Feature(MInference): update FAQ
iofu728 Jun 30, 2024
fbfd9fb
Feature(MInference): update the pip release script, logo, and copyright
iofu728 Jul 1, 2024
882dcc6
Feature(MInference): update the example
iofu728 Jul 1, 2024
2c48613
Feature(MInference): add supported models
iofu728 Jul 2, 2024
ceba21b
fix bugs in vllm patch
liyucheng09 Jul 2, 2024
a9583e0
Feature(MInference): prepare for release
iofu728 Jul 2, 2024
1c414a4
Feature(MInference): remove unittest
iofu728 Jul 2, 2024
47 changes: 47 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.yml
@@ -0,0 +1,47 @@
name: "\U0001F41B Bug Report"
description: Submit a bug report to help us improve MInference
title: "[Bug]: "
labels: ["bug"]

body:
  - type: textarea
    id: description
    attributes:
      label: Describe the bug
      description: A clear and concise description of what the bug is.
      placeholder: What went wrong?
  - type: textarea
    id: reproduce
    attributes:
      label: Steps to reproduce
      description: |
        Steps to reproduce the behavior:

        1. Step 1
        2. Step 2
        3. ...
        4. See error
      placeholder: How can we replicate the issue?
  - type: textarea
    id: expected_behavior
    attributes:
      label: Expected Behavior
      description: A clear and concise description of what you expected to happen.
      placeholder: What should have happened?
  - type: textarea
    id: logs
    attributes:
      label: Logs
      description: If applicable, add logs or screenshots to help explain your problem.
      placeholder: Add logs here
  - type: textarea
    id: additional_information
    attributes:
      label: Additional Information
      description: |
        - MInference Version: <!-- Specify the MInference version (e.g., v0.1.0) -->
        - Operating System: <!-- Specify the OS (e.g., Windows 10, Ubuntu 20.04) -->
        - Python Version: <!-- Specify the Python version (e.g., 3.8) -->
        - Related Issues: <!-- Link to any related issues here (e.g., #1) -->
        - Any other relevant information.
      placeholder: Any additional details
1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1 @@
blank_issues_enabled: true
26 changes: 26 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_request.yml
@@ -0,0 +1,26 @@
name: "\U0001F680 Feature request"
description: Submit a proposal/request for a new MInference feature
labels: ["feature request"]
title: "[Feature Request]: "

body:
  - type: textarea
    id: problem_description
    attributes:
      label: Is your feature request related to a problem? Please describe.
      description: A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
      placeholder: What problem are you trying to solve?

  - type: textarea
    id: solution_description
    attributes:
      label: Describe the solution you'd like
      description: A clear and concise description of what you want to happen.
      placeholder: How do you envision the solution?

  - type: textarea
    id: additional_context
    attributes:
      label: Additional context
      description: Add any other context or screenshots about the feature request here.
      placeholder: Any additional information
12 changes: 12 additions & 0 deletions .github/ISSUE_TEMPLATE/general_issue.yml
@@ -0,0 +1,12 @@
name: "\U0001F31F General Question"
description: File a general question
title: "[Question]: "
labels: ["question"]

body:
  - type: textarea
    id: description
    attributes:
      label: Describe the issue
      description: A clear and concise description of what the question is.
      placeholder: The details of your question.
41 changes: 41 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,41 @@
# What does this PR do?

<!--
Congratulations! You've made it this far! You're not quite done yet though.

Once merged, your PR is going to appear in the release notes with the title you set, so make sure it's a great title that fully reflects the extent of your awesome contribution.

Then, please replace this with a description of the change and which issue is fixed (if applicable). Please also include relevant motivation and context. List any dependencies (if any) that are required for this change.

Once you're done, someone will review your PR shortly (see the section "Who can review?" below to tag some potential reviewers). They may suggest changes to make the code even better. If no one reviewed your PR after a week has passed, don't hesitate to post a new comment @-mentioning the same persons---sometimes notifications get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)


## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Was this discussed/approved via a GitHub issue? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
- [ ] Did you write any new necessary tests?


## Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

<!-- Your PR will be replied to more quickly if you can figure out the right person to tag with @

If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of **who to tag**.
Please tag fewer than 3 people.

MInference:

- general: @iofu728, @liyucheng09, @Starmys, and @mydmdm
- kernel related: @Starmys

-->
129 changes: 129 additions & 0 deletions .github/workflows/release.yml
@@ -0,0 +1,129 @@
# This workflow builds a Python package and uploads the wheels as release assets when a version tag (v*) is pushed
# Conda-forge bot will pick up new PyPI version and automatically create new version
# For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions#publishing-to-package-registries

name: Release

on:
  push:
    tags:
      - v*

# Needed to create release and upload assets
permissions:
  contents: write

jobs:
  release:
    # Retrieve tag and create release
    name: Create Release
    runs-on: ubuntu-latest
    outputs:
      upload_url: ${{ steps.create_release.outputs.upload_url }}
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Extract branch info
        shell: bash
        run: |
          echo "release_tag=${GITHUB_REF#refs/*/}" >> $GITHUB_ENV

      - name: Create Release
        id: create_release
        uses: "actions/github-script@v6"
        env:
          RELEASE_TAG: ${{ env.release_tag }}
        with:
          github-token: "${{ secrets.GITHUB_TOKEN }}"
          script: |
            const script = require('.github/workflows/scripts/create_release.js')
            await script(github, context, core)

  wheel:
    name: Build Wheel
    runs-on: ${{ matrix.os }}
    needs: release

    strategy:
      fail-fast: false
      matrix:
        os: ['ubuntu-20.04']
        python-version: ['3.8', '3.9', '3.10', '3.11']
        pytorch-version: ['2.3.0']  # Must be the most recent version that meets requirements-cuda.txt.
        cuda-version: ['11.8', '12.1']

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup ccache
        uses: hendrikmuhs/ccache-action@v1.2
        with:
          create-symlink: true
          key: ${{ github.job }}-${{ matrix.python-version }}-${{ matrix.cuda-version }}

      - name: Set up Linux Env
        if: ${{ runner.os == 'Linux' }}
        run: |
          bash -x .github/workflows/scripts/env.sh

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install CUDA ${{ matrix.cuda-version }}
        run: |
          bash -x .github/workflows/scripts/cuda-install.sh ${{ matrix.cuda-version }} ${{ matrix.os }}

      - name: Install PyTorch ${{ matrix.pytorch-version }} with CUDA ${{ matrix.cuda-version }}
        run: |
          bash -x .github/workflows/scripts/pytorch-install.sh ${{ matrix.python-version }} ${{ matrix.pytorch-version }} ${{ matrix.cuda-version }}

      - name: Build wheel
        shell: bash
        env:
          CMAKE_BUILD_TYPE: Release  # do not compile with debug symbol to reduce wheel size
        run: |
          bash -x .github/workflows/scripts/build.sh ${{ matrix.python-version }} ${{ matrix.cuda-version }}
          wheel_name=$(ls dist/*whl | xargs -n 1 basename)
          asset_name=${wheel_name//"linux"/"manylinux1"}
          echo "wheel_name=${wheel_name}" >> $GITHUB_ENV
          echo "asset_name=${asset_name}" >> $GITHUB_ENV

      - name: Upload Release Asset
        uses: actions/upload-release-asset@v1
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          upload_url: ${{ needs.release.outputs.upload_url }}
          asset_path: ./dist/${{ env.wheel_name }}
          asset_name: ${{ env.asset_name }}
          asset_content_type: application/*
      - name: Store the distribution packages
        uses: actions/upload-artifact@v4
        with:
          name: ${{ env.asset_name }}
          path: ./dist/${{ env.wheel_name }}

  # publish-to-pypi:
  #   name: >-
  #     Publish Python 🐍 distribution 📦 to PyPI
  #   if: startsWith(github.ref, 'refs/tags/')  # only publish to PyPI on tag pushes
  #   needs: wheel
  #   runs-on: ubuntu-latest
  #   permissions:
  #     id-token: write  # IMPORTANT: mandatory for trusted publishing

  #   steps:
  #     - name: Download all the dists
  #       uses: actions/download-artifact@v4
  #       with:
  #         path: dist/
  #     - name: Pick the whl files
  #       run: for file in dist/*;do mv $file ${file}1; done && cp dist/*/*.whl dist/ && rm -rf dist/*.whl1 && rm -rf dist/*+cu*
  #     - name: Display structure of downloaded files
  #       run: ls -R dist/
  #     - name: Publish distribution 📦 to PyPI
  #       uses: pypa/gh-action-pypi-publish@release/v1
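Reviewer note: two shell parameter expansions in this workflow are easy to misread, so here is a minimal sketch of what they do in isolation. The function names `tag_from_ref` and `asset_name_for`, the sample ref, and the sample wheel file name are all hypothetical and chosen only for illustration; the real wheel name depends on what `setup.py` produces.

```bash
#!/usr/bin/env bash
# Sketch only: the function names and sample inputs are hypothetical.

# Strip the shortest leading "refs/<kind>/" prefix, as the
# "Extract branch info" step does with GITHUB_REF.
tag_from_ref() {
  local ref="$1"
  printf '%s\n' "${ref#refs/*/}"
}

# Replace every "linux" in the wheel file name with "manylinux1",
# mirroring the asset_name substitution in the "Build wheel" step.
asset_name_for() {
  local wheel_name="$1"
  printf '%s\n' "${wheel_name//linux/manylinux1}"
}

tag_from_ref "refs/tags/v0.1.0"
asset_name_for "minference-0.1.0-cp310-cp310-linux_x86_64.whl"
```

Note that the rename only changes the asset's file name on the release page; it does not alter the wheel's contents or verify manylinux compliance.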
19 changes: 19 additions & 0 deletions .github/workflows/scripts/build.sh
@@ -0,0 +1,19 @@
#!/bin/bash

python_executable=python$1
cuda_home=/usr/local/cuda-$2

# Update paths (export so child processes such as pip and setup.py inherit them)
export PATH=${cuda_home}/bin:$PATH
export LD_LIBRARY_PATH=${cuda_home}/lib64:$LD_LIBRARY_PATH

# Install requirements
$python_executable -m pip install wheel packaging
$python_executable -m pip install flash_attn triton

# Limit the number of parallel jobs to avoid OOM
export MAX_JOBS=1
# Make sure release wheels are built for the following architectures
export TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6 8.9 9.0+PTX"
# Build
$python_executable setup.py bdist_wheel --dist-dir=dist
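Reviewer note: a general shell behaviour is worth keeping in mind for the "Update paths" step. A plain assignment only reaches child processes if the variable is already in the environment, which is normally true for `PATH` but not guaranteed for `LD_LIBRARY_PATH`. The sketch below demonstrates this with a hypothetical variable, `DEMO_LIB_PATH`, and a hypothetical helper, `child_sees`; neither appears in the PR.

```bash
#!/usr/bin/env bash
# Sketch only: DEMO_LIB_PATH and child_sees are hypothetical names.

# Print what a child process sees for DEMO_LIB_PATH ("unset" if nothing).
child_sees() {
  bash -c 'printf "%s" "${DEMO_LIB_PATH:-unset}"'
}

unset DEMO_LIB_PATH
DEMO_LIB_PATH=/usr/local/cuda-12.1/lib64   # plain assignment, not exported
before=$(child_sees)

export DEMO_LIB_PATH                        # now part of the environment
after=$(child_sees)

echo "before export: ${before}"
echo "after export:  ${after}"
```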
20 changes: 20 additions & 0 deletions .github/workflows/scripts/create_release.js
@@ -0,0 +1,20 @@
// Uses GitHub's API to create the release and wait for result.
// We use a JS script since the GitHub CLI doesn't provide a way to wait for the release's creation and returns immediately.

module.exports = async (github, context, core) => {
  try {
    const response = await github.rest.repos.createRelease({
      draft: false,
      generate_release_notes: true,
      name: process.env.RELEASE_TAG,
      owner: context.repo.owner,
      prerelease: true,
      repo: context.repo.repo,
      tag_name: process.env.RELEASE_TAG,
    });

    core.setOutput('upload_url', response.data.upload_url);
  } catch (error) {
    core.setFailed(error.message);
  }
}
23 changes: 23 additions & 0 deletions .github/workflows/scripts/cuda-install.sh
@@ -0,0 +1,23 @@
#!/bin/bash

# Replace '.' with '-' ex: 11.8 -> 11-8
cuda_version=$(echo $1 | tr "." "-")
# Removes '-' and '.' ex: ubuntu-20.04 -> ubuntu2004
OS=$(echo $2 | tr -d ".\-")

# Installs CUDA
wget -nv https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
rm cuda-keyring_1.1-1_all.deb
sudo apt -qq update
sudo apt -y install cuda-${cuda_version} cuda-nvcc-${cuda_version} cuda-libraries-dev-${cuda_version}
sudo apt clean

# Test nvcc
PATH=/usr/local/cuda-$1/bin:${PATH}
nvcc --version

# Log gcc, g++, c++ versions
gcc --version
g++ --version
c++ --version
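Reviewer note: the two `tr` transformations at the top of this script decide which NVIDIA repository and which apt packages get used, so a small sketch of the mapping may help when extending the matrix. The helper names `cuda_pkg_suffix`, `os_repo_name`, and `keyring_url` are hypothetical; the sketch uses `tr -d '.-'` (hyphen last, so it is literal), which I assume is equivalent to the script's `tr -d ".\-"`.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; nothing is downloaded or installed.

# 11.8 -> 11-8 (apt package suffix, e.g. cuda-nvcc-11-8)
cuda_pkg_suffix() {
  printf '%s\n' "$1" | tr "." "-"
}

# ubuntu-20.04 -> ubuntu2004 (directory name in NVIDIA's repo URL)
os_repo_name() {
  printf '%s\n' "$1" | tr -d '.-'
}

# Build the keyring URL the same way the script does.
keyring_url() {
  printf 'https://developer.download.nvidia.com/compute/cuda/repos/%s/x86_64/cuda-keyring_1.1-1_all.deb\n' \
    "$(os_repo_name "$1")"
}

cuda_pkg_suffix "11.8"
os_repo_name "ubuntu-20.04"
keyring_url "ubuntu-20.04"
```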
56 changes: 56 additions & 0 deletions .github/workflows/scripts/env.sh
@@ -0,0 +1,56 @@
#!/bin/bash

# This file installs common linux environment tools

export LANG=C.UTF-8

# python_version=$1

sudo apt-get update && \
    sudo apt-get install -y --no-install-recommends \
    software-properties-common

sudo apt-get install -y --no-install-recommends \
build-essential \
apt-utils \
ca-certificates \
wget \
git \
vim \
libssl-dev \
curl \
unzip \
unrar \
cmake \
net-tools \
sudo \
autotools-dev \
rsync \
jq \
openssh-server \
tmux \
screen \
htop \
pdsh \
openssh-client \
lshw \
dmidecode \
util-linux \
automake \
autoconf \
libtool \
net-tools \
pciutils \
libpci-dev \
libaio-dev \
libcap2 \
libtinfo5 \
fakeroot \
devscripts \
debhelper \
nfs-common

# Remove github bloat files to free up disk space
sudo rm -rf "/usr/local/share/boost"
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
sudo rm -rf "/usr/share/dotnet"
15 changes: 15 additions & 0 deletions .github/workflows/scripts/pytorch-install.sh
@@ -0,0 +1,15 @@
#!/bin/bash

python_executable=python$1
pytorch_version=$2
cuda_version=$3

# Install torch
$python_executable -m pip install numpy pyyaml scipy ipython mkl mkl-include ninja cython typing pandas typing-extensions dataclasses setuptools && conda clean -ya
$python_executable -m pip install torch==${pytorch_version}+cu${cuda_version//./} --extra-index-url https://download.pytorch.org/whl/cu${cuda_version//./}

# Print version information
$python_executable --version
$python_executable -c "import torch; print('PyTorch:', torch.__version__)"
$python_executable -c "import torch; print('CUDA:', torch.version.cuda)"
$python_executable -c "from torch.utils import cpp_extension; print (cpp_extension.CUDA_HOME)"
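Reviewer note: the install line relies on `${cuda_version//./}` to turn a dotted CUDA version into PyTorch's `cuXYZ` tag. Below is a minimal sketch of that mapping, assuming the same naming scheme as the script; the helper names `torch_requirement` and `torch_index_url` are hypothetical, and nothing is installed.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; this just builds strings.

# 2.3.0 + 12.1 -> torch==2.3.0+cu121
torch_requirement() {
  local pytorch_version="$1" cuda_version="$2"
  printf 'torch==%s+cu%s\n' "${pytorch_version}" "${cuda_version//./}"
}

# 12.1 -> https://download.pytorch.org/whl/cu121
torch_index_url() {
  local cuda_version="$1"
  printf 'https://download.pytorch.org/whl/cu%s\n' "${cuda_version//./}"
}

torch_requirement "2.3.0" "12.1"
torch_index_url "12.1"
```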