Infero lets you easily download, convert, and host your models using the ONNX Runtime. It provides a simple CLI to run and maintain them.
- Automatic downloads.
- Automatic ONNX conversions (see the export sketch after this list).
- Automatic server setup.
- 8-bit quantization support.
- GPU support.
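Infero handles conversion for you, but if you are curious what an ONNX export involves under the hood, here is a minimal sketch using PyTorch's built-in exporter. The model and file name are illustrative, not Infero internals:

```python
# Minimal sketch of a PyTorch -> ONNX export, roughly the kind of step
# an automatic conversion pipeline performs. Names are illustrative.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()

dummy_input = torch.randn(1, 16)  # example input used to trace the graph
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}},  # allow a variable batch size
)
```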
To install Infero, run the following command:

```bash
pip install infero
```
Here is a simple example of how to use Infero. To pull a model from the Hugging Face Hub:

```bash
infero pull [hf_model_name]
```
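For example, pulling a publicly available Hugging Face model (the model name below is just an illustration):

```bash
infero pull distilbert-base-uncased-finetuned-sst-2-english
```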
To run a model:

```bash
infero run [hf_model_name]
```
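Under the hood, hosting an ONNX model comes down to an ONNX Runtime InferenceSession. The following is a minimal sketch of that inference step, not Infero's actual code; the file path, tensor shape, and provider choice are illustrative:

```python
# Minimal sketch: load a converted model and run it with ONNX Runtime.
import numpy as np
import onnxruntime as ort

# Swap in "CUDAExecutionProvider" to run on a GPU instead.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

input_name = session.get_inputs()[0].name
batch = np.random.randn(1, 16).astype(np.float32)  # dummy input batch

outputs = session.run(None, {input_name: batch})  # None -> fetch all outputs
print(outputs[0])
```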
With 8-bit quantization:

```bash
infero run [hf_model_name] --quantize
```
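For reference, 8-bit quantization of an ONNX model can be done with ONNX Runtime's quantization tooling. This is a minimal sketch of dynamic quantization; the file names are illustrative, and the --quantize flag may use different settings internally:

```python
# Minimal sketch: dynamic 8-bit quantization with ONNX Runtime.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="model.onnx",        # full-precision input model
    model_output="model.int8.onnx",  # quantized output model
    weight_type=QuantType.QInt8,     # store weights as signed 8-bit ints
)
```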
To list all available models:

```bash
infero list
```
To remove a model:

```bash
infero remove [hf_model_name]
```
Infero is licensed under the MIT License. See the LICENSE file for more details.
For any questions or feedback, please contact us at [email protected].