Popular repositories
-
ComfyUI-flash-attention-triton (Public)
A ComfyUI node that allows you to select Flash Attention Triton implementation as sampling attention.
Python 2
-
exllama (Public, forked from turboderp/exllama)
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
Python 1
-
text-generation-webui (Public, forked from oobabooga/text-generation-webui)
A gradio web UI for running Large Language Models like LLaMA, llama.cpp, GPT-J, Pythia, OPT, and GALACTICA.
Python
-
whisper.cpp (Public, forked from ggerganov/whisper.cpp)
Port of OpenAI's Whisper model in C/C++
C
-
exllamav2 (Public, forked from turboderp-org/exllamav2)
A fast inference library for running LLMs locally on modern consumer-class GPUs
Python
-
llama.cpp (Public, forked from ggerganov/llama.cpp)
Port of Facebook's LLaMA model in C/C++
C