EmbeddedLLM

32 repositories

infinity-executable
Public
Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
Python
•
MIT License
•114•0•0•0•Updated Nov 5, 2024
vllm
Public
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
inference pytorch transformer gpt amdgpu rocm model-serving llm llm-inference
Python
•
Apache License 2.0
•4.5k•88•4•0•Updated Nov 5, 2024
Liger-Kernel
Public
Efficient Triton Kernels for LLM Training
Python
•
BSD 2-Clause "Simplified" License
•189•0•0•0•Updated Nov 2, 2024
flash-attention-docker
Public
This repository contains a CI/CD pipeline that builds Docker images with flash attention already compiled in, to speed up development and deployment of other frameworks.
Shell
•
Apache License 2.0
•0•0•0•0•Updated Oct 26, 2024
flash-attention-rocm
Public
ROCm fork of Fast and memory-efficient exact attention. (The aim of this branch is to produce a flash attention PyPI package that can be readily installed and used.)
Python
•
BSD 3-Clause "New" or "Revised" License
•1.3k•0•0•0•Updated Oct 26, 2024
vllm-rocmfork
Public
A high-throughput and memory-efficient inference and serving engine for LLMs
Python
•
Apache License 2.0
•4.5k•0•0•0•Updated Oct 23, 2024
etalon
Public
LLM Serving Performance Evaluation Harness
Python
•
Apache License 2.0
•5•0•0•0•Updated Oct 17, 2024
unstructured-python-client
Public
A Python client for the Unstructured hosted API
Python
•
MIT License
•16•0•0•1•Updated Oct 14, 2024
embeddedllm
Public
EmbeddedLLM: API server for Embedded Device Deployment. Currently supports CUDA/OpenVINO/IpexLLM/DirectML/CPU.
windows cpu llama gemma mistral directx-12 openvino npu openvino-inference-engine aipc
Python
•0•19•6•2•Updated Oct 6, 2024
github-bot
Public
Go
•1•0•0•0•Updated Sep 26, 2024
JamAIBase
Public
The collaborative spreadsheet for AI. Chain cells into powerful pipelines, experiment with prompts and models, and evaluate LLM responses in real-time. Work together seamlessly to build and iterate on AI applications.
python workflow ai serverless chatbot spreadsheet svelte orchestration baas agents
Python
•
Apache License 2.0
•17•312•3•1•Updated Sep 23, 2024
PowerToys
Public
Windows system utilities to maximize productivity
C#
•
MIT License
•6.5k•0•0•0•Updated Aug 9, 2024
jamaibase-ts-docs
Public
TypeScript documentation for JamAISDK
HTML
•0•0•0•0•Updated Jul 28, 2024
arena-hard-auto
Public
Arena-Hard-Auto: An automatic LLM benchmark.
Jupyter Notebook
•
Apache License 2.0
•71•0•0•0•Updated Jul 15, 2024
unstructured-api-executable
Public
Python
•
Apache License 2.0
•114•0•0•0•Updated Jul 11, 2024
unstructured-inference-executable
Public
Python
•
Apache License 2.0
•52•0•0•0•Updated Jul 9, 2024
unstructured-executable
Public
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
HTML
•
Apache License 2.0
•742•0•0•0•Updated Jul 9, 2024
workshop
Public
Jupyter Notebook
•0•0•0•0•Updated Jun 25, 2024
ai-town
Public
An MIT-licensed, deployable starter kit for building and customizing your own version of AI town - a virtual town where AI characters live, chat and socialize.
TypeScript
•
MIT License
•705•0•0•0•Updated Jun 23, 2024
jamaibase-cookbook
Public
JamAI Base cookbook repo
Python
•
Apache License 2.0
•0•4•0•0•Updated Jun 10, 2024
jamaibase-expressjs-vercel
Public
TypeScript
•1•0•0•0•Updated May 31, 2024
jamaibase-nextjs-vercel
Public
TypeScript
•0•1•0•0•Updated May 31, 2024
nlux-jamai
Public
The Powerful Conversational AI JavaScript Library
TypeScript
•
Other
•63•0•0•0•Updated May 31, 2024
mamba-rocm
Public
Python
•
Apache License 2.0
•1.1k•5•0•0•Updated Apr 22, 2024
dspy
Public
DSPy: The framework for programming—not prompting—foundation models
Python
•
MIT License
•1.4k•0•0•0•Updated Apr 19, 2024
causal-conv1d-rocm
Public
Causal depthwise conv1d in CUDA, with a PyTorch interface
Cuda
•
BSD 3-Clause "New" or "Revised" License
•57•0•0•0•Updated Apr 12, 2024
EAGLE
Public
EAGLE: Lossless Acceleration of LLM Decoding by Feature Extrapolation
Python
•
Apache License 2.0
•80•0•0•0•Updated Jan 30, 2024
megablocks-rocm
Public
Python
•
Apache License 2.0
•174•0•0•0•Updated Dec 13, 2023
grouped_gemm-rocm
Public
PyTorch bindings for CUTLASS grouped GEMM.
Cuda
•
Apache License 2.0
•38•0•0•0•Updated Dec 11, 2023
xformers-rocm
Public
Stripped down to support flash attention v2 on ROCm.
Python
•
Other
•611•3•0•0•Updated Nov 27, 2023