    Repositories list

    • Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
      Python
      MIT License
      114000 Updated Nov 5, 2024
    • vllm

      Public
      vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      4.5k8840 Updated Nov 5, 2024
    • Efficient Triton Kernels for LLM Training
      Python
      BSD 2-Clause "Simplified" License
      189000 Updated Nov 2, 2024
    • A CI/CD pipeline that builds Docker images with flash attention precompiled into the image, to speed up development and deployment of other frameworks.
      Shell
      Apache License 2.0
      0000 Updated Oct 26, 2024
    • ROCm fork of Fast and memory-efficient exact attention. This branch aims to produce a flash attention PyPI package that can be readily installed and used.
      Python
      BSD 3-Clause "New" or "Revised" License
      1.3k000 Updated Oct 26, 2024
    • A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      4.5k000 Updated Oct 23, 2024
    • etalon

      Public
      LLM Serving Performance Evaluation Harness
      Python
      Apache License 2.0
      5000 Updated Oct 17, 2024
    • A Python client for the Unstructured hosted API
      Python
      MIT License
      16001 Updated Oct 14, 2024
    • EmbeddedLLM: API server for embedded device deployment. Currently supports CUDA/OpenVINO/IpexLLM/DirectML/CPU.
      Python
      01962 Updated Oct 6, 2024
    • Go
      1000 Updated Sep 26, 2024
    • JamAIBase

      Public
      The collaborative spreadsheet for AI. Chain cells into powerful pipelines, experiment with prompts and models, and evaluate LLM responses in real-time. Work together seamlessly to build and iterate on AI applications.
      Python
      Apache License 2.0
      1731231 Updated Sep 23, 2024
    • PowerToys

      Public
      Windows system utilities to maximize productivity
      C#
      MIT License
      6.5k000 Updated Aug 9, 2024
    • TypeScript documentation for JamAISDK
      HTML
      0000 Updated Jul 28, 2024
    • Arena-Hard-Auto: An automatic LLM benchmark.
      Jupyter Notebook
      Apache License 2.0
      71000 Updated Jul 15, 2024
    • Python
      Apache License 2.0
      114000 Updated Jul 11, 2024
    • Python
      Apache License 2.0
      52000 Updated Jul 9, 2024
    • Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
      HTML
      Apache License 2.0
      742000 Updated Jul 9, 2024
    • workshop

      Public
      Jupyter Notebook
      0000 Updated Jun 25, 2024
    • ai-town

      Public
      An MIT-licensed, deployable starter kit for building and customizing your own version of AI town, a virtual town where AI characters live, chat and socialize.
      TypeScript
      MIT License
      705000 Updated Jun 23, 2024
    • JamAI Base cookbook repo
      Python
      Apache License 2.0
      0400 Updated Jun 10, 2024
    • TypeScript
      1000 Updated May 31, 2024
    • TypeScript
      0100 Updated May 31, 2024
    • The 𝗣𝗼𝘄𝗲𝗿𝗳𝘂𝗹 Conversational AI JavaScript Library
      TypeScript
      Other
      63000 Updated May 31, 2024
    • Python
      Apache License 2.0
      1.1k500 Updated Apr 22, 2024
    • dspy

      Public
      DSPy: The framework for programming—not prompting—foundation models
      Python
      MIT License
      1.4k000 Updated Apr 19, 2024
    • Causal depthwise conv1d in CUDA, with a PyTorch interface
      Cuda
      BSD 3-Clause "New" or "Revised" License
      57000 Updated Apr 12, 2024
    • EAGLE

      Public
      EAGLE: Lossless Acceleration of LLM Decoding by Feature Extrapolation
      Python
      Apache License 2.0
      80000 Updated Jan 30, 2024
    • Python
      Apache License 2.0
      174000 Updated Dec 13, 2023
    • PyTorch bindings for CUTLASS grouped GEMM.
      Cuda
      Apache License 2.0
      38000 Updated Dec 11, 2023
    • Stripped down to support flash attention v2 on ROCm.
      Python
      Other
      611300 Updated Nov 27, 2023