    Repositories list

    • cuda-python

      Public
      CUDA Python: Performance meets Productivity
      Cython
      2363.1k20515 Updated Jan 6, 2026
    • JAX-Toolbox

      Public
      JAX-Toolbox
      Python
      683738039 Updated Jan 6, 2026
    • numba-cuda

      Public
      The CUDA target for Numba
      Python
      5323610429 Updated Jan 6, 2026
    • cccl

      Public
      CUDA Core Compute Libraries
      C++
      3142.1k1.1k206 Updated Jan 6, 2026
    • Megatron-LM

      Public
      Ongoing research training transformer models at scale
      Python
      3.5k15k332244 Updated Jan 6, 2026
    • cloudai

      Public
      CloudAI Benchmark Framework
      Python
      428016 Updated Jan 6, 2026
    • KAI-Scheduler

      Public
      KAI Scheduler is an open source Kubernetes Native scheduler for AI workloads at large scale
      Go
      1341.1k2462 Updated Jan 6, 2026
    • TileGym

      Public
      Helpful kernel tutorials and examples for tile-based GPU programming
      Python
      2954411 Updated Jan 6, 2026
    • spark-rapids-jni

      Public
      RAPIDS Accelerator JNI For Apache Spark
      Cuda
      7852847 Updated Jan 6, 2026
    • cutile-python

      Public
      cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
      Python
      931.8k225 Updated Jan 6, 2026
    • cuda-quantum

      Public
      C++ and Python support for the CUDA Quantum programming model for heterogeneous quantum-classical workflows
      C++
      31688140782 Updated Jan 6, 2026
    • TensorRT-LLM

      Public
      TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
      Python
      2k13k518443 Updated Jan 6, 2026
    • spark-rapids-examples

      Public
      A repo for all Spark examples using the RAPIDS Accelerator, including ETL, ML/DL, etc.
      Jupyter Notebook
      62164213 Updated Jan 6, 2026
    • torch-harmonics

      Public
      Differentiable signal processing on the sphere for PyTorch
      Jupyter Notebook
      6362045 Updated Jan 6, 2026
    • TransformerEngine

      Public
      A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.
      Python
      5993.1k288105 Updated Jan 6, 2026
    • spark-rapids-tools

      Public
      User tools for Spark RAPIDS
      Scala
      47652631 Updated Jan 6, 2026
    • Fuser

      Public
      A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
      C++
      74368211222 Updated Jan 6, 2026
    • doca-platform

      Public
      DOCA Platform manages provisioning and service orchestration for BlueField DPUs
      Go
      166600 Updated Jan 6, 2026
    • TensorRT-Incubator

      Public
      Experimental projects related to TensorRT
      MLIR
      221173715 Updated Jan 6, 2026
    • NVSentinel

      Public
      NVSentinel is a cross-platform fault remediation service designed to rapidly remediate runtime node-level issues in GPU-accelerated computing environments
      Go
      311463113 Updated Jan 6, 2026
    • bmcweb

      Public
      A do-everything Redfish, KVM, GUI, and DBus webserver for OpenBMC
      C++
      175500 Updated Jan 6, 2026
    • gpu-operator

      Public
      NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes
      Go
      4322.5k9468 Updated Jan 6, 2026
    • trt-samples-for-hackathon-cn

      Public
      Simple samples for TensorRT programming
      Python
      3521.6k652 Updated Jan 6, 2026
    • bionemo-framework

      Public
      BioNeMo Framework: For building and adapting AI models in drug discovery at scale
      Jupyter Notebook
      10761561112 Updated Jan 6, 2026
    • cuopt

      Public
      GPU accelerated decision optimization
      Cuda
      1056418430 Updated Jan 6, 2026
    • nv-ingest

      Public
      NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extraction uses specialized NVIDIA NIM microservices to find, contextualize, and extract text, tables, charts and images that you can use in downstream generative applications.
      Python
      2832.8k10138 Updated Jan 6, 2026
    • nsight-python

      Public
      Nsight Python is a Python kernel profiling interface based on NVIDIA Nsight Tools
      Python
      78754 Updated Jan 6, 2026
    • DLSS

      Public
      NVIDIA DLSS is a new and improved deep learning neural network that boosts frame rates and generates beautiful, sharp images for your games
      C
      971.1k02 Updated Jan 6, 2026
    • phosphor-host-postd

      Public
      C++
      3200 Updated Jan 6, 2026
    • OSMO

      Public
      The developer-first platform for scaling complex Physical AI workloads across heterogeneous compute—unifying training GPUs, simulation clusters, and edge devices in a simple YAML
      Python
      664219 Updated Jan 6, 2026