ptoxide

ptoxide is a crate that allows NVIDIA CUDA PTX code to be executed on any machine. It was created as a project to learn more about the CUDA execution model.

Kernels are executed by first compiling them to a custom bytecode format, which is then run inside a virtual machine.
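To make the bytecode-plus-virtual-machine idea concrete, here is a minimal sketch of a register-based interpreter. The `Instr` enum and `run_program` function are hypothetical names invented for this illustration; they are not ptoxide's actual bytecode format or API, which is considerably richer.

```rust
/// Hypothetical instruction set for illustration only;
/// this is NOT ptoxide's real bytecode.
#[derive(Debug, Clone, Copy)]
enum Instr {
    /// Load an immediate value into register `dst`.
    LoadImm { dst: usize, value: i64 },
    /// regs[dst] = regs[a] + regs[b]
    Add { dst: usize, a: usize, b: usize },
    /// regs[dst] = regs[a] * regs[b]
    Mul { dst: usize, a: usize, b: usize },
    /// Stop execution and return the value in register `src`.
    Ret { src: usize },
}

/// Interpret a straight-line program on a small register file.
/// Returns `None` if the program ends without reaching `Ret`.
fn run_program(program: &[Instr]) -> Option<i64> {
    let mut regs = [0i64; 8];
    for instr in program {
        match *instr {
            Instr::LoadImm { dst, value } => regs[dst] = value,
            Instr::Add { dst, a, b } => regs[dst] = regs[a] + regs[b],
            Instr::Mul { dst, a, b } => regs[dst] = regs[a] * regs[b],
            Instr::Ret { src } => return Some(regs[src]),
        }
    }
    None
}
```

A real VM for PTX would additionally need branches, memory spaces, and per-thread state; the sketch only shows the fetch-and-dispatch loop at the core of the approach.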

To see how the library works in practice, check out the example below, and take a look at the integration tests in the tests directory.

Try running cargo run --example times_two to see it in action!

Supported Features

ptoxide supports most fundamental PTX features, such as:

Global, shared, and local (stack) memory
(Recursive) function calls
Thread synchronization using barriers
Various arithmetic operations on integers and floating point values
One-, two-, and three-dimensional thread grids and blocks

These features are sufficient to execute the kernels found in the kernels directory, such as simple vector operations, matrix multiplication, and matrix transposition using a shared buffer.

However, many features and instructions are still missing, and you will probably encounter todo!s and parsing errors when attempting to execute more complex programs. Pull requests to implement missing features are always greatly appreciated!
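As an illustration of the one-, two-, and three-dimensional grids and blocks listed above, the following sketch shows how a thread's global coordinates follow from its block index, the block dimensions, and its thread index under the standard CUDA indexing convention. The `Dim3` struct and both helper functions are hypothetical names for this example and are not part of ptoxide.

```rust
/// Hypothetical three-component index, mirroring CUDA's dim3.
/// Not a ptoxide type.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Dim3 {
    x: u32,
    y: u32,
    z: u32,
}

/// Global coordinates of a thread: blockIdx * blockDim + threadIdx,
/// computed independently per dimension.
fn global_thread_id(block_idx: Dim3, block_dim: Dim3, thread_idx: Dim3) -> Dim3 {
    Dim3 {
        x: block_idx.x * block_dim.x + thread_idx.x,
        y: block_idx.y * block_dim.y + thread_idx.y,
        z: block_idx.z * block_dim.z + thread_idx.z,
    }
}

/// Total number of threads launched for a given grid and block shape.
fn total_threads(grid: Dim3, block: Dim3) -> u64 {
    let blocks = grid.x as u64 * grid.y as u64 * grid.z as u64;
    let per_block = block.x as u64 * block.y as u64 * block.z as u64;
    blocks * per_block
}
```

A one-dimensional launch is simply the special case where the `y` and `z` extents are 1.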

Internals

The code of the library itself is not yet well-documented. However, here is a general overview of the main modules comprising ptoxide:

The ast module implements the logic for parsing PTX programs.
The vm module defines a bytecode format and implements the virtual machine to execute it.
The compiler module implements a simple single-pass compiler to translate a PTX program given as an AST to bytecode.
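The AST-to-bytecode step can be pictured with a toy example. The sketch below compiles a tiny arithmetic expression tree to stack bytecode in one traversal and then executes it. All names here (`Expr`, `Op`, `compile`, `execute`) are hypothetical and chosen for illustration; ptoxide's own AST, compiler, and bytecode are different and much larger.

```rust
/// Hypothetical expression AST; not ptoxide's PTX AST.
enum Expr {
    Const(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Hypothetical stack bytecode; not ptoxide's bytecode.
#[derive(Debug, PartialEq)]
enum Op {
    Push(i64),
    Add,
    Mul,
}

/// Single pass over the tree: emit code for both operands,
/// then the operator that consumes them.
fn compile(expr: &Expr, out: &mut Vec<Op>) {
    match expr {
        Expr::Const(v) => out.push(Op::Push(*v)),
        Expr::Add(l, r) => {
            compile(l, out);
            compile(r, out);
            out.push(Op::Add);
        }
        Expr::Mul(l, r) => {
            compile(l, out);
            compile(r, out);
            out.push(Op::Mul);
        }
    }
}

/// Execute the bytecode on an operand stack.
/// Returns `None` if the stack underflows or ends up empty.
fn execute(code: &[Op]) -> Option<i64> {
    let mut stack = Vec::new();
    for op in code {
        match op {
            Op::Push(v) => stack.push(*v),
            Op::Add => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a + b);
            }
            Op::Mul => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a * b);
            }
        }
    }
    stack.pop()
}
```

Because each node is visited exactly once and code is appended as it goes, no intermediate representation is needed, which is the appeal of a single-pass design.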

Example

The following code snippet shows how to invoke a kernel that scales a vector of floats by a factor of 2. The full example lives in the examples directory; run it with cargo run --example times_two.

use ptoxide::{Context, Argument, LaunchParams};

fn times_two(kernel: &str) {
    // Host-side input and output buffers.
    let a: Vec<f32> = vec![1., 2., 3., 4., 5.];
    let mut b: Vec<f32> = vec![0.; a.len()];

    let n = a.len();

    // Compile the PTX source into a context that can run it.
    let mut ctx = Context::new_with_module(kernel).expect("compile kernel");

    // Round the grid size up so that every element is covered by a thread.
    const BLOCK_SIZE: u32 = 256;
    let grid_size = (n as u32 + BLOCK_SIZE - 1) / BLOCK_SIZE;

    // Allocate buffers for the input and output inside the context.
    let da = ctx.alloc(n);
    let db = ctx.alloc(n);

    // Copy the input in, launch the kernel, and copy the result back out.
    ctx.write(da, &a);
    ctx.run(
        LaunchParams::func_id(0)
            .grid1d(grid_size)
            .block1d(BLOCK_SIZE),
        &[
            Argument::ptr(da),
            Argument::ptr(db),
            Argument::U64(n as u64),
        ],
    ).expect("execute kernel");

    ctx.read(db, &mut b);
    // prints [2.0, 4.0, 6.0, 8.0, 10.0]
    println!("{:?}", b);
}
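The grid size in the example is a ceiling division: it picks the smallest number of blocks whose threads cover all n elements. The sketch below isolates that arithmetic; `grid_size_for` and `idle_threads` are hypothetical helper names, not ptoxide functions.

```rust
/// Smallest number of blocks of `block_size` threads that covers `n` elements.
/// Same formula as in the example: (n + block_size - 1) / block_size.
/// Assumes `block_size > 0` and that `n + block_size - 1` does not overflow.
fn grid_size_for(n: u32, block_size: u32) -> u32 {
    (n + block_size - 1) / block_size
}

/// Number of launched threads that have no element to process.
/// Kernels typically guard against these with a bounds check on the index.
fn idle_threads(n: u32, block_size: u32) -> u32 {
    grid_size_for(n, block_size) * block_size - n
}
```

For the five-element vector above, one block of 256 threads is launched and 251 of those threads fall outside the data, which is presumably why the kernel receives n as an argument.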

Reading PTX

To learn more about the PTX ISA, check out NVIDIA's documentation.

License

ptoxide is dual-licensed under the Apache License, Version 2.0 and the MIT license, at your option.
