Commit

Docs (#268)
Authors:
  - Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Benjamin Zaitlen (https://github.com/quasiben)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: #268
madsbk authored Sep 12, 2023
1 parent c85abd5 commit 06d9ec9
Showing 14 changed files with 455 additions and 207 deletions.
182 changes: 12 additions & 170 deletions README.md
@@ -1,181 +1,23 @@
# KvikIO: C++ and Python bindings to cuFile
# KvikIO: High Performance File IO

## Summary

This provides C++ and Python bindings to cuFile, which enables GPUDirect Storage (GDS).
KvikIO also works efficiently when GDS isn't available and can read/write both host and
device data seamlessly.
KvikIO is a Python and C++ library for high performance file IO. It provides C++ and Python
bindings to [cuFile](https://docs.nvidia.com/gpudirect-storage/api-reference-guide/index.html),
which enables [GPUDirect Storage (GDS)](https://developer.nvidia.com/blog/gpudirect-storage/).
KvikIO also works efficiently when GDS isn't available and can read/write both host and device data seamlessly.
The C++ library is header-only, which makes it easy to include in [existing projects](https://github.com/rapidsai/kvikio/blob/HEAD/cpp/examples/downstream/).


### Features

* Object Oriented API.
* Exception handling.
* Object-oriented API for [cuFile](https://docs.nvidia.com/gpudirect-storage/api-reference-guide/index.html) with C++/Python exception handling.
* A Python [Zarr](https://zarr.readthedocs.io/en/stable/) backend for reading and writing GPU data to file seamlessly.
* Concurrent reads and writes using an internal thread pool.
* Non-blocking API.
* Python Zarr reader.
* Handle both host and device IO seamlessly.
* Provides Python bindings to [nvCOMP](https://github.com/NVIDIA/nvcomp).

## Requirements

To install KvikIO, you need a working Linux machine with the CUDA Toolkit
installed (v11.4+) and a working compiler toolchain (C++17 and CMake).

### C++

The C++ bindings are header-only and depend on the CUDA Driver API.
In order to build and run the example code, CMake and the CUDA Runtime
API are required.

### Python

The Python package depends on the following packages:

* cython
* pip
* setuptools
* scikit-build

For nvCOMP, benchmarks, examples, and tests:

* pytest
* numpy
* cupy

## Install

### Conda

Install the stable release from the `rapidsai` channel like:

```
conda create -n kvikio_env -c rapidsai -c conda-forge kvikio
```

Install the `kvikio` conda package from the `rapidsai-nightly` channel like:

```
conda create -n kvikio_env -c rapidsai-nightly -c conda-forge python=3.10 cuda-version=11.8 kvikio
```

If the nightly install doesn't work, set `channel_priority: flexible` in your `.condarc`.

To set up a development environment, run:
```
conda env create --name kvikio-dev --file conda/environments/all_cuda-118_arch-x86_64.yaml
```

### C++ (build from source)

To build the C++ example run:

```
./build.sh libkvikio
```

Then run the example:

```
./examples/basic_io
```

### Python (build from source)

To build and install the extension run:

```
./build.sh kvikio
```

You might have to set `CUDA_HOME` to the path of the CUDA installation.

In order to test the installation, run the following:

```
pytest tests/
```

And to test performance, run the following:

```
python benchmarks/single-node-io.py
```

## Examples


### Notebooks
- [How to read and write GPU memory directly to/from Zarr files](notebooks/zarr.ipynb)


### C++

```c++
#include <cstddef>
#include <future>
#include <cuda_runtime.h>
#include <kvikio/file_handle.hpp>

int main()
{
  // Allocate two device buffers `a` and `b`
  constexpr std::size_t size = 100;
  void* a = nullptr;
  void* b = nullptr;
  cudaMalloc(&a, size);
  cudaMalloc(&b, size);

  // Write `a` to file
  kvikio::FileHandle fw("test-file", "w");
  std::size_t written = fw.write(a, size);
  fw.close();

  // Read file into `b`
  kvikio::FileHandle fr("test-file", "r");
  std::size_t read = fr.read(b, size);
  fr.close();

  // Read file into `b` in parallel using 16 threads
  kvikio::default_thread_pool::reset(16);
  {
    kvikio::FileHandle f("test-file", "r");
    // Pass the buffer `b` and its size in bytes (not `sizeof(a)`, which is the pointer size)
    std::future<std::size_t> future = f.pread(b, size, 0);  // Non-blocking
    std::size_t nbytes = future.get();                      // Blocking
    // Notice, `f` closes automatically on destruction.
  }
}
```

### Python

```python
import cupy
import kvikio

path = "test-file"

a = cupy.arange(100)
f = kvikio.CuFile(path, "w")
# Write whole array to file
f.write(a)
f.close()

b = cupy.empty_like(a)
f = kvikio.CuFile(path, "r")
# Read whole array from file
f.read(b)
assert all(a == b)

# Use context manager
c = cupy.empty_like(a)
with kvikio.CuFile(path, "r") as f:
    f.read(c)
assert all(a == c)

# Non-blocking read
d = cupy.empty_like(a)
with kvikio.CuFile(path, "r") as f:
    future1 = f.pread(d[:50])
    future2 = f.pread(d[50:], file_offset=d[:50].nbytes)
    future1.get()  # Wait for first read
    future2.get()  # Wait for second read
assert all(a == d)
```
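
The non-blocking pattern above (two reads into two halves of one buffer, each with its own file offset) can also be tried without a GPU. The following is a host-memory analogue using only the Python standard library. It is a sketch for illustration and does not use KvikIO; the helper `pread_into` is hypothetical, and `os.pread` is assumed to be available (POSIX systems).

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def pread_into(fd, buf, file_offset=0):
    """Fill `buf` from `fd` starting at `file_offset`; return the bytes read."""
    view = memoryview(buf)
    total = 0
    while total < len(view):
        chunk = os.pread(fd, len(view) - total, file_offset + total)
        if not chunk:  # End of file reached before the buffer was full
            break
        view[total:total + len(chunk)] = chunk
        total += len(chunk)
    return total


data = bytes(range(100))
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test-file")
    with open(path, "wb") as f:
        f.write(data)

    out = bytearray(len(data))
    halves = memoryview(out)
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=2) as pool:
            # Submit both reads before waiting on either (non-blocking)
            future1 = pool.submit(pread_into, fd, halves[:50])
            future2 = pool.submit(pread_into, fd, halves[50:], 50)
            n1 = future1.result()  # Wait for first read
            n2 = future2.result()  # Wait for second read
    finally:
        os.close(fd)
```

Because each read carries an explicit file offset, the two tasks share one file descriptor without interfering with each other's position.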
### Documentation
* Python: <https://docs.rapids.ai/api/kvikio/nightly/>
* C++: <https://docs.rapids.ai/api/libkvikio/nightly/>
4 changes: 3 additions & 1 deletion conda/environments/all_cuda-118_arch-x86_64.yaml
@@ -23,16 +23,18 @@ dependencies:
- libcufile=1.4.0.31
- ninja
- numpy>=1.21
- numpydoc
- nvcc_linux-64=11.8
- nvcomp==2.6.1
- packaging
- pre-commit
- pydata-sphinx-theme
- pytest
- pytest-cov
- python>=3.9,<3.11
- scikit-build>=0.13.1
- sphinx
- sphinx-click
- sphinx_rtd_theme
- sysroot_linux-64=2.17
- zarr
name: all_cuda-118_arch-x86_64
4 changes: 3 additions & 1 deletion conda/environments/all_cuda-120_arch-x86_64.yaml
@@ -23,15 +23,17 @@ dependencies:
- libcufile-dev
- ninja
- numpy>=1.21
- numpydoc
- nvcomp==2.6.1
- packaging
- pre-commit
- pydata-sphinx-theme
- pytest
- pytest-cov
- python>=3.9,<3.11
- scikit-build>=0.13.1
- sphinx
- sphinx-click
- sphinx_rtd_theme
- sysroot_linux-64=2.17
- zarr
name: all_cuda-120_arch-x86_64
137 changes: 134 additions & 3 deletions cpp/doxygen/main_page.md
@@ -1,4 +1,135 @@
# libkvikio
# Welcome to KvikIO's C++ documentation!

libkvikio is a C++ header-only library providing bindings to
cuFile, which enables GPUDirectStorage (GDS).
KvikIO is a Python and C++ library for high performance file IO. It provides C++ and Python
bindings to [cuFile](https://docs.nvidia.com/gpudirect-storage/api-reference-guide/index.html)
which enables [GPUDirect Storage (GDS)](https://developer.nvidia.com/blog/gpudirect-storage/).
KvikIO also works efficiently when GDS isn't available and can read/write both host and device data seamlessly.

KvikIO C++ is a header-only library that is part of the [RAPIDS](https://rapids.ai/) suite of open-source software libraries for GPU-accelerated data science.

---
**Notice** this is the documentation for the C++ library. For the Python documentation of KvikIO, see under **KvikIO**.

---

## Features

* Object Oriented API.
* Exception handling.
* Concurrent reads and writes using an internal thread pool.
* Non-blocking API.
* Handle both host and device IO seamlessly.

## Installation

KvikIO is a header-only library and as such doesn't need installation.
However, for convenience we release Conda packages that make it easy
to include KvikIO in your CMake projects.

### Conda/Mamba

We strongly recommend using [mamba](https://github.com/mamba-org/mamba) in place of conda, which we will do throughout the documentation.

Install the **stable release** from the `rapidsai` channel with the following:
```sh
# Install in existing environment
mamba install -c rapidsai -c conda-forge libkvikio
# Create new environment (CUDA 11.8)
mamba create -n libkvikio-env -c rapidsai -c conda-forge cuda-version=11.8 libkvikio
# Create new environment (CUDA 12.0)
mamba create -n libkvikio-env -c rapidsai -c conda-forge cuda-version=12.0 libkvikio
```

Install the **nightly release** from the `rapidsai-nightly` channel with the following:

```sh
# Install in existing environment
mamba install -c rapidsai-nightly -c conda-forge libkvikio
# Create new environment (CUDA 11.8)
mamba create -n libkvikio-env -c rapidsai-nightly -c conda-forge python=3.10 cuda-version=11.8 libkvikio
# Create new environment (CUDA 12.0)
mamba create -n libkvikio-env -c rapidsai-nightly -c conda-forge python=3.10 cuda-version=12.0 libkvikio
```

---
**Notice** if the nightly install doesn't work, set `channel_priority: flexible` in your `.condarc`.

---

### Include KvikIO in a CMake project
An example of how to include KvikIO in an existing CMake project can be found here: <https://github.com/rapidsai/kvikio/blob/HEAD/cpp/examples/downstream/>.


### Build from source

To build the C++ example run:

```
./build.sh libkvikio
```

Then run the example:

```
./examples/basic_io
```

## Runtime Settings

#### Compatibility Mode (KVIKIO_COMPAT_MODE)
When KvikIO is running in compatibility mode, it doesn't load `libcufile.so`. Instead, reads and writes are done using POSIX. Notice, this is not the same as the compatibility mode in cuFile. That is, cuFile can run in its compatibility mode while KvikIO is not in compatibility mode.

Set the environment variable `KVIKIO_COMPAT_MODE` to enable/disable compatibility mode. By default, compatibility mode is enabled:
- when `libcufile.so` cannot be found.
- when running in Windows Subsystem for Linux (WSL).
- when `/run/udev` isn't readable, which typically happens when running inside a docker image not launched with `--volume /run/udev:/run/udev:ro`.
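
The defaults listed above can be read as a small decision function. The following Python sketch is for illustration only and is not KvikIO's implementation: the function name `resolve_compat_mode`, its parameters, and the accepted spellings of the environment variable value are all assumptions.

```python
from typing import Mapping, Optional

# Assumed spellings, for illustration only; consult the KvikIO
# documentation for the values it actually accepts.
_TRUE = {"1", "on", "true", "yes"}
_FALSE = {"0", "off", "false", "no"}


def resolve_compat_mode(
    environ: Mapping[str, str],
    libcufile_found: bool,
    in_wsl: bool,
    udev_readable: bool,
) -> bool:
    """Return True if compatibility mode (POSIX IO) should be used."""
    value: Optional[str] = environ.get("KVIKIO_COMPAT_MODE")
    if value is not None:
        lowered = value.strip().lower()
        if lowered in _TRUE:
            return True
        if lowered in _FALSE:
            return False
        raise ValueError(f"unrecognized KVIKIO_COMPAT_MODE value: {value!r}")
    # No explicit setting: fall back to the three documented default conditions.
    return (not libcufile_found) or in_wsl or (not udev_readable)
```

An explicit setting takes precedence in this sketch; only when the variable is unset do the three environment checks decide.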

#### Thread Pool (KVIKIO_NTHREADS)
KvikIO can use multiple threads for IO automatically. Set the environment variable `KVIKIO_NTHREADS` to the number of threads in the thread pool. If not set, the default value is 1.
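
As a rough illustration of how such a setting maps to a pool size, the sketch below reads `KVIKIO_NTHREADS` with the documented default of 1 and sizes a standard-library thread pool from it. The helpers `nthreads_from_env` and `make_pool` are hypothetical names, and the pool here is Python's `ThreadPoolExecutor`, not KvikIO's internal thread pool.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Mapping


def nthreads_from_env(environ: Mapping[str, str] = os.environ) -> int:
    """Return the thread count from KVIKIO_NTHREADS, defaulting to 1."""
    value = environ.get("KVIKIO_NTHREADS")
    if value is None:
        return 1  # Documented default when the variable is not set
    nthreads = int(value)
    if nthreads < 1:
        raise ValueError("KVIKIO_NTHREADS must be a positive integer")
    return nthreads


def make_pool(environ: Mapping[str, str] = os.environ) -> ThreadPoolExecutor:
    """Create a stand-in thread pool sized from the environment."""
    return ThreadPoolExecutor(max_workers=nthreads_from_env(environ))
```

Passing the environment as a mapping keeps the helpers easy to test without modifying the real process environment.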

#### Task Size (KVIKIO_TASK_SIZE)
KvikIO splits parallel IO operations into multiple tasks. Set the environment variable `KVIKIO_TASK_SIZE` to the maximum task size (in bytes). If not set, the default value is 4194304 (4 MiB).
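
To make the splitting concrete, here is a minimal sketch of how a request could be divided into tasks of at most `KVIKIO_TASK_SIZE` bytes. The function `split_into_tasks` is a hypothetical helper; KvikIO's actual scheduling may differ in detail.

```python
from typing import List, Tuple

DEFAULT_TASK_SIZE = 4 * 1024 * 1024  # 4 MiB, the documented default


def split_into_tasks(
    nbytes: int,
    file_offset: int = 0,
    task_size: int = DEFAULT_TASK_SIZE,
) -> List[Tuple[int, int]]:
    """Split a request into (file_offset, size) tasks of at most `task_size` bytes."""
    if nbytes < 0 or task_size <= 0:
        raise ValueError("nbytes must be >= 0 and task_size must be > 0")
    tasks = []
    done = 0
    while done < nbytes:
        size = min(task_size, nbytes - done)  # The last task may be smaller
        tasks.append((file_offset + done, size))
        done += size
    return tasks
```

With the default task size, a 10 MiB request yields two 4 MiB tasks and one 2 MiB task, each of which could be handed to a separate thread.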

#### GDS Threshold (KVIKIO_GDS_THRESHOLD)
In order to improve the performance of small IO operations, `.pread()` and `.pwrite()` implement a shortcut that circumvents the thread pool and uses the POSIX backend directly. Set the environment variable `KVIKIO_GDS_THRESHOLD` to the minimum size (in bytes) to use GDS. If not set, the default value is 1048576 (1 MiB).
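
The threshold rule can be sketched as a small planning function. This is an illustrative model rather than KvikIO's code: the name `plan_io` and the returned tuple are hypothetical. The behavior at exactly the threshold follows from reading the threshold as the minimum size that uses GDS.

```python
from typing import Tuple

DEFAULT_GDS_THRESHOLD = 1024 * 1024  # 1 MiB, the documented default


def plan_io(
    nbytes: int,
    gds_threshold: int = DEFAULT_GDS_THRESHOLD,
    compat_mode: bool = False,
) -> Tuple[str, bool]:
    """Return (backend, uses_thread_pool) for a request of `nbytes` bytes."""
    if nbytes < gds_threshold:
        # Small IO: skip the thread pool and use POSIX directly.
        return ("posix", False)
    # Larger IO goes through the thread pool; the backend depends on compat mode.
    return ("posix" if compat_mode else "gds", True)
```

For example, a 4 KiB read is served by POSIX without the thread pool, while a 1 MiB read goes through the thread pool and uses GDS unless compatibility mode is enabled.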


## Example

```cpp
#include <cstddef>
#include <future>
#include <cuda_runtime.h>
#include <kvikio/file_handle.hpp>

int main()
{
  // Allocate two device buffers `a` and `b`
  constexpr std::size_t size = 100;
  void* a = nullptr;
  void* b = nullptr;
  cudaMalloc(&a, size);
  cudaMalloc(&b, size);

  // Write `a` to file
  kvikio::FileHandle fw("test-file", "w");
  std::size_t written = fw.write(a, size);
  fw.close();

  // Read file into `b`
  kvikio::FileHandle fr("test-file", "r");
  std::size_t read = fr.read(b, size);
  fr.close();

  // Read file into `b` in parallel using 16 threads
  kvikio::default_thread_pool::reset(16);
  {
    kvikio::FileHandle f("test-file", "r");
    // Pass the buffer `b` and its size in bytes (not `sizeof(a)`, which is the pointer size)
    std::future<std::size_t> future = f.pread(b, size, 0);  // Non-blocking
    std::size_t nbytes = future.get();                      // Blocking
    // Notice, `f` closes automatically on destruction.
  }
}
```

For a full runnable example see <https://github.com/rapidsai/kvikio/blob/HEAD/cpp/examples/basic_io.cpp>.
4 changes: 2 additions & 2 deletions cpp/include/kvikio/defaults.hpp
@@ -218,8 +218,8 @@ class defaults {
* In order to improve the performance of small IO operations, `.pread()` and `.pwrite()`
* implement a shortcut that circumvents the thread pool and uses the POSIX backend directly.
*
* Set the default value using `kvikio::default::task_size_reset()` or by setting the
* `KVIKIO_TASK_SIZE` environment variable. If not set, the default value is 1 MiB.
* Set the default value using `kvikio::defaults::gds_threshold_reset()` or by setting the
* `KVIKIO_GDS_THRESHOLD` environment variable. If not set, the default value is 1 MiB.
*
* @return The default GDS threshold size in bytes.
*/
4 changes: 3 additions & 1 deletion dependencies.yaml
@@ -236,8 +236,10 @@ dependencies:
common:
- output_types: [conda, requirements]
packages:
- pydata-sphinx-theme
- numpydoc
- sphinx
- sphinx-click
- sphinx_rtd_theme
- output_types: conda
packages:
- doxygen=1.9.1 # pre-commit hook needs a specific version.