Docs (#268)

Authors:
  - Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Benjamin Zaitlen (https://github.com/quasiben)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: #268
rapidsai · Sep 12, 2023 · 06d9ec9
1 parent c85abd5
commit 06d9ec9
Showing 14 changed files with 455 additions and 207 deletions.
diff --git a/README.md b/README.md
@@ -1,181 +1,23 @@
-# KvikIO: C++ and Python bindings to cuFile
+# KvikIO: High Performance File IO
 
 ## Summary
 
-This provides C++ and Python bindings to cuFile, which enables GPUDirect Storage (GDS).
-KvikIO also works efficiently when GDS isn't available and can read/write both host and
-device data seamlessly.
+KvikIO is a Python and C++ library for high performance file IO. It provides C++ and Python
+bindings to [cuFile](https://docs.nvidia.com/gpudirect-storage/api-reference-guide/index.html),
+which enables [GPUDirect Storage (GDS)](https://developer.nvidia.com/blog/gpudirect-storage/).
+KvikIO also works efficiently when GDS isn't available and can read/write both host and device data seamlessly.
+The C++ library is header-only, making it easy to include in [existing projects](https://github.com/rapidsai/kvikio/blob/HEAD/cpp/examples/downstream/).
+
 
 ### Features
 
-* Object Oriented API.
-* Exception handling.
+* Object-oriented API for [cuFile](https://docs.nvidia.com/gpudirect-storage/api-reference-guide/index.html) with C++/Python exception handling.
+* A Python [Zarr](https://zarr.readthedocs.io/en/stable/) backend for seamlessly reading and writing GPU data to and from files.
 * Concurrent reads and writes using an internal thread pool.
 * Non-blocking API.
-* Python Zarr reader.
 * Handle both host and device IO seamlessly.
 * Provides Python bindings to [nvCOMP](https://github.com/NVIDIA/nvcomp).
 
-## Requirements
-
-To install users should have a working Linux machine with CUDA Toolkit
-installed (v11.4+) and a working compiler toolchain (C++17 and cmake).
-
-### C++
-
-The C++ bindings are header-only and depends on the CUDA Driver API.
-In order to build and run the example code, CMake and the CUDA Runtime
-API is required.
-
-### Python
-
-The Python package depends on the following packages:
-
-* cython
-* pip
-* setuptools
-* scikit-build
-
-For nvCOMP, benchmarks, examples, and tests:
-
-* pytest
-* numpy
-* cupy
-
-## Install
-
-### Conda
-
-Install the stable release from the `rapidsai` channel like:
-
-```
-conda create -n kvikio_env -c rapidsai -c conda-forge kvikio
-```
-
-Install the `kvikio` conda package from the `rapidsai-nightly` channel like:
-
-```
-conda create -n kvikio_env -c rapidsai-nightly -c conda-forge python=3.10 cuda-version=11.8 kvikio
-```
-
-If the nightly install doesn't work, set `channel_priority: flexible` in your `.condarc`.
-
-In order to setup a development environment run:
-```
-conda env create --name kvikio-dev --file conda/environments/all_cuda-118_arch-x86_64.yaml
-```
-
-### C++ (build from source)
-
-To build the C++ example run:
-
-```
-./build.sh libkvikio
-```
-
-Then run the example:
-
-```
-./examples/basic_io
-```
-
-### Python (build from source)
-
-To build and install the extension run:
-
-```
-./build.sh kvikio
-```
-
-One might have to define `CUDA_HOME` to the path to the CUDA installation.
-
-In order to test the installation, run the following:
-
-```
-pytest tests/
-```
-
-And to test performance, run the following:
-
-```
-python benchmarks/single-node-io.py
-```
-
-## Examples
-
-
-### Notebooks
- - [How to read and write GPU memory directly to/from Zarr files](notebooks/zarr.ipynb)
-
-
-### C++
-
-```c++
-#include <cstddef>
-#include <cuda_runtime.h>
-#include <kvikio/file_handle.hpp>
-using namespace std;
-
-int main()
-{
-  // Create two arrays `a` and `b`
-  constexpr std::size_t size = 100;
-  void *a = nullptr;
-  void *b = nullptr;
-  cudaMalloc(&a, size);
-  cudaMalloc(&b, size);
-
-  // Write `a` to file
-  kvikio::FileHandle fw("test-file", "w");
-  size_t written = fw.write(a, size);
-  fw.close();
-
-  // Read file into `b`
-  kvikio::FileHandle fr("test-file", "r");
-  size_t read = fr.read(b, size);
-  fr.close();
-
-  // Read file into `b` in parallel using 16 threads
-  kvikio::default_thread_pool::reset(16);
-  {
-    kvikio::FileHandle f("test-file", "r");
-    future<size_t> future = f.pread(b_dev, sizeof(a), 0);  // Non-blocking
-    size_t read = future.get(); // Blocking
-    // Notice, `f` closes automatically on destruction.
-  }
-}
-```
-
-### Python
-
-```python
-import cupy
-import kvikio
-
-a = cupy.arange(100)
-f = kvikio.CuFile("test-file", "w")
-# Write whole array to file
-f.write(a)
-f.close()
-
-b = cupy.empty_like(a)
-f = kvikio.CuFile("test-file", "r")
-# Read whole array from file
-f.read(b)
-assert all(a == b)
-
-# Use contexmanager
-c = cupy.empty_like(a)
-with kvikio.CuFile(path, "r") as f:
-    f.read(c)
-assert all(a == c)
-
-# Non-blocking read
-d = cupy.empty_like(a)
-with kvikio.CuFile(path, "r") as f:
-    future1 = f.pread(d[:50])
-    future2 = f.pread(d[50:], file_offset=d[:50].nbytes)
-    future1.get()  # Wait for first read
-    future2.get()  # Wait for second read
-assert all(a == d)
-```
+### Documentation
+ * Python: <https://docs.rapids.ai/api/kvikio/nightly/>
+ * C++: <https://docs.rapids.ai/api/libkvikio/nightly/>
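The README feature list mentions concurrent reads through an internal thread pool and a non-blocking API. The sketch below illustrates that idea on plain host memory using only the Python standard library. It is not KvikIO's implementation: `parallel_pread` and its parameters are hypothetical names chosen for illustration, and no GPU or cuFile is involved.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def parallel_pread(path, nbytes, nthreads=4, task_size=1 << 20):
    """Hypothetical helper: read the first `nbytes` of `path` in chunks of at
    most `task_size` bytes, using a pool of `nthreads` worker threads."""
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=nthreads) as pool:
            # submit() returns a future immediately (the non-blocking part).
            futures = [
                pool.submit(os.pread, fd, min(task_size, nbytes - off), off)
                for off in range(0, nbytes, task_size)
            ]
            # result() blocks until the corresponding chunk has been read.
            # For simplicity this assumes each os.pread returns the full chunk.
            return b"".join(f.result() for f in futures)
    finally:
        os.close(fd)


# Usage: write a small file, then read it back with 4 threads.
payload = bytes(range(256)) * 16  # 4096 bytes
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
data = parallel_pread(tmp.name, len(payload), nthreads=4, task_size=1000)
os.unlink(tmp.name)
```

Each chunk is read at its own file offset, so the workers never share a file position and the chunks can be joined in submission order.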
diff --git a/conda/environments/all_cuda-118_arch-x86_64.yaml b/conda/environments/all_cuda-118_arch-x86_64.yaml
@@ -23,16 +23,18 @@ dependencies:
 - libcufile=1.4.0.31
 - ninja
 - numpy>=1.21
+- numpydoc
 - nvcc_linux-64=11.8
 - nvcomp==2.6.1
 - packaging
 - pre-commit
-- pydata-sphinx-theme
 - pytest
 - pytest-cov
 - python>=3.9,<3.11
 - scikit-build>=0.13.1
 - sphinx
+- sphinx-click
+- sphinx_rtd_theme
 - sysroot_linux-64=2.17
 - zarr
 name: all_cuda-118_arch-x86_64
diff --git a/conda/environments/all_cuda-120_arch-x86_64.yaml b/conda/environments/all_cuda-120_arch-x86_64.yaml
@@ -23,15 +23,17 @@ dependencies:
 - libcufile-dev
 - ninja
 - numpy>=1.21
+- numpydoc
 - nvcomp==2.6.1
 - packaging
 - pre-commit
-- pydata-sphinx-theme
 - pytest
 - pytest-cov
 - python>=3.9,<3.11
 - scikit-build>=0.13.1
 - sphinx
+- sphinx-click
+- sphinx_rtd_theme
 - sysroot_linux-64=2.17
 - zarr
 name: all_cuda-120_arch-x86_64
diff --git a/cpp/doxygen/main_page.md b/cpp/doxygen/main_page.md
@@ -1,4 +1,135 @@
-# libkvikio
+# Welcome to KvikIO's C++ documentation!
 
-libkvikio is a C++ header-only library providing bindings to
-cuFile, which enables GPUDirectStorage (GDS).
+KvikIO is a Python and C++ library for high performance file IO. It provides C++ and Python
+bindings to [cuFile](https://docs.nvidia.com/gpudirect-storage/api-reference-guide/index.html)
+which enables [GPUDirect Storage (GDS)](https://developer.nvidia.com/blog/gpudirect-storage/).
+KvikIO also works efficiently when GDS isn't available and can read/write both host and device data seamlessly.
+
+KvikIO C++ is a header-only library that is part of the [RAPIDS](https://rapids.ai/) suite of open-source software libraries for GPU-accelerated data science.
+
+---
+**Notice** this is the documentation for the C++ library. For the Python documentation of KvikIO, see under **KvikIO**.
+
+---
+
+## Features
+
+* Object Oriented API.
+* Exception handling.
+* Concurrent reads and writes using an internal thread pool.
+* Non-blocking API.
+* Handle both host and device IO seamlessly.
+
+## Installation
+
+KvikIO is a header-only library and as such doesn't need installation.
+However, for convenience we release Conda packages that make it easy
+to include KvikIO in your CMake projects.
+
+### Conda/Mamba
+
+We strongly recommend using [mamba](https://github.com/mamba-org/mamba) in place of conda, which we will do throughout the documentation.
+
+Install the **stable release** from the ``rapidsai`` channel with the following:
+```sh
+# Install in existing environment
+mamba install -c rapidsai -c conda-forge libkvikio
+# Create new environment (CUDA 11.8)
+mamba create -n libkvikio-env -c rapidsai -c conda-forge cuda-version=11.8 libkvikio
+# Create new environment (CUDA 12.0)
+mamba create -n libkvikio-env -c rapidsai -c conda-forge cuda-version=12.0 libkvikio
+```
+
+Install the **nightly release** from the ``rapidsai-nightly`` channel with the following:
+
+```sh
+# Install in existing environment
+mamba install -c rapidsai-nightly -c conda-forge libkvikio
+# Create new environment (CUDA 11.8)
+mamba create -n libkvikio-env -c rapidsai-nightly -c conda-forge python=3.10 cuda-version=11.8 libkvikio
+# Create new environment (CUDA 12.0)
+mamba create -n libkvikio-env -c rapidsai-nightly -c conda-forge python=3.10 cuda-version=12.0 libkvikio
+```
+
+---
+**Notice** if the nightly install doesn't work, set ``channel_priority: flexible`` in your ``.condarc``.
+
+---
+
+### Include KvikIO in a CMake project
+An example of how to include KvikIO in an existing CMake project can be found here:  <https://github.com/rapidsai/kvikio/blob/HEAD/cpp/examples/downstream/>.
+
+
+### Build from source
+
+To build the C++ example run:
+
+```
+./build.sh libkvikio
+```
+
+Then run the example:
+
+```
+./examples/basic_io
+```
+
+## Runtime Settings
+
+#### Compatibility Mode (KVIKIO_COMPAT_MODE)
+When KvikIO is running in compatibility mode, it doesn't load `libcufile.so`. Instead, reads and writes are done using POSIX. Notice, this is not the same as the compatibility mode in cuFile: cuFile can run in its own compatibility mode while KvikIO is not in compatibility mode.
+
+Set the environment variable `KVIKIO_COMPAT_MODE` to enable/disable compatibility mode. By default, compatibility mode is enabled:
+  - when `libcufile.so` cannot be found.
+  - when running in Windows Subsystem for Linux (WSL).
+  - when `/run/udev` isn't readable, which typically happens when running inside a docker image not launched with `--volume /run/udev:/run/udev:ro`.
+
+#### Thread Pool (KVIKIO_NTHREADS)
+KvikIO can use multiple threads for IO automatically. Set the environment variable `KVIKIO_NTHREADS` to the number of threads in the thread pool. If not set, the default value is 1.
+
+#### Task Size (KVIKIO_TASK_SIZE)
+KvikIO splits parallel IO operations into multiple tasks. Set the environment variable `KVIKIO_TASK_SIZE` to the maximum task size (in bytes). If not set, the default value is 4194304 (4 MiB).
+
+#### GDS Threshold (KVIKIO_GDS_THRESHOLD)
+In order to improve performance of small IO, `.pread()` and `.pwrite()` implement a shortcut that circumvents the thread pool and uses the POSIX backend directly. Set the environment variable `KVIKIO_GDS_THRESHOLD` to the minimum size (in bytes) to use GDS. If not set, the default value is 1048576 (1 MiB).
+
+
+## Example
+
+```cpp
+#include <cstddef>
+#include <cuda_runtime.h>
+#include <kvikio/file_handle.hpp>
+using namespace std;
+
+int main()
+{
+  // Create two arrays `a` and `b`
+  constexpr std::size_t size = 100;
+  void *a = nullptr;
+  void *b = nullptr;
+  cudaMalloc(&a, size);
+  cudaMalloc(&b, size);
+
+  // Write `a` to file
+  kvikio::FileHandle fw("test-file", "w");
+  size_t written = fw.write(a, size);
+  fw.close();
+
+  // Read file into `b`
+  kvikio::FileHandle fr("test-file", "r");
+  size_t read = fr.read(b, size);
+  fr.close();
+
+  // Read file into `b` in parallel using 16 threads
+  kvikio::default_thread_pool::reset(16);
+  {
+    kvikio::FileHandle f("test-file", "r");
+    future<size_t> fut = f.pread(b, size, 0);  // Non-blocking; reads `size` bytes into `b` from file offset 0
+    size_t read = fut.get(); // Blocking
+    // Notice, `f` closes automatically on destruction.
+  }
+}
+```
+
+For a full runnable example see <https://github.com/rapidsai/kvikio/blob/HEAD/cpp/examples/basic_io.cpp>.
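The Runtime Settings added to `main_page.md` can be summarized with a small sketch. It uses only the defaults stated in the text (`KVIKIO_NTHREADS` = 1, `KVIKIO_TASK_SIZE` = 4194304, `KVIKIO_GDS_THRESHOLD` = 1048576). The helpers `read_setting` and `split_into_tasks` are hypothetical, and the exact way KvikIO parses these variables or splits an IO into tasks is an assumption here, not taken from the library.

```python
import os

# Defaults as documented in the Runtime Settings section.
DEFAULTS = {
    "KVIKIO_NTHREADS": 1,
    "KVIKIO_TASK_SIZE": 4 * 1024 * 1024,  # 4 MiB
    "KVIKIO_GDS_THRESHOLD": 1024 * 1024,  # 1 MiB
}


def read_setting(name, environ=None):
    """Hypothetical helper: integer value of `name`, or its documented default."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    return DEFAULTS[name] if raw is None else int(raw)


def split_into_tasks(nbytes, task_size):
    """Hypothetical helper: (offset, length) pairs that cover `nbytes`,
    each at most `task_size` bytes long."""
    if task_size <= 0:
        raise ValueError("task_size must be positive")
    return [
        (off, min(task_size, nbytes - off))
        for off in range(0, nbytes, task_size)
    ]


# A 10 MiB IO with the default task size yields three tasks: 4 + 4 + 2 MiB.
tasks = split_into_tasks(10 * 1024 * 1024, read_setting("KVIKIO_TASK_SIZE", {}))
```

Under these assumptions, raising `KVIKIO_NTHREADS` only helps when an IO is large enough to produce more than one task.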
diff --git a/cpp/include/kvikio/defaults.hpp b/cpp/include/kvikio/defaults.hpp
@@ -218,8 +218,8 @@ class defaults {
    * In order to improve performance of small IO, `.pread()` and `.pwrite()` implement a shortcut
    * that circumvent the threadpool and use the POSIX backend directly.
    *
-   * Set the default value using `kvikio::default::task_size_reset()` or by setting the
-   * `KVIKIO_TASK_SIZE` environment variable. If not set, the default value is 1 MiB.
+   * Set the default value using `kvikio::defaults::gds_threshold_reset()` or by setting the
+   * `KVIKIO_GDS_THRESHOLD` environment variable. If not set, the default value is 1 MiB.
    *
    * @return The default GDS threshold size in bytes.
    */
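The corrected doc comment describes the small-IO shortcut controlled by the GDS threshold. As a rough illustration, the decision can be sketched as below. `choose_backend` is a hypothetical function, not part of KvikIO, and treating an IO of exactly the threshold size as a GDS IO is an assumption based on the phrase "minimum size (in bytes) to use GDS".

```python
GDS_THRESHOLD_DEFAULT = 1024 * 1024  # 1 MiB, the documented default


def choose_backend(nbytes, gds_threshold=GDS_THRESHOLD_DEFAULT, compat_mode=False):
    """Hypothetical sketch: return "posix" or "gds" for an IO of `nbytes` bytes."""
    # In compatibility mode all reads and writes go through POSIX.
    if compat_mode:
        return "posix"
    # Small IOs take the shortcut that bypasses the thread pool (assumed boundary).
    if nbytes < gds_threshold:
        return "posix"
    return "gds"
```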

diff --git a/dependencies.yaml b/dependencies.yaml
@@ -236,8 +236,10 @@ dependencies:
     common:
       - output_types: [conda, requirements]
         packages:
-          - pydata-sphinx-theme
+          - numpydoc
           - sphinx
+          - sphinx-click
+          - sphinx_rtd_theme
       - output_types: conda
         packages:
           - doxygen=1.9.1 # pre-commit hook needs a specific version.