Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Initial Java Support for GDS to KvikIO #396

Open
wants to merge 25 commits into
base: branch-24.12
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 22 commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
83044ff
Initial commit
aslobodaNV Jun 25, 2024
62649e6
Update documentation to better flash out how to compile and run the e…
aslobodaNV Jun 25, 2024
161b260
code touchups and Readme update
aslobodaNV Jun 25, 2024
30aa3dc
Update README with markdown formatting. Improve instructions and linkage
aslobodaNV Jun 26, 2024
299bdb6
Add initial maven setup, needs to be debugged
aslobodaNV Jul 12, 2024
1162efc
Update maven build to properly generate shared library
aslobodaNV Jul 12, 2024
3f29546
Merge branch 'branch-24.10' into add_initial_java_support
jakirkham Jul 24, 2024
b3dd38f
Merge branch 'rapidsai:branch-24.10' into add_initial_java_support
aslobodaNV Sep 4, 2024
fe91427
Fix pre-commit issues
aslobodaNV Sep 4, 2024
4ad08c6
move example to be a test, update pom and dependencies to support CI …
aslobodaNV Sep 4, 2024
83875b6
pre-commit fixes
aslobodaNV Sep 4, 2024
4120e4c
add github workflow items
aslobodaNV Sep 4, 2024
888d6a7
Updating workflows
aslobodaNV Sep 18, 2024
d337a16
Merge branch 'branch-24.10' into add_initial_java_support
aslobodaNV Sep 18, 2024
833ad32
Formatting inconsistencies
aslobodaNV Sep 18, 2024
93598a8
Merge remote-tracking branch 'upstream/branch-24.12' into add_initial…
aslobodaNV Oct 1, 2024
661fcce
Update yaml files based on CR feedback, update versions to 24.12 from…
aslobodaNV Oct 1, 2024
7610edb
Fix needs and permissions.
bdice Oct 1, 2024
3eb3daa
Remove test_python_legate.
bdice Oct 1, 2024
13b9a05
Merge branch 'branch-24.12' into add_initial_java_support
bdice Oct 1, 2024
e3f8448
Merge branch 'add_initial_java_support' of github.com:aslobodaNV/kvik…
bdice Oct 1, 2024
fa749a3
Update java/pom.xml
aslobodaNV Oct 2, 2024
222426c
Update package for java bindings.
aslobodaNV Oct 11, 2024
c704108
Use cmake instead of explicit nvcc command
aslobodaNV Oct 11, 2024
69b6d39
Merge remote-tracking branch 'upstream/branch-24.12' into add_initial…
aslobodaNV Oct 11, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions .github/workflows/pr.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ jobs:
- checks
- conda-cpp-build
- conda-cpp-tests
- conda-java-tests
- conda-python-build
- conda-python-tests
- docs-build
Expand All @@ -39,6 +40,16 @@ jobs:
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
build_type: pull-request
conda-java-tests:
needs: conda-cpp-build
secrets: inherit
bdice marked this conversation as resolved.
Show resolved Hide resolved
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
build_type: pull-request
node_type: "gpu-v100-latest-1"
arch: "amd64"
container_image: "rapidsai/ci-conda:latest"
run_script: "ci/test_java.sh"
conda-python-build:
needs: conda-cpp-build
secrets: inherit
Expand Down
12 changes: 12 additions & 0 deletions .github/workflows/test.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -30,3 +30,15 @@ jobs:
branch: ${{ inputs.branch }}
date: ${{ inputs.date }}
sha: ${{ inputs.sha }}
conda-java-tests:
secrets: inherit
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
build_type: nightly
branch: ${{ inputs.branch }}
date: ${{ inputs.date }}
sha: ${{ inputs.sha }}
node_type: "gpu-v100-latest-1"
arch: "amd64"
container_image: "rapidsai/ci-conda:latest"
run_script: "ci/test_java.sh"
43 changes: 43 additions & 0 deletions ci/test_java.sh
bdice marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
#!/bin/bash
# Copyright (c) 2024, NVIDIA CORPORATION.

set -euo pipefail

. /opt/conda/etc/profile.d/conda.sh

rapids-logger "Generate java testing dependencies"
rapids-dependency-file-generator \
--output conda \
--file-key test_java \
--matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch)" | tee env.yaml

rapids-mamba-retry env create --yes -f env.yaml -n test

# Temporarily allow unbound variables for conda activation.
set +u
conda activate test
set -u

rapids-logger "Downloading artifacts from previous jobs"
CPP_CHANNEL=$(rapids-download-conda-from-s3 cpp)

rapids-print-env

rapids-mamba-retry install \
--channel "${CPP_CHANNEL}" \
libkvikio libkvikio-tests

rapids-logger "Check GPU usage"
nvidia-smi

EXITCODE=0
trap "EXITCODE=1" ERR
set +e

rapids-logger "Run Java tests"
pushd java
mvn test -B
popd

rapids-logger "Test script exiting with value: $EXITCODE"
exit ${EXITCODE}
14 changes: 14 additions & 0 deletions dependencies.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -92,6 +92,14 @@ files:
key: test
includes:
- test_python
test_java:
output: none
includes:
- build-universal
- build-cpp
- cuda_version
- cuda
- test_java
channels:
- rapidsai
- rapidsai-nightly
Expand Down Expand Up @@ -353,3 +361,9 @@ dependencies:
- matrix: # All CUDA 11 versions
packages:
- cuda-python>=11.7.1,<12.0a0
test_java:
common:
- output_types: conda
packages:
- maven
- openjdk=8.*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to use openjdk=8.*? It appears that the latest is openjdk=22.*.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No particular reason, I just happened to have 8 installed and used that. I will update to 22

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JDK8 is a very much a least common denominator. It is no longer supported without paying a lot of money to oracle though.

https://www.oracle.com/java/technologies/java-se-support-roadmap.html

and oracle is trying very hard to push people away from older versions of java.

java 11, 17, and 21 are long term support versions that are currently GA.

Java generally has really good backwards compatibility so almost anything compiled with JDK8 can run on anything newer. But newer code is not backwards compatible with older runtimes. But you can ask newer compilers to output binary code that is compatible with older JDK versions. That is what we do with the Spark plugin.

We also support whatever Spark supports for their default binary release. So for older versions of Spark it is JDK8. For some of the latest versions that is JDK17, but because of how we ask the compiler to output older binary versions it really becomes a requirement that they have at least that version of java installed.

I wouldn't update to jdk22 unless you also decide which version of java you want to be minimally compatible with and ask the compiler to output compatible .class files.

Also just another point to think about jcuda still uses openjdk8. This is probably to have as much compatibility as possible. https://github.com/jcuda/jcuda-main/blob/master/BUILDING.md

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for that context, that makes a lot of sense. I have not been following the broader Java versioning ecosystem so was unaware of much of this. I think it is likely we will want to pick one of the versions with long term support, I would guess 17 or 21, but will think on it some more before making an update.

74 changes: 74 additions & 0 deletions java/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
# Java KvikIO Bindings

## Summary
These Java KvikIO bindings for GDS currently support only synchronous read and write IO operations using the underlying CuFile API. Support for batch IO and asynchronous operations are not yet supported.

## Dependencies
The Java KvikIO bindings have been developed to work on Linux based systems and require [CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html) to be installed and for [GDS](https://docs.nvidia.com/gpudirect-storage/troubleshooting-guide/index.html) to be properly enabled. To compile the shared library it is also necessary to have a JDK installed. To run the included example, it is also necessary to install JCuda as it is used to handle memory allocations and the transfer of data between host and GPU memory. JCuda jar files supporting CUDA 12.x can be found here:
[jcuda-12.0.0.jar](https://repo1.maven.org/maven2/org/jcuda/jcuda/12.0.0/jcuda-12.0.0.jar),
[jcuda-natives-12.0.0.jar](https://repo1.maven.org/maven2/org/jcuda/jcuda-natives/12.0.0/jcuda-natives-12.0.0.jar)

For more information on JCuda and potentially more up to date installation instructions or jar files, see here:
[JCuda](http://javagl.de/jcuda.org/), [JCuda Usage](https://github.com/jcuda/jcuda-main/blob/master/USAGE.md), [JCuda Maven Repo](https://mvnrepository.com/artifact/org.jcuda)

## Compilation
To recompile the .so file for your local system run the following command. Note: Update the command to reflect the directory where you have installed CUDA and your JDK.

/usr/local/cuda/bin/nvcc -shared -o libCuFileJNI.so -I/usr/local/cuda/include/ -I/usr/lib/jvm/java-21-openjdk-amd64/include/ -I/usr/lib/jvm/java-21-openjdk-amd64/include/linux src/main/native/src/CuFileJni.cpp --compiler-options "-fPIC" -lcufile

The resulting .so file must be in your JVM library path when running upstream Java programs. If it is not already placed on your path in can be included by including an argument like the following:

-Djava.library.path={path/to/your/so/file/}

## Examples
An example for how to use the Java KvikIO bindings can be found in src/main/java/bindings/kvikio/example . Note: This example has a dependency on JCuda so ensure that when running the example the JCuda shared library files are on the JVM library path along with the libCuFileJNI.so file.

### Specific instructions to run the example using Maven

#### Compile the shared library and Java files with Maven

cd kvikio/java/
mvn clean install

#### Setup a test file target NOTE: your mount directory may differ from /mnt/nvme, so update this command appropriately as well as example/Main.java to point to the correct file path.

touch /mnt/nvme/java_test

#### Run example

cd kvikio/java/
java -cp target/cufile-24.12.0-SNAPSHOT.jar:$HOME/.m2/repository/org/jcuda/jcuda/12.0.0/jcuda-12.0.0.jar:$HOME/.m2/repository/org/jcuda/jcuda-natives/12.0.0/jcuda-natives-12.0.0.jar -Djava.library.path=./target bindings.kvikio.example.Main

### Specific instructions to run the example from a terminal

#### Compile class files

cd kvikio/java/src/main/java/bindings/kvikio/cufile
javac *.java

#### Retrieve Jcuda jar files

cd kvikio/java/
mkdir lib
cd lib
wget https://repo1.maven.org/maven2/org/jcuda/jcuda/12.0.0/jcuda-12.0.0.jar
wget https://repo1.maven.org/maven2/org/jcuda/jcuda-natives/12.0.0/jcuda-natives-12.0.0.jar

#### Compile shared library

cd kvikio/java/lib
/usr/local/cuda/bin/nvcc -shared -o libCuFileJNI.so -I/usr/local/cuda/include/ -I/usr/lib/jvm/java-21-openjdk-amd64/include/ -I/usr/lib/jvm/java-21-openjdk-amd64/include/linux ../src/main/native/src/CuFileJni.cpp --compiler-options "-fPIC" -lcufile

#### Setup a test file target NOTE: your mount directory may differ from /mnt/nvme, so update this command appropriately as well as example/Main.java to point to the correct file path.

touch /mnt/nvme/java_test

#### Compile example file

cd kvikio/java/src/main/java
javac -cp .:../../../lib/jcuda-12.0.0.jar:../../../lib/jcuda-natives-12.0.0.jar bindings/kvikio/example/Main.java

#### Run example

cd kvikio/java/src/main/java
java -cp .:../../../lib/jcuda-12.0.0.jar:../../../lib/jcuda-natives-12.0.0.jar -Djava.library.path=../../../lib/ bindings.kvikio.example.main
147 changes: 147 additions & 0 deletions java/pom.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,147 @@
<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<groupId>bindings.kvikio</groupId>
<artifactId>cufile</artifactId>
<version>24.12.0-SNAPSHOT</version>

<name>cufile</name>
<description>
This project provides java bindings for the GPUDirect Storage cufile library, enabling the GPU to load and
save large amounts of data to and from persistent storage. This is still a work in progress so some APIs may change.
</description>
<url>http://ai.rapids</url>

<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>21</maven.compiler.source>
<maven.compiler.target>21</maven.compiler.target>
<junit.version>5.4.2</junit.version>
</properties>

<dependencies>
<dependency>
<groupId>org.jcuda</groupId>
<artifactId>jcuda</artifactId>
<version>12.0.0</version>
aslobodaNV marked this conversation as resolved.
Show resolved Hide resolved
</dependency>
aslobodaNV marked this conversation as resolved.
Show resolved Hide resolved
<dependency>
<groupId>org.jcuda</groupId>
<artifactId>jcuda-natives</artifactId>
<version>12.0.0</version>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-api</artifactId>
<version>${junit.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-params</artifactId>
<version>${junit.version}</version>
<scope>test</scope>
</dependency>
</dependencies>

<build>
<pluginManagement>
<plugins>
<plugin>
<artifactId>maven-exec-plugin</artifactId>
<version>1.6.0</version>
</plugin>
<plugin>
<artifactId>maven-clean-plugin</artifactId>
<version>3.1.0</version>
<configuration>
<createDirs>true</createDirs>
</configuration>
</plugin>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.8.0</version>
<configuration>
<source>21</source>
<target>21</target>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where you can set what the output target is that you want.

</configuration>
</plugin>
<plugin>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.22.1</version>
<configuration>
<argLine>-Djava.library.path=${project.build.directory}:${java.library.path}</argLine>
</configuration>
<dependencies>
<dependency>
<groupId>org.junit.platform</groupId>
<artifactId>junit-platform-surefire-provider</artifactId>
<version>1.2.0</version>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-engine</artifactId>
<version>5.4.2</version>
</dependency>
</dependencies>
</plugin>
<plugin>
<artifactId>maven-jar-plugin</artifactId>
<version>3.0.2</version>
</plugin>
<plugin>
<artifactId>maven-install-plugin</artifactId>
<version>2.5.2</version>
</plugin>
<plugin>
<artifactId>maven-deploy-plugin</artifactId>
<version>2.8.2</version>
</plugin>
<plugin>
<artifactId>maven-site-plugin</artifactId>
<version>3.7.1</version>
</plugin>
<plugin>
<artifactId>maven-project-info-reports-plugin</artifactId>
<version>3.0.0</version>
</plugin>
</plugins>
</pluginManagement>
<plugins>
<plugin>
<artifactId>maven-antrun-plugin</artifactId>
<version>3.0.0</version>
<executions>
<execution>
<id>compile-native-code</id>
<phase>generate-sources</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<target>
<!-- Compile native code using nvcc -->
<exec executable="nvcc">
<arg value="-shared"/>
<arg value="-o"/>
<arg value="${project.build.directory}/libCuFileJNI.so"/>
<arg value="-I/usr/local/cuda/include/"/>
<arg value="-I/usr/lib/jvm/java-21-openjdk-amd64/include/"/>
<arg value="-I/usr/lib/jvm/java-21-openjdk-amd64/include/linux"/>
<arg value="${project.basedir}/src/main/native/src/CuFileJni.cpp"/>
<arg value="--compiler-options"/>
<arg value="-fPIC"/>
<arg value="-lcufile"/>
</exec>
</target>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
45 changes: 45 additions & 0 deletions java/src/main/java/bindings/kvikio/cufile/CuFile.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
/*
* Copyright (c) 2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package bindings.kvikio.cufile;
aslobodaNV marked this conversation as resolved.
Show resolved Hide resolved

public class CuFile {
private static boolean initialized = false;
private static CuFileDriver driver;

static {
initialize();
}

static synchronized void initialize() {
if (!initialized) {
try {
System.loadLibrary("CuFileJNI");
driver = new CuFileDriver();
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
driver.close();
}));
initialized = true;
} catch (Throwable t) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: catching Throwable is not ideal. It can include all kinds of unrecoverable errors. At a minimum you probably want to print out why this could not be initialized in the error message. That way you can debug that a dependency was missing/etc.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be able to point me to an example of best practices in this area? Happy to change it further from my latest update.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I marked it as a nit because it is probably fine in this case. In fact when I went back to look at the CUDF code, I did the exact same thing 5 years ago.

https://github.com/rapidsai/cudf/blob/4dbb8a354a9d4f0b4d82a5bf9747409c6304358f/java/src/main/java/ai/rapids/cudf/NativeDepsLoader.java#L74-L83

It is just generally not good practice to catch a Throwable, because Throwable includes all Errors and Errors are generally considered to not be recoverable. But really the main thing here is giving the user enough information that they can debug why it is failing. You are printing the error message, but the full stack trace would be better. That is the main thing that I would suggest you do.

System.out.println("could not load cufile jni library");
}
}
}

public static boolean libraryLoaded() {
return initialized;
}
}
Loading