Implement support in Java for shared memory buffers #7

Closed
ctrueden opened this issue Jun 22, 2023 · 5 comments

ctrueden commented Jun 22, 2023

I started implementing a SharedMemory class in Appose Java, which is a direct translation of Python's handy multiprocessing.shared_memory.SharedMemory class. It leans on JNA for the low-level system calls. But it is not yet fully working. This needs to be finished, so that we can share memory easily between processes!

I'm hoping the JNA-based approach will work directly. But other options include:

  • Add support for shared memory buffers to the LArray library (work already started by @mkitti on a branch)
  • Write our own small C library and ship it in the Appose Java JAR
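For anyone who wants to dig in, here is a minimal sketch of what the JNA-based approach could look like on POSIX systems. The CLibrary interface, the constant values (shown here for Linux), and the create method are illustrative only, not the actual Appose code:

import com.sun.jna.Library;
import com.sun.jna.Native;
import com.sun.jna.Pointer;

public class PosixShmSketch {

  // Illustrative JNA mapping of the POSIX calls that CPython's
  // multiprocessing.shared_memory module uses under the hood.
  // Note: on older glibc versions, shm_open lives in librt rather than libc.
  public interface CLibrary extends Library {
    CLibrary INSTANCE = Native.load("c", CLibrary.class);
    int shm_open(String name, int oflag, int mode);
    int ftruncate(int fd, long length);
    Pointer mmap(Pointer addr, long length, int prot, int flags, int fd, long offset);
    int munmap(Pointer addr, long length);
    int close(int fd);
    int shm_unlink(String name);
  }

  // Constants from <fcntl.h> and <sys/mman.h>; values shown are for Linux
  // and differ per platform (e.g. O_CREAT is 0x0200 on macOS).
  private static final int O_RDWR = 2, O_CREAT = 64;
  private static final int PROT_READ = 1, PROT_WRITE = 2, MAP_SHARED = 1;

  /** Creates a named shared memory block and maps it into this process (64-bit platforms assumed). */
  public static Pointer create(String name, long size) {
    // POSIX expects the name to start with '/'; Python's shared_memory adds it automatically.
    int fd = CLibrary.INSTANCE.shm_open(name, O_RDWR | O_CREAT, 0600);
    if (fd < 0) throw new RuntimeException("shm_open failed");
    if (CLibrary.INSTANCE.ftruncate(fd, size) != 0) throw new RuntimeException("ftruncate failed");
    // The same block can now be attached from Python via shared_memory.SharedMemory(name=...).
    Pointer p = CLibrary.INSTANCE.mmap(null, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    CLibrary.INSTANCE.close(fd);
    return p;
  }
}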
ctrueden added this to the 0.3.0 milestone Jun 22, 2023

carlosuc3m commented Nov 6, 2023

Hello @ctrueden,
I just finished implementing shared memory IPC communication on the JDLL side:
https://github.com/bioimage-io/JDLL/blob/python-appose/src/main/java/io/bioimage/modelrunner/tensor/SharedMemoryArray.java
I used jna-platform, which contains ready-to-use JNA bindings for the kernel32 and POSIX methods (what CPython uses for multiprocessing.shared_memory).
I needed to implement this urgently because sending relatively big images through the pipes took too much time. Switching to shm works great and gives a huge performance improvement, since the information sent through the pipes is now much smaller.
I did not change anything in the Appose implementation; I just carefully designed the inputs that I sent to Appose from JDLL and the outputs of the Python worker.

JDLL creates a script that recreates the ImgLib2 array from the Java side:

# Create the shared memory object pointing to the shm block of interest, with the wanted size
input_06112023_151243_shm_a9b8d59c_0087_4394_a9dd_fb54e23235e6 = shared_memory.SharedMemory(name='b024e140-5780-4ec6-a45d-f51cfa2ce3e7', size=20545536)

# Recreate the data array with the data from the shm block
input_06112023_151243 = xr.DataArray(np.ndarray(5136384, dtype='float32', buffer=input_06112023_151243_shm_a9b8d59c_0087_4394_a9dd_fb54e23235e6.buf).reshape([1, 304, 512, 33]), dims=["b", "y", "x", "c"], name="output")

All the code above is then fed to the Python worker to be executed. As you can see, the memory location, size, and shape are hardcoded on the Java side (done here).
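For illustration, the Java side could assemble that script with a simple format call along these lines; the helper and its parameters are made up, not the actual JDLL code:

// Hypothetical helper: builds the Python snippet above from the properties
// of the Java array being shared (names and parameters are illustrative).
static String buildRecreateScript(String varName, String shmName, long size,
    long flatLength, String dtype, String shape, String dims, String tensorName) {
  return String.format(
    "%s_shm = shared_memory.SharedMemory(name='%s', size=%d)%n" +
    "%s = xr.DataArray(np.ndarray(%d, dtype='%s', buffer=%s_shm.buf)" +
    ".reshape(%s), dims=%s, name=\"%s\")",
    varName, shmName, size,
    varName, flatLength, dtype, varName, shape, dims, tensorName);
}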

Then, once the result is obtained on the Python side, the shm block is created and the info needed to access that memory block is sent to Java, together with the data type, the shape, and whether the array is in Fortran order or not.

On the Python side:

from multiprocessing import shared_memory

import numpy as np

# Keep references to the created blocks so they stay alive until the Java side has read them.
shm_out_list = []

def convertNpIntoDic(np_arr):
  # Allocate a shared memory block and copy the flattened array into it.
  shm = shared_memory.SharedMemory(create=True, size=np_arr.nbytes)
  aux_np_arr = np.ndarray((np_arr.size), dtype=np_arr.dtype, buffer=shm.buf)
  aux_np_arr[:] = np_arr.flatten()
  shm_out_list.append(shm)
  shm.unlink()
  # Send only the block name plus the metadata needed to rebuild the array.
  return {"data": shm.name, "shape": np_arr.shape, "appose_data_type__06112023_151246": "np_arr", "is_fortran": np.isfortran(np_arr), "dtype": str(np_arr.dtype)}

def convertXrIntoDic(xr_arr):
  # Same as above, but also records the xarray dims and name.
  shm = shared_memory.SharedMemory(create=True, size=xr_arr.values.nbytes)
  aux_np_arr = np.ndarray((xr_arr.values.size), dtype=xr_arr.values.dtype, buffer=shm.buf)
  aux_np_arr[:] = xr_arr.values.flatten()
  shm_out_list.append(shm)
  shm.unlink()
  return {"data": shm.name, "shape": xr_arr.shape, "axes": "".join(xr_arr.dims), "name": xr_arr.name, "appose_data_type__06112023_151246": "tensor", "is_fortran": np.isfortran(xr_arr.values), "dtype": str(xr_arr.values.dtype)}

The returned dictionary is then used by the following Java method to recreate a JDLL tensor or ImgLib2 array:
https://github.com/bioimage-io/JDLL/blob/cdf5469f0405be6aabc30d7380ba0f4c1baf2a11/src/main/java/io/bioimage/modelrunner/tensor/SharedMemoryArray.java#L422
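As a rough sketch of the consuming side (not the actual SharedMemoryArray code): on Linux, a block created by Python's shared_memory module shows up under /dev/shm, so it can even be mapped with plain NIO and wrapped as an ImgLib2 image. The class and method names here are made up, and a float32 dtype is assumed:

import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;

import net.imglib2.img.Img;
import net.imglib2.img.array.ArrayImgs;
import net.imglib2.type.numeric.real.FloatType;

public class ShmReaderSketch {

  /**
   * Attaches to a named shared memory block and copies its contents into an
   * ImgLib2 image, using the name, shape and dtype transmitted in the dictionary.
   * Assumes Linux (/dev/shm), float32 data, and an array that fits in a single Java array.
   */
  public static Img<FloatType> attachFloat32(String shmName, long... shape) throws Exception {
    long n = 1;
    for (long d : shape) n *= d;
    try (RandomAccessFile raf = new RandomAccessFile("/dev/shm/" + shmName, "r");
        FileChannel fc = raf.getChannel()) {
      ByteBuffer bytes = fc.map(FileChannel.MapMode.READ_ONLY, 0, n * 4);
      bytes.order(ByteOrder.nativeOrder());
      float[] data = new float[(int) n];
      bytes.asFloatBuffer().get(data);
      return ArrayImgs.floats(data, shape);
    }
  }
}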

Performance improves greatly (at least on my old laptop), because stdout no longer needs to flush millions of numbers. The result looks like the following:
https://twitter.com/carlosg91018370/status/1721526937686868443

A similar logic could be implemented in Appose to support sending higher-level objects such as NumPy arrays or ImgLib2 images.

Regards,
Carlos


ctrueden commented Nov 6, 2023

@carlosuc3m This is exciting news! I am especially happy to hear that this improved performance for you.

How straightforward do you think it would be to migrate the logic from your SharedMemoryArray upstream into appose-java's SharedMemory? Is this the direction you think we should go?

@carlosuc3m

I do not think it would be difficult to implement the logic on the Python side, but we would need to decide how to approach it.
First, we would need to decide which objects we support for "conversion": NumPy array to ImgLib2 RandomAccessibleInterval? Something else? Then we have to decide how to send them from one worker to another.

Currently, the dictionary of variables is encoded, flushed, and read by the other process, and once it is decoded we directly have the original variables. To send NumPy/RandomAccessibleInterval arrays we need to send the memory location (String), the data type (String), the shape (list of longs), and whether the array is in Fortran order (boolean).
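For example, each such value could be encoded as a plain Map with those fields; the key names and values below are only illustrative, not a fixed format:

// Hypothetical encoded form of one NumPy/RandomAccessibleInterval value,
// mirroring the fields listed above (key names and values are illustrative).
Map<String, Object> encoded = new HashMap<>();
encoded.put("appose_data_type", "ndarray");          // identifier marking the special entry
encoded.put("data", "psm_4f2a9c");                   // shared memory block name
encoded.put("dtype", "float32");                     // element data type
encoded.put("shape", Arrays.asList(1L, 512L, 512L)); // dimensions as longs
encoded.put("is_fortran", false);                    // memory order flag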

In the case of the Java worker sending an RAI to Python, the NumPy array conversion can be done in two ways, in my opinion.

  • On the Java side, let the user provide a RAI directly as the input:
Map<String, Object> inputsMap = new HashMap<String, Object>();
RandomAccessibleInterval<FloatType> rai = ArrayImgs.floats(new long[] {512, 512});
inputsMap.put("input_rai", rai);
Task task = python.task(code, inputsMap);

And then in the encoding stage, still on the Java side:

for (Entry<String, Object> entry : inputsMap.entrySet()) {
  if (entry.getValue() instanceof RandomAccessibleInterval) {
    // Method that converts the RAI into a dictionary containing all the info needed to create a np array on the Python side.
    // It should also contain a String identifier to signal that it is intended to be converted into a NumPy array
    // and not passed directly as a dictionary to the Python script.
    inputsMap.put(entry.getKey(), SharedMemory.build(entry.getValue()));
  }
}

Finally, on the Python side after decoding, iterate over the variables and, if any of them contains the specific identifier, convert it to a NumPy array.

A similar approach can be used to send outputs from Python to Java.
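As a rough sketch, the Java side could then decode the outputs with the mirror image of the loop above; task.outputs and SharedMemory.rebuild are hypothetical here, just like SharedMemory.build above:

for (Entry<String, Object> entry : task.outputs.entrySet()) {
  Object value = entry.getValue();
  if (value instanceof Map && ((Map<?, ?>) value).containsKey("appose_data_type__06112023_151246")) {
    // Hypothetical helper: attach to the named shm block and wrap its contents
    // as a RandomAccessibleInterval using the transmitted shape, dtype and order.
    entry.setValue(SharedMemory.rebuild((Map<?, ?>) value));
  }
}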

In both cases, the shared memory segments created by either process should be handled carefully to ensure that they are unlinked once they are no longer needed.

@ctrueden

This issue has been addressed by apposed/appose-java#5. Big thanks to @tpietzsch and @carlosuc3m for their efforts. 🍻

@ctrueden

And for the record, the Python implementation maintains feature parity with Java thanks to apposed/appose-python#1.
