Implement support in Java for shared memory buffers #7

Closed
ctrueden opened this issue Jun 22, 2023 · 5 comments

ctrueden commented Jun 22, 2023

I started implementing a SharedMemory class in Appose Java, which is a direct translation of Python's handy multiprocessing.shared_memory.SharedMemory class. It leans on JNA for the low-level system calls. But it is not yet fully working. This needs to be finished, so that we can share memory easily between processes!

I'm hoping the JNA-based approach will work directly. But other options include:

  • Add support for shared memory buffers to the LArray library (work already started by @mkitti on a branch)
  • Write our own small C library and ship it in the Appose Java JAR
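For anyone who wants to dig in, here is a minimal sketch of what the JNA-based approach could look like on POSIX systems. The CLibrary interface, the constant values (shown here for Linux), and the create method are illustrative only, not the actual Appose code:

import com.sun.jna.Library;
import com.sun.jna.Native;
import com.sun.jna.Pointer;

public class PosixShmSketch {

  // Illustrative JNA mapping of the POSIX calls that CPython's
  // multiprocessing.shared_memory module uses under the hood.
  // Note: on older glibc versions, shm_open lives in librt rather than libc.
  public interface CLibrary extends Library {
    CLibrary INSTANCE = Native.load("c", CLibrary.class);
    int shm_open(String name, int oflag, int mode);
    int ftruncate(int fd, long length);
    Pointer mmap(Pointer addr, long length, int prot, int flags, int fd, long offset);
    int munmap(Pointer addr, long length);
    int close(int fd);
    int shm_unlink(String name);
  }

  // Constants from <fcntl.h> and <sys/mman.h>; values shown are for Linux
  // and differ per platform (e.g. O_CREAT is 0x0200 on macOS).
  private static final int O_RDWR = 2, O_CREAT = 64;
  private static final int PROT_READ = 1, PROT_WRITE = 2, MAP_SHARED = 1;

  /** Creates a named shared memory block and maps it into this process (64-bit platforms assumed). */
  public static Pointer create(String name, long size) {
    // POSIX expects the name to start with '/'; Python's shared_memory adds it automatically.
    int fd = CLibrary.INSTANCE.shm_open(name, O_RDWR | O_CREAT, 0600);
    if (fd < 0) throw new RuntimeException("shm_open failed");
    if (CLibrary.INSTANCE.ftruncate(fd, size) != 0) throw new RuntimeException("ftruncate failed");
    // The same block can now be attached from Python via shared_memory.SharedMemory(name=...).
    Pointer p = CLibrary.INSTANCE.mmap(null, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    CLibrary.INSTANCE.close(fd);
    return p;
  }
}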
ctrueden added this to the 0.3.0 milestone Jun 22, 2023

carlosuc3m commented Nov 6, 2023

Hello @ctrueden,
I just finished implementing shared memory IPC communication on the JDLL side:
https://github.com/bioimage-io/JDLL/blob/python-appose/src/main/java/io/bioimage/modelrunner/tensor/SharedMemoryArray.java
I used jna-platform, which contains ready-to-use JNA bindings for the kernel32 and POSIX methods (what CPython uses for multiprocessing.shared_memory).
I needed to implement this urgently because sending relatively big images through the pipes took too much time. Switching to shm works great and gives a huge performance improvement, since the information sent through the pipes is now much smaller.
I did not change anything in the Appose implementation; I just carefully designed the inputs that I sent to Appose from JDLL and the outputs of the Python worker.

JDLL creates a script that recreates the ImgLib2 array from the Java side:

# Create the shared memory object pointing to the shm block of interest, with the wanted size
input_06112023_151243_shm_a9b8d59c_0087_4394_a9dd_fb54e23235e6 = shared_memory.SharedMemory(name='b024e140-5780-4ec6-a45d-f51cfa2ce3e7', size=20545536)

# Recreate the data array with the data from the shm block
input_06112023_151243 = xr.DataArray(np.ndarray(5136384, dtype='float32', buffer=input_06112023_151243_shm_a9b8d59c_0087_4394_a9dd_fb54e23235e6.buf).reshape([1, 304, 512, 33]), dims=["b", "y", "x", "c"], name="output")

All the code above is then fed to the Python worker to be executed. As you can see, the memory location, size, and shape are hardcoded on the Java side (done here).
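For illustration, the Java side could assemble that script with a simple format call along these lines; the helper and its parameters are made up, not the actual JDLL code:

// Hypothetical helper: builds the Python snippet above from the properties
// of the Java array being shared (names and parameters are illustrative).
static String buildRecreateScript(String varName, String shmName, long size,
    long flatLength, String dtype, String shape, String dims, String tensorName) {
  return String.format(
    "%s_shm = shared_memory.SharedMemory(name='%s', size=%d)%n" +
    "%s = xr.DataArray(np.ndarray(%d, dtype='%s', buffer=%s_shm.buf)" +
    ".reshape(%s), dims=%s, name=\"%s\")",
    varName, shmName, size,
    varName, flatLength, dtype, varName, shape, dims, tensorName);
}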

Then, once the result is obtained on the Python side, the shm block is created and the info needed to access that memory block is sent to Java, together with the data type, the shape, and whether the array is in Fortran order or not.

On the Python side:

from multiprocessing import shared_memory

import numpy as np

# Keep references to the created blocks so they stay alive until the Java side has read them.
shm_out_list = []

def convertNpIntoDic(np_arr):
  # Allocate a shared memory block and copy the flattened array into it.
  shm = shared_memory.SharedMemory(create=True, size=np_arr.nbytes)
  aux_np_arr = np.ndarray((np_arr.size), dtype=np_arr.dtype, buffer=shm.buf)
  aux_np_arr[:] = np_arr.flatten()
  shm_out_list.append(shm)
  shm.unlink()
  # Send only the block name plus the metadata needed to rebuild the array.
  return {"data": shm.name, "shape": np_arr.shape, "appose_data_type__06112023_151246": "np_arr", "is_fortran": np.isfortran(np_arr), "dtype": str(np_arr.dtype)}

def convertXrIntoDic(xr_arr):
  # Same as above, but also records the xarray dims and name.
  shm = shared_memory.SharedMemory(create=True, size=xr_arr.values.nbytes)
  aux_np_arr = np.ndarray((xr_arr.values.size), dtype=xr_arr.values.dtype, buffer=shm.buf)
  aux_np_arr[:] = xr_arr.values.flatten()
  shm_out_list.append(shm)
  shm.unlink()
  return {"data": shm.name, "shape": xr_arr.shape, "axes": "".join(xr_arr.dims), "name": xr_arr.name, "appose_data_type__06112023_151246": "tensor", "is_fortran": np.isfortran(xr_arr.values), "dtype": str(xr_arr.values.dtype)}

The returned dictionary is then used by the following Java method to recreate a JDLL tensor or ImgLib2 array:
https://github.com/bioimage-io/JDLL/blob/cdf5469f0405be6aabc30d7380ba0f4c1baf2a11/src/main/java/io/bioimage/modelrunner/tensor/SharedMemoryArray.java#L422
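As a rough sketch of the consuming side (not the actual SharedMemoryArray code): on Linux, a block created by Python's shared_memory module shows up under /dev/shm, so it can even be mapped with plain NIO and wrapped as an ImgLib2 image. The class and method names here are made up, and a float32 dtype is assumed:

import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;

import net.imglib2.img.Img;
import net.imglib2.img.array.ArrayImgs;
import net.imglib2.type.numeric.real.FloatType;

public class ShmReaderSketch {

  /**
   * Attaches to a named shared memory block and copies its contents into an
   * ImgLib2 image, using the name, shape and dtype transmitted in the dictionary.
   * Assumes Linux (/dev/shm), float32 data, and an array that fits in a single Java array.
   */
  public static Img<FloatType> attachFloat32(String shmName, long... shape) throws Exception {
    long n = 1;
    for (long d : shape) n *= d;
    try (RandomAccessFile raf = new RandomAccessFile("/dev/shm/" + shmName, "r");
        FileChannel fc = raf.getChannel()) {
      ByteBuffer bytes = fc.map(FileChannel.MapMode.READ_ONLY, 0, n * 4);
      bytes.order(ByteOrder.nativeOrder());
      float[] data = new float[(int) n];
      bytes.asFloatBuffer().get(data);
      return ArrayImgs.floats(data, shape);
    }
  }
}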

Performance improves greatly (at least on my old laptop), because stdout no longer needs to flush millions of numbers. The result looks like the following:
https://twitter.com/carlosg91018370/status/1721526937686868443

A similar logic could be implemented in Appose to support sending higher-level objects such as NumPy arrays or ImgLib2 images.

Regards,
Carlos


ctrueden commented Nov 6, 2023

@carlosuc3m This is exciting news! I am especially happy to hear that this improved performance for you.

How straightforward do you think it would be to migrate the logic from your SharedMemoryArray upstream into appose-java's SharedMemory? Is this the direction you think we should go?

@carlosuc3m

I do not think it would be difficult to implement the logic on the Python side, but we would need to decide how to approach it.
First, we would need to decide which objects we support for "conversion": NumPy array to ImgLib2 RandomAccessibleInterval? Something else? Then we have to decide how to send them from one worker to another.

Currently, the dictionary of variables is encoded, flushed, and read by the other process, and once it is decoded we directly have the original variables. To send NumPy/RandomAccessibleInterval arrays we need to send the memory location (String), the data type (String), the shape (list of longs), and whether the array is in Fortran order (boolean).
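For example, each such value could be encoded as a plain Map with those fields; the key names and values below are only illustrative, not a fixed format:

// Hypothetical encoded form of one NumPy/RandomAccessibleInterval value,
// mirroring the fields listed above (key names and values are illustrative).
Map<String, Object> encoded = new HashMap<>();
encoded.put("appose_data_type", "ndarray");          // identifier marking the special entry
encoded.put("data", "psm_4f2a9c");                   // shared memory block name
encoded.put("dtype", "float32");                     // element data type
encoded.put("shape", Arrays.asList(1L, 512L, 512L)); // dimensions as longs
encoded.put("is_fortran", false);                    // memory order flag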

In the case of the Java worker sending an RAI to Python, the NumPy array conversion can be done in two ways, in my opinion.

  • On the Java side, let the user provide a RAI directly as the input:
Map<String, Object> inputsMap = new HashMap<String, Object>();
RandomAccessibleInterval<FloatType> rai = ArrayImgs.floats(new long[] {512, 512});
inputsMap.put("input_rai", rai);
Task task = python.task(code, inputsMap);

And then in the encoding stage, still on the Java side:

for (Entry<String, Object> entry : inputsMap.entrySet()) {
  if (entry.getValue() instanceof RandomAccessibleInterval) {
    // Method that converts the RAI into a dictionary containing all the info needed to create a np array on the Python side.
    // It should also contain a String identifier to signal that it is intended to be converted into a NumPy array
    // and not passed directly as a dictionary to the Python script.
    inputsMap.put(entry.getKey(), SharedMemory.build(entry.getValue()));
  }
}

Finally, on the Python side after decoding, iterate over the variables and, if any of them contains the specific identifier, convert it to a NumPy array.

A similar approach can be used to send outputs from Python to Java.
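As a rough sketch, the Java side could then decode the outputs with the mirror image of the loop above; task.outputs and SharedMemory.rebuild are hypothetical here, just like SharedMemory.build above:

for (Entry<String, Object> entry : task.outputs.entrySet()) {
  Object value = entry.getValue();
  if (value instanceof Map && ((Map<?, ?>) value).containsKey("appose_data_type__06112023_151246")) {
    // Hypothetical helper: attach to the named shm block and wrap its contents
    // as a RandomAccessibleInterval using the transmitted shape, dtype and order.
    entry.setValue(SharedMemory.rebuild((Map<?, ?>) value));
  }
}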

In both cases, the shared memory segments created by either process should be handled carefully to ensure that they are unlinked once they are no longer needed.

@ctrueden

This issue has been addressed by apposed/appose-java#5. Big thanks to @tpietzsch and @carlosuc3m for their efforts. 🍻

@ctrueden

And for the record, the Python implementation maintains feature parity with Java thanks to apposed/appose-python#1.
