
"requirements" should document that libh5bshuf.so is required for certain HDF5 files #27

Open
KayDiederichs opened this issue Mar 27, 2022 · 13 comments


@KayDiederichs

It took me some time to realize that for some data sets, libh5bshuf.so must be present in /usr/local/hdf5/lib/plugin, or in the directory pointed to by the environment variable HDF5_PLUGIN_PATH.
Could this please be documented?
I am trying to compile bitshuffle-0.4.2 on an M1 Apple, but I am having a hard time; it seems to want to compile for x86_64 Apple. If somebody could prepare such a library, durin could be used natively on M1 Apple machines.
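For anyone hitting the same problem, a minimal workaround sketch using the default search directory mentioned above (the path is the one from this report; adjust to wherever your libh5bshuf.so lives):

```shell
# HDF5's dynamic-plugin loader searches /usr/local/hdf5/lib/plugin by
# default; setting HDF5_PLUGIN_PATH overrides that search location.
# Place libh5bshuf.so in this directory before running durin.
export HDF5_PLUGIN_PATH=/usr/local/hdf5/lib/plugin
echo "plugins searched in: $HDF5_PLUGIN_PATH"
```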

@graeme-winter
Collaborator

@KayDiederichs thanks for flagging this - it should not be needed; the filter should be compiled in. I now have an M1 Mac, so I will push this back up the to-do list.

Also see

5d0b7bd

does this extra line make it work for you? I am also aware that @ndevenish has a CMake build coming - #26

(this repo could do with some attention)

@KayDiederichs
Author

Thanks, Graeme. I tried the extra -noshlib option on x86_64 Linux, but it makes no difference: /usr/local/hdf5/lib/plugin/libh5bshuf.so is still needed.
Now that you mention that it is already compiled in, I don't understand why this does not work. Perhaps the HDF5 filter plugin mechanism does not expect a compiled-in filter.

@ndevenish

ndevenish commented Mar 28, 2022

I’ll try to have a look at this today. Are there public M1 Mac builds of XDS yet? @graeme-winter, I believe you have a test copy you could forward me otherwise?

@graeme-winter
Collaborator

Appears to already be online at https://xds.mr.mpg.de/html_doc/downloading.html

@ndevenish

Ahh, I looked there but was being dumb: I read "(emulated on apple silicon)" and just stopped.

@ndevenish

@KayDiederichs, I can't reproduce this with our datasets. The reason the plugin should not normally be required is that, when it can, durin reads the data chunks directly and decompresses them with its internal bitshuffle.

That appears to be controlled here:

o_eiger_desc->base.frame_func = &get_frame_from_chunk;

which appears to be gated behind a check on the file's internal layout:

  if (H5Lexists(visit_result->nxdetector, "data_000001", H5P_DEFAULT) > 0) {
    ds_prop_func = &get_dectris_eiger_dataset_dims;
  } else if (H5Lexists(visit_result->nxdetector, "data", H5P_DEFAULT) > 0) {
    ds_prop_func = &get_nxs_dataset_dims;
  } else if (H5Lexists(visit_result->nxdata, "data_000001", H5P_DEFAULT) > 0) {
    ds_prop_func = &get_dectris_eiger_dataset_dims;
  } else if (H5Lexists(visit_result->nxdata, "data", H5P_DEFAULT) > 0) {
    ds_prop_func = &get_nxs_dataset_dims;
  } else {
    ERROR_JUMP(-1, done, "Could not locate detector dataset");
  }

So, if you have an h5 file that only has /entry/data and no data_000001, it looks like durin does not use the direct chunk read, and instead falls back to HDF5's normal image handling, which requires the plugin.

@graeme-winter, does this sound about right?

@KayDiederichs
Author

Thanks for the explanation! Wolfgang and I have been looking at a dataset consisting of xxx_master.h5 and xxx_data_000005.h5. Durin worked well on my computers (which happen to have a long-forgotten /usr/local/hdf5/lib/plugin/libh5bshuf.so from 2015) but not on his (which don't). Since it took us a few days to realize that this difference was responsible for the failure, it would be good to document it.
By the way, after reading Nick's message I symlinked xxx_data_000005.h5 to xxx_data_000001.h5. This did change durin's behavior, but it still failed, with a different error message, so this trick does not work.

@fleon-psi

fleon-psi commented Mar 28, 2022

Quick suggestion, to avoid providing a separate bshuf filter:
a) include bshuf_h5filter.c/bshuf_h5filter.h from the bitshuffle source;
b) add an H5Zregister call in durin itself (see the example in bshuf_h5filter.c).
Then you will be able to use bitshuffle via HDF5's built-in filter mechanism.

@ndevenish

Ah, I think I was both unclear and slightly misunderstood.

I was referring to the internal structure of the HDF5 file:

$ h5dump -n ins_6_1_master.h5
HDF5 "ins_6_1_master.h5" {
FILE_CONTENTS {
 group      /
 group      /entry
 group      /entry/data
 dataset    /entry/data/data
 ext link   /entry/data/data_000001 -> ins_6_1_000001.h5 /data
 ext link   /entry/data/data_000002 -> ins_6_1_000002.h5 /data
....

I thought that /entry/data/data_000001 was the standard NeXus layout, because that's what all of ours use. It looks like that is a DLS implementation detail that we're checking for directly in durin.

> Quick suggestion - to avoid providing separate bshuf filter:
> a) include bshuf_h5filter.c/bshuf_h5filter.h from bitshuffle source
> b) add H5Zregister call in durin itself, see example in bshuf_h5filter.c
> Then you will be able to use bitshuffle via HDF5 builtin filter mechanism.

Hmm, this sounds like a lot less work than I had anticipated, and it would be nice to just resolve the problem without having to have the filters set up. (Alternatively, hdf5plugin (on GitHub and Anaconda) does a lot of the work of getting the plugin set compiling on most regular platforms, so pulling their binaries might work.)

@KayDiederichs
Author

Apple M1 is not a regular platform, and I could not find a libh5bshuf.so for it (it must also cooperate with the gcc-12-built durin rather than with something compiled with Apple's clang). See also issue #24.

@ndevenish

Well, it's a regular platform in conda-forge terms, and https://anaconda.org/conda-forge/hdf5plugin/files has osx-arm64 builds (well, almost a regular platform: the builds are cross-compiled, but they work well enough for DIALS). All the conda-forge packages are, however, compiled with (non-Apple) clang, if that is an issue. FWIW, on my M1 a durin built with the current main-branch Makefile works, compiled using h5cc from conda-forge, which is itself built with clang.

I'm not suggesting any of this as a good solution, but if the choice is between struggling with a manual build and using a prebuilt binary, the latter seems better than not being able to analyse data.

We should definitely try to handle this (common) case better here, though.

@KayDiederichs
Author

Thanks for pointing to that URL! My Google fu didn't turn it up. I downloaded libh5bshuf.dylib and it works for me too.

@graeme-winter
Collaborator

> Quick suggestion - to avoid providing separate bshuf filter: a) include bshuf_h5filter.c/bshuf_h5filter.h from bitshuffle source b) add H5Zregister call in durin itself, see example in bshuf_h5filter.c Then you will be able to use bitshuffle via HDF5 builtin filter mechanism.

@fleon-psi - yes, we were discussing this earlier, though as @ndevenish pointed out there are also some "routing" questions about how we deal with Diamond data versus more "native" (e.g. DECTRIS file writer) data.
