1 out of 296 paths are not unique. We will try adding _obj- based on crc32 of object_id #1489

Closed
nrsc opened this issue Aug 19, 2024 · 8 comments

@nrsc

nrsc commented Aug 19, 2024

Hello all,

I'm running into an issue while trying to organize my Dandiset using the CLI. I get "[ INFO] 1 out of 296 paths are not unique. We will try adding _obj- based on crc32 of object_id", and I can't find any more information about the non-unique path in question. I've been checking the logs, but they give no further details on which path is not unique.
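For readers unfamiliar with that message: it says that two or more files would be organized to the same target path, and that the colliding names get an extra _obj- entity derived from a CRC32 checksum of each file's object_id. The Python sketch below illustrates the idea only. The names crc32_hex and disambiguate are hypothetical, and placing the tag just before the extension and formatting the checksum as 8 hex digits are assumptions, so dandi's actual naming may differ.

```python
import zlib
from collections import Counter
from pathlib import PurePosixPath


def crc32_hex(text: str) -> str:
    """Return the CRC32 checksum of a UTF-8 string as 8 lowercase hex digits."""
    # In Python 3, zlib.crc32 always returns an unsigned 32-bit value.
    return format(zlib.crc32(text.encode("utf-8")), "08x")


def disambiguate(paths: list[str], object_ids: list[str]) -> list[str]:
    """Tag every non-unique target path with a hypothetical _obj-<crc32> entity.

    paths[i] is the proposed target path for the file whose object_id is
    object_ids[i]. Paths that occur only once are returned unchanged.
    """
    if len(paths) != len(object_ids):
        raise ValueError("paths and object_ids must have the same length")
    counts = Counter(paths)
    result = []
    for path, object_id in zip(paths, object_ids):
        if counts[path] == 1:
            result.append(path)
            continue
        p = PurePosixPath(path)
        # Assumption: the tag goes just before the file extension.
        tagged = p.with_name(f"{p.stem}_obj-{crc32_hex(object_id)}{p.suffix}")
        result.append(str(tagged))
    return result


targets = ["sub-1/sub-1_ecephys.nwb", "sub-1/sub-1_ecephys.nwb", "sub-2/sub-2_ecephys.nwb"]
print(disambiguate(targets, ["id-a", "id-b", "id-c"]))
```

Note that if two files share the same object_id (for example, an accidental copy of one file), this tag cannot tell them apart, so the collision would persist in the sketch.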

Build info as follows.

[INFO    ] dandi 647577:124894109286400 dandi v0.63.0+5.g37b63509, hdmf v3.14.3, pynwb v2.8.1, h5py v3.11.0

I am using the dandi v0.63.0+5.g37b63509 build to address a metadata problem I was facing previously, but this issue comes up whether or not I use the +5.g37...09 build. I have added more files to the dataset, so I assume that somewhere along the way I added a file that triggers the non-unique warning. Unfortunately, I don't know where to look to find the source of the issue, because the non-unique path is not written to the log.

Another error that shows up: Error: 'numpy.bytes_' object has no attribute 'encode'
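On the numpy.bytes_ error: in Python 3, numpy.bytes_ is a subclass of the built-in bytes type, which has .decode() but no .encode(), so calling .encode() on a value that arrived as bytes (for example, a string attribute that h5py may return as numpy.bytes_) fails this way. Where exactly that call happens in dandi or hdmf isn't known from this report. The sketch below only reproduces the failure mode with plain bytes and shows one defensive normalization; the as_text helper is hypothetical.

```python
def as_text(value: object, encoding: str = "utf-8") -> str:
    """Return value as str, decoding bytes-like input first.

    numpy.bytes_ subclasses the built-in bytes, so the isinstance check
    also covers values that h5py/hdmf may hand back as numpy.bytes_.
    """
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding)
    return str(value)


# Reproduce the failure mode with plain bytes: bytes has no .encode() method.
try:
    b"2024-08-19".encode("utf-8")
except AttributeError as exc:
    print(exc)  # 'bytes' object has no attribute 'encode'

# After normalizing to str, .encode() works as the caller expected.
print(as_text(b"2024-08-19").encode("utf-8"))  # b'2024-08-19'
```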

I've attached the log as well:
2024.08.19-16.50.59Z-647577.log

Other details regarding build information

2024-08-19 09:55:08,798 [ INFO] Loading metadata from 296 files
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 20 concurrent workers.
/home/nrsc/.local/share/r-miniconda/envs/dandi/lib/python3.12/site-packages/hdmf/utils.py:668: UserWarning: Ignoring cached namespace 'hdmf-common' version 1.1.3 because version 1.8.0 is already loaded.
return func(args[0], **pargs)

/home/nrsc/.local/share/r-miniconda/envs/dandi/lib/python3.12/site-packages/hdmf/utils.py:668: UserWarning: Ignoring cached namespace 'core' version 2.2.4 because version 2.7.0 is already loaded.
return func(args[0], **pargs)

@kabilar
Member

kabilar commented Aug 19, 2024

Thanks for the report, @nrsc. Could you post a list of the files (with the full path) you are attempting to organize (e.g. using the tree command)?
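If tree isn't available, a short Python script can produce an equivalent flat list of full paths (similar to find "$PWD" -type f). This is a minimal sketch; list_files is a hypothetical helper, not part of dandi.

```python
from pathlib import Path


def list_files(root: str = ".") -> list[str]:
    """Return the absolute path of every regular file under root, sorted."""
    base = Path(root).resolve()
    # rglob("*") walks all subdirectories; is_file() drops the directories themselves.
    return sorted(str(p) for p in base.rglob("*") if p.is_file())
```

Run from the Dandiset root, print("\n".join(list_files("."))) gives one full path per line, which can be pasted into the issue.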

@nrsc
Author

nrsc commented Aug 19, 2024

Here's a listing of all the files in the folder:
dandi_output.txt

@yarikoptic
Member

Hi @nrsc. Since you are a bug magnet (like yours truly), you might want to learn about an option to drop into the Python debugger, which could help you troubleshoot further in the future:

❯ dandi --help | grep pdb
  --pdb                           Fall into pdb if errors out

If you specify it (e.g. dandi --pdb organize ...), you will drop into the pdb debugger at the point of the numpy.bytes_ error, which I cannot reproduce at the moment. You can read more about using pdb at, e.g., https://realpython.com/python-debugging-pdb/.
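A --pdb option like this typically works by catching the exception and handing its traceback to pdb.post_mortem(), so the debugger opens in the frame that raised. Below is a minimal sketch of that pattern, not dandi's actual implementation; call_with_postmortem and its debugger parameter are hypothetical, the latter added so the behavior can be exercised without an interactive session.

```python
import pdb
import sys
from types import TracebackType
from typing import Any, Callable, Optional


def call_with_postmortem(
    func: Callable[..., Any],
    *args: Any,
    use_pdb: bool = False,
    debugger: Callable[[Optional[TracebackType]], None] = pdb.post_mortem,
    **kwargs: Any,
) -> Any:
    """Call func(*args, **kwargs); on an exception, optionally open a debugger.

    With use_pdb=True the traceback is passed to `debugger` (pdb.post_mortem
    by default), which starts in the frame that raised. The exception is
    re-raised afterwards so callers still see the failure.
    """
    try:
        return func(*args, **kwargs)
    except Exception:
        if use_pdb:
            debugger(sys.exc_info()[2])
        raise


def failing_step() -> None:
    raise ValueError("something went wrong")


# Interactive use: call_with_postmortem(failing_step, use_pdb=True)
```

At the (Pdb) prompt, w prints the stack, u and d move up and down between frames, p expr prints the value of an expression, and q quits.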

I will now look into providing more information about those non-unique paths; we should be able to provide a more informative message there!
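One way to make that message more informative is to group source files by their proposed target path and print every target claimed by more than one source. A sketch in Python follows; report_non_unique and the "target <- sources" output format are hypothetical, not necessarily what dandi prints.

```python
from collections import defaultdict


def report_non_unique(mapping: dict[str, str]) -> list[str]:
    """List every target path that more than one source file maps to.

    mapping: source file -> proposed target path. Returns one line per
    collision, naming the target and all sources that claim it.
    """
    by_target: dict[str, list[str]] = defaultdict(list)
    for source, target in mapping.items():
        by_target[target].append(source)
    return [
        f"{target} <- {', '.join(sorted(sources))}"
        for target, sources in sorted(by_target.items())
        if len(sources) > 1
    ]


proposed = {
    "raw/session_a.nwb": "sub-1/sub-1_ecephys.nwb",
    "raw/session_a_copy.nwb": "sub-1/sub-1_ecephys.nwb",
    "raw/session_b.nwb": "sub-2/sub-2_ecephys.nwb",
}
for line in report_non_unique(proposed):
    print(line)  # sub-1/sub-1_ecephys.nwb <- raw/session_a.nwb, raw/session_a_copy.nwb
```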

@yarikoptic
Member

@nrsc try out

@nrsc
Author

nrsc commented Aug 20, 2024

Thank you @yarikoptic. Will confirm effects of updates once I get the chance to sit back down with this again. Cheers all.

@nrsc
Author

nrsc commented Aug 26, 2024

Hi @yarikoptic. Here is the output from pdb:

(dandi-cli) nrsc@ai-connect:~/001065$ dandi --pdb organize
2024-08-26 11:00:56,664 [    INFO] Note: NumExpr detected 20 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-08-26 11:00:56,664 [    INFO] NumExpr defaulting to 8 threads.
2024-08-26 11:00:57,209 [    INFO] Logs saved in /home/nrsc/.cache/dandi-cli/log/2024.08.26-18.00.56Z-784710.log
Traceback (most recent call last):
  File "/home/nrsc/.local/bin/dandi", line 8, in <module>
    sys.exit(main())
  File "/home/nrsc/.local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/nrsc/.local/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/nrsc/.local/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/nrsc/.local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/nrsc/.local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/nrsc/.local/lib/python3.10/site-packages/click/decorators.py", line 45, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/home/nrsc/.local/lib/python3.10/site-packages/dandi/cli/base.py", line 126, in wrapper
    return f(*args, **kwargs)
  File "/home/nrsc/.local/lib/python3.10/site-packages/dandi/cli/cmd_organize.py", line 112, in organize
    organize(
  File "/home/nrsc/.local/lib/python3.10/site-packages/dandi/organize.py", line 822, in organize
    raise ValueError(
ValueError: Only 'dry' or 'move' mode could be used to operate in-place within a dandiset (no paths were provided)

> /home/nrsc/.local/lib/python3.10/site-packages/dandi/organize.py(822)organize()
-> raise ValueError(
(Pdb) 

I'm wondering whether this line points to the origin of the error:

2024-08-26T11:00:57-0700 [DEBUG   ] dandi 784710:134806063919104 Caught exception Only 'dry' or 'move' mode could be used to operate in-place within a dandiset (no paths were provided)
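Judging by its wording, that error is a separate precondition check rather than the non-unique-path problem: when organize is run with no paths, it works in place on the Dandiset, and only the dry or move file modes are accepted. The command above passed no mode, so presumably a default other than dry or move was rejected. The sketch below restates that check in Python; FilesMode and check_in_place are hypothetical names, the listed modes are a subset chosen for illustration, and the reason given for the restriction is an assumption.

```python
from enum import Enum
from typing import Sequence


class FilesMode(Enum):
    """Hypothetical subset of organize's file-operation modes."""

    DRY = "dry"
    MOVE = "move"
    COPY = "copy"
    SYMLINK = "symlink"
    AUTO = "auto"


def check_in_place(paths: Sequence[str], mode: FilesMode) -> None:
    """Raise if organize would run in place (no paths) with a mode other than dry/move."""
    # Assumption: copy/link modes are refused in place because they would leave the
    # original files next to the organized copies inside the Dandiset.
    if not paths and mode not in (FilesMode.DRY, FilesMode.MOVE):
        raise ValueError(
            "Only 'dry' or 'move' mode could be used to operate in-place "
            f"within a dandiset (no paths were provided); got {mode.value!r}"
        )
```

If that reading is right, passing an explicit mode (dandi organize exposes one as --files-mode; dandi organize --help will confirm the name and choices), for example dry first and then move, should get past this particular error.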

Unfortunately, I am not very well versed in Python or Python debugging. I've been an R guy for a while now, but I am interested in contributing as best I can and in learning about this process.

Should I pull the organize.py file that you pushed last week and try running the organize function again?

Cheers,

Scott

@nrsc
Author

nrsc commented Aug 28, 2024

Hi @yarikoptic. I updated from the repository, and the message now lists the duplicated paths. That helped me identify and fix the issue. Thank you for providing the patch.

@nrsc nrsc closed this as completed Aug 28, 2024
@yarikoptic
Member

Sorry I missed your prior comment, and thanks for reporting back. It brings us joy to have issues closed! ;)

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants