Handle changes when importing #6

manthey · 2022-05-04T15:47:13Z

Add an option to change how repeated imports are handled. Currently, if a file doesn't exist in the expected target directory, it is created. We frequently import a directory tree of files and then reorganize them in Girder, so they no longer sit in the original directory tree. Reimporting then creates duplicates of all of these files. It would be great if import had an option to "skip if the file is already in Girder somewhere"; this can be done by matching the import path. If the file size has changed, we would update the existing file. A more sophisticated method would be to match on the computed hash: the file might have been renamed either on the assetstore or in Girder, and if the hash matches, it would be nice not to create a duplicate. This would be slower, as the hash has to be computed.
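To make the proposed matching order concrete, here is a minimal sketch in plain Python. It does not use Girder's API: `KnownFile`, `decide_import_action`, and the `digest` field are hypothetical names for illustration, and SHA-512 is only an assumed hash choice.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class KnownFile:
    """Hypothetical record of a file that Girder already tracks."""
    import_path: str
    size: int
    digest: Optional[str] = None


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not read into memory at once."""
    h = hashlib.sha512()  # assumed algorithm; any stable hash would do
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()


def decide_import_action(path, size, known, use_hash=False, hasher=file_digest):
    """Return 'skip', 'update', or 'create' for one candidate file."""
    by_path = {k.import_path: k for k in known}
    existing = by_path.get(path)
    if existing is not None:
        # Same import path: update only if the size changed.
        return 'skip' if existing.size == size else 'update'
    if use_hash:
        # Slower fallback: the file may have been renamed on either side.
        digest = hasher(path)
        if any(k.digest == digest for k in known):
            return 'skip'
    return 'create'
```

The path check runs first because it is cheap; the hash is only computed when the path is unknown and `use_hash` is set, which keeps the slow case optional.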

It would be nice to have a feature to flag any file in Girder that is no longer available on its assetstore. For filesystem assetstores, this would confirm the path is reachable. For S3 assetstores, this would have to confirm the asset is still in the bucket (so it would probably be slow). If we did this, we would probably want to show a list of such files (or only such files on a specific assetstore, or only such files from a specific import path) and then have an option to delete the associated Girder items (and probably prune empty Girder folders, too).
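For the filesystem case, the check could look roughly like the sketch below. It is not tied to Girder's models: `find_missing` and the record layout (a dict with `id` and `path` keys) are assumptions made for illustration, and the S3 case would need a bucket lookup in place of `os.path.exists`.

```python
import os


def find_missing(records, path_prefix=None, exists=os.path.exists):
    """Return the records whose backing path is no longer reachable.

    records: iterable of dicts with a 'path' key (hypothetical layout).
    path_prefix: if given, only check files imported from under this path.
    exists: reachability check; swap in another callable for other stores.
    """
    missing = []
    for rec in records:
        path = rec['path']
        if path_prefix is not None and not path.startswith(path_prefix):
            continue
        if not exists(path):
            missing.append(rec)
    return missing
```

The returned list is what would be shown to the user before any associated items are deleted.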

manthey · 2023-07-07T13:35:06Z

With recent changes, this becomes a matter of validating that files in an assetstore are still present and haven't changed, and, if they are missing or changed, deciding what to do about it.
