ATM we have two cron jobs, `tools/backups2datalad-update-cron` (runs often) and `tools/backups2datalad-update-cron-108` (runs long), since 108 contained zarrs and their backup is much more involved (see e.g. #363) than that of regular files. But now other dandisets are also starting to contain zarrs. We need to figure out a workflow that performs updates without any custom separation across dandisets.

I think overall we should start using a proper job system to orchestrate updates. Maybe even full-blown celery with flower to monitor the status? Then the workflow could be:

- given a dandiset with a changed timestamp and no job already ongoing to update it, schedule an `update_dandiset` job which
  - for all zarr assets, checks that they exist, are not being uploaded, and are up to date (based on date):
    - if any are missing, schedule a job to have the zarr created/updated
    - if any are out of date, schedule a job to have the zarr updated
      - we might need a "registry" of jobs, since we can't query celery for ongoing/planned jobs, so that we skip a dandiset if any job for it is still running
    - in any of the above cases, skip updating the dandiset in this round
  - if there are no zarrs, or all zarrs are found up to date [*], proceeds with the update of the dandiset as we do now

[*] Caveat: this is a race condition unless we collect specific commits for each zarr, so that we update them to those commits and stay consistent even if a zarr is being modified.
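
To make the decision logic concrete, here is a minimal sketch of the per-dandiset planning step in plain Python, without celery. All names (`ZarrState`, `JobRegistry`, `Plan`, `plan_update`) are hypothetical and not part of backups2datalad; the in-memory registry only stands in for whatever persistent store we would end up using. It also assumes that a zarr which is still being uploaded causes a skip without scheduling a job, which the list above does not spell out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class ZarrState:
    """Hypothetical snapshot of one zarr asset as seen by the updater."""
    zarr_id: str
    remote_modified: datetime            # modification date reported by the archive
    backup_modified: Optional[datetime]  # None: no backup of this zarr exists yet
    uploading: bool = False              # zarr is still being uploaded


@dataclass
class JobRegistry:
    """In-memory stand-in for the "registry" of planned/running jobs."""
    active: set = field(default_factory=set)

    def busy(self, dandiset_id: str) -> bool:
        # A dandiset is busy if any job registered for it has not finished.
        return any(d == dandiset_id for d, _ in self.active)

    def add(self, dandiset_id: str, job: str) -> None:
        self.active.add((dandiset_id, job))

    def done(self, dandiset_id: str, job: str) -> None:
        self.active.discard((dandiset_id, job))


@dataclass
class Plan:
    update_dandiset: bool                          # proceed with the regular update?
    zarr_jobs: list = field(default_factory=list)  # zarr ids to create/update first
    reason: str = ""


def plan_update(dandiset_id: str, zarrs: list, registry: JobRegistry) -> Plan:
    """Decide what to do for one dandiset whose timestamp has changed."""
    if registry.busy(dandiset_id):
        return Plan(False, [], "a job for this dandiset is still registered")

    # Missing or out-of-date zarrs (compared by date) need their own job.
    pending = [
        z.zarr_id
        for z in zarrs
        if not z.uploading
        and (z.backup_modified is None or z.backup_modified < z.remote_modified)
    ]
    # Assumption: an in-progress upload means "skip", with nothing to schedule.
    uploading = any(z.uploading for z in zarrs)

    for zarr_id in pending:
        registry.add(dandiset_id, f"zarr:{zarr_id}")

    if pending or uploading:
        return Plan(False, pending, "zarrs not ready; skip the dandiset this round")

    registry.add(dandiset_id, "update_dandiset")
    return Plan(True, [], "no zarrs, or all zarrs up to date")
```

In a celery setup, each entry of `zarr_jobs` and the final `update_dandiset` step would presumably become a task, and each task would call `done()` on completion. Note that the sketch compares dates only, so it does not address the race condition flagged in [*].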