Multipage Calvo Trainer failed in Rodan-staging with 5 images #55

carrieeex · 2021-07-30T15:40:30Z

I was trying to run the Multipage Calvo Trainer (Training model for Patchwise Analysis of Music Document) in Rodan-staging with 5 input images. Each image has 3 RGBA layer inputs: Layer 0 (background), Layer 1, and Selected Regions, which come from the Pixel.js job in another workflow (all related files are attached below). It failed with the following error:

Error summary: InvalidArgumentError: output dimensions must be positive [[node functional_3/up_sampling2d/resize/ResizeNearestNeighbor (defined at code/Rodan/rodan/jobs/Calvo_classifier/training_engine_sae.py:227) ]] [Op:__inference_train_function_84115] Function call stack: train_function

The error details are:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/celery/app/trace.py", line 412, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/celery/app/trace.py", line 704, in __protected_call__
    return self.run(*args, **kwargs)
  File "/code/Rodan/rodan/jobs/base.py", line 771, in run
    retval = self.run_my_task(inputs, settings, arg_outputs)
  File "/code/Rodan/rodan/jobs/Calvo_classifier/fast_calvo_trainer.py", line 186, in run_my_task
    batch_size=batch_size,
  File "/code/Rodan/rodan/jobs/Calvo_classifier/training_engine_sae.py", line 227, in train_msae
    epochs=epochs,
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 108, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 1098, in fit
    tmp_logs = train_function(iterator)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 807, in _call
    return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 2829, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1848, in _filtered_call
    cancellation_manager=cancellation_manager)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1924, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 550, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError:  output dimensions must be positive
	 [[node functional_3/up_sampling2d/resize/ResizeNearestNeighbor (defined at code/Rodan/rodan/jobs/Calvo_classifier/training_engine_sae.py:227) ]] [Op:__inference_train_function_84115]

Function call stack:
train_function

To replicate this issue:

The workflow I used looks like:

The input ports are image, Layer 0 (background), Layer 1, and Selected Regions, each with five inputs. I used Salzinnes folios 006r, 066v, 106r, 166v, and A06r, which can be found in my project in Rodan-staging (shared with devs) or here.

The settings for the Calvo Trainer were:
Maximum number of samples per label: 100
Patch width: 32
Patch height: 32
Maximum number of training epochs: 5
Batch Size: 1
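
The thread does not establish what caused the `output dimensions must be positive` error. One thing that could be ruled out before training is a degenerate input, for example an image smaller than the 32x32 patch. The following is a minimal sketch of such a pre-flight check. The function name `check_training_inputs` and the image sizes are hypothetical and are not part of Rodan or the Calvo trainer; only the patch size and batch size come from the settings above.

```python
def check_training_inputs(image_sizes, patch_height, patch_width, batch_size):
    """Return a list of human-readable problems; an empty list means none were found.

    image_sizes maps a (hypothetical) page name to its (height, width) in pixels.
    """
    problems = []
    if patch_height <= 0 or patch_width <= 0:
        problems.append("patch size must be positive")
    if batch_size <= 0:
        problems.append("batch size must be positive")
    for name, (height, width) in image_sizes.items():
        # A patch cannot be cut from an image that is smaller than the patch.
        if height < patch_height or width < patch_width:
            problems.append(
                "%s is %dx%d, smaller than the %dx%d patch"
                % (name, height, width, patch_height, patch_width)
            )
    return problems


# Patch size and batch size are the settings from this issue;
# the image sizes below are made-up placeholders.
sizes = {"006r": (2000, 1400), "066v": (2000, 1400), "tiny": (16, 1400)}
for problem in check_training_inputs(sizes, patch_height=32, patch_width=32, batch_size=1):
    print(problem)
```

A check like this only excludes one class of bad input; it does not explain why the 2-image run finished while the 5-image run failed.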

carrieeex · 2021-07-30T15:47:44Z

The inputs need to be assigned in order, which looks similar to:

(thanks to @martha-thomae for the screenshot!)

carrieeex · 2021-07-30T16:02:07Z

For the devs (@kemalkongar @raviraina @GabbyHalpin) who will look into this: the project I shared in Rodan-staging is TEST_staging_fromJiali, and the workflow I used is Calvo Trainer 5 images. (You could also try the workflow Patchwise (5) TRY random inputs order, which is the same but with an additional labeler and 5 PNG jobs for the image inputs. I tried running it with the exact same inputs, but it keeps processing and never seems to end.)

The workflow run that failed is named Patchwise (5) 006r, 066v, 106r, 166v, A06r, and the one that keeps processing is Patchwise (5) TRY 2.0.

For the inputs, image is the resized image, Layer 0 (background) is the NonPageLayer, Layer 1 is the PageLayer, and Selected Regions is the SelectedLayer in the resources.
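
Since the trainer depends on each page's image and layers lining up across ports, the pairing could be checked by page identifier instead of by position. The sketch below is only an illustration: `group_by_page`, the `PORTS` tuple, and the resource names are hypothetical and do not reflect how Rodan actually stores or passes inputs.

```python
from collections import OrderedDict

# Port names as described in this issue.
PORTS = ("image", "Layer 0 (background)", "Layer 1", "Selected Regions")


def group_by_page(port_inputs):
    """Group resources per page, in a fixed page order and a fixed port order.

    port_inputs maps a port name to a {page_id: resource_name} mapping.
    Raises ValueError if a port is missing or covers a different set of pages.
    """
    missing = [port for port in PORTS if port not in port_inputs]
    if missing:
        raise ValueError("missing ports: %s" % ", ".join(missing))

    pages = sorted(port_inputs[PORTS[0]])
    for port in PORTS[1:]:
        if sorted(port_inputs[port]) != pages:
            raise ValueError(
                "port %r does not cover the same pages as %r" % (port, PORTS[0])
            )

    grouped = OrderedDict()
    for page in pages:
        grouped[page] = OrderedDict(
            (port, port_inputs[port][page]) for port in PORTS
        )
    return grouped


# Hypothetical resource names for two of the five folios.
example = {
    "image": {"066v": "066v resized", "006r": "006r resized"},
    "Layer 0 (background)": {"006r": "006r NonPageLayer", "066v": "066v NonPageLayer"},
    "Layer 1": {"006r": "006r PageLayer", "066v": "066v PageLayer"},
    "Selected Regions": {"006r": "006r SelectedLayer", "066v": "066v SelectedLayer"},
}
grouped = group_by_page(example)
for page, resources in grouped.items():
    print(page, list(resources.values()))
```

Keying on the page identifier means the order in which resources were assigned to a port no longer matters, and a port that is missing a page fails loudly instead of silently pairing the wrong layers.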

kemalkongar · 2021-07-30T16:04:20Z

I will start looking into this as soon as HPC Fast Trainer is stable, thanks for the detailed issue.

carrieeex · 2021-07-30T16:06:29Z

Note: @martha-thomae has tried the same job (Multipage Calvo Trainer) with 2 images and their layers, and it finished (so it works).

kemalkongar · 2021-07-30T16:11:21Z

@deepio @napulen It may be a better idea to try to implement OrderedDict in Rodan, assuming it's a relatively easy (1-2 day) task rather than try to debug this and hope there isn't any human error. Because I can assure you, I will make at least 1 mistake testing this with 5 inputs, given the shifting names.

carrieeex · 2021-07-30T16:19:12Z

@deepio @napulen It may be a better idea to try to implement OrderedDict in Rodan, assuming it's a relatively easy (1-2 day) task rather than try to debug this and hope there isn't any human error. Because I can assure you, I will make at least 1 mistake testing this with 5 inputs, given the shifting names.

I agree! Making sure the inputs were assigned correctly was time-consuming. The issue about the switched input order is here: DDMAL/Rodan#615.

napulen · 2021-07-30T16:28:48Z

@deepio @napulen It may be a better idea to try to implement OrderedDict in Rodan, assuming it's a relatively easy (1-2 day) task rather than try to debug this and hope there isn't any human error. Because I can assure you, I will make at least 1 mistake testing this with 5 inputs, given the shifting names.

I don't expect a ctrl+h of dict()s into OrderedDict()s to make any noise or create any problems. Maybe the hardest part is finding all instances of dictionaries so that you don't accidentally leave some unordered dictionaries behind.

Maybe some issues related to serialization could come up. Hopefully OrderedDicts are also serializable and will replace dicts without issue.

From a library perspective, OrderedDicts need no additional external packages (pip installs), just additional imports. No objections on my end to adding those.
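
On the serialization question, a quick standard-library check shows that an `OrderedDict` survives both `json` and `pickle` round trips with its order intact. The keys and values below are placeholders, and this says nothing about how Rodan's own serialization layers would behave; that would still need to be tested.

```python
import json
import pickle
from collections import OrderedDict

# Placeholder inputs, inserted in the order the ports are listed in this issue.
inputs = OrderedDict()
inputs["image"] = "006r resized"
inputs["Layer 0 (background)"] = "006r NonPageLayer"
inputs["Layer 1"] = "006r PageLayer"
inputs["Selected Regions"] = "006r SelectedLayer"

# json.dumps writes keys in the OrderedDict's iteration order.
encoded = json.dumps(inputs)

# json.loads returns a plain dict by default; object_pairs_hook rebuilds an
# OrderedDict from the key/value pairs in the order they appear in the text.
decoded = json.loads(encoded, object_pairs_hook=OrderedDict)

# pickle round-trips the OrderedDict type and its order directly.
restored = pickle.loads(pickle.dumps(inputs))

print(list(decoded) == list(inputs))
print(list(restored) == list(inputs))
```

Note that the only step needing extra care is decoding JSON: without `object_pairs_hook`, the result is a plain `dict`, whose ordering is only guaranteed by the language from Python 3.7 onward.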

kemalkongar · 2021-07-30T16:44:29Z

Then we'll look into this next week (please bring it up at the scrum since I'll be gone!). I've also read that OrderedDict may be slightly inefficient in Python 2, but that requires further reading.

napulen · 2021-07-30T17:37:46Z

@timothydereuse is leading the next scrum, I think

carrieeex added the PROJECT: Calvo Classifier label Jul 30, 2021
