Multipage Calvo Trainer failed in Rodan-staging with 5 images #55

Open
carrieeex opened this issue Jul 30, 2021 · 9 comments

@carrieeex
Contributor

carrieeex commented Jul 30, 2021

I was trying to run the Multipage Calvo Trainer (Training model for Patchwise Analysis of Music Document) in Rodan-staging with 5 image inputs. Each image has 3 RGBA layer inputs: Layer 0 (background), Layer 1, and Selected Regions, which come from the Pixel.js job in another workflow (all related files are attached below). It failed with the following error:

Error summary: InvalidArgumentError: output dimensions must be positive [[node functional_3/up_sampling2d/resize/ResizeNearestNeighbor (defined at code/Rodan/rodan/jobs/Calvo_classifier/training_engine_sae.py:227) ]] [Op:__inference_train_function_84115] Function call stack: train_function 

The error details are:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/celery/app/trace.py", line 412, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/celery/app/trace.py", line 704, in __protected_call__
    return self.run(*args, **kwargs)
  File "/code/Rodan/rodan/jobs/base.py", line 771, in run
    retval = self.run_my_task(inputs, settings, arg_outputs)
  File "/code/Rodan/rodan/jobs/Calvo_classifier/fast_calvo_trainer.py", line 186, in run_my_task
    batch_size=batch_size,
  File "/code/Rodan/rodan/jobs/Calvo_classifier/training_engine_sae.py", line 227, in train_msae
    epochs=epochs,
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 108, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 1098, in fit
    tmp_logs = train_function(iterator)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 807, in _call
    return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 2829, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1848, in _filtered_call
    cancellation_manager=cancellation_manager)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1924, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 550, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError:  output dimensions must be positive
	 [[node functional_3/up_sampling2d/resize/ResizeNearestNeighbor (defined at code/Rodan/rodan/jobs/Calvo_classifier/training_engine_sae.py:227) ]] [Op:__inference_train_function_84115]

Function call stack:
train_function

To replicate this issue:

The workflow I used looks like this:
[image]
The input ports are image, Layer 0 (background), Layer 1, and Selected Regions (five of each). I tried it with Salzinnes folios 006r, 066v, 106r, 166v, and A06r, which can be found in my project in Rodan-staging (shared with devs) or here.

The settings for the Calvo Trainer were:
Maximum number of samples per label: 100
Patch width: 32
Patch height: 32
Maximum number of training epochs: 5
Batch Size: 1
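
A note for debugging: the `output dimensions must be positive` message from `ResizeNearestNeighbor` typically means the resize op was asked to produce a height or width of zero, which would happen if a zero-sized tensor reached the `up_sampling2d` layer. Whether that is what happens here is not confirmed. The sketch below is only a generic pre-training sanity check. The function name `find_empty_shapes` and the `(height, width, channels)` tuple layout are assumptions for illustration, not part of the Calvo trainer code.

```python
def find_empty_shapes(shapes):
    """Return the indices of shapes whose height or width is not positive.

    `shapes` is assumed to be a list of (height, width, channels) tuples,
    for example collected from the patches or layers before training.
    """
    bad = []
    for index, shape in enumerate(shapes):
        height, width = shape[0], shape[1]
        if height <= 0 or width <= 0:
            bad.append(index)
    return bad


# Hypothetical shapes: the second entry has a zero height.
example_shapes = [(32, 32, 3), (0, 32, 3), (32, 32, 3)]
print(find_empty_shapes(example_shapes))
```

If a check like this flags any input, the matching layer or patch would be the first thing to inspect.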

@carrieeex
Contributor Author

The inputs need to be assigned in order, similar to this:
[image: diagram for testing calvo trainer in staging (port ordering)]
(thanks to @martha-thomae for the screenshot!)

@carrieeex
Contributor Author

For the devs (@kemalkongar @raviraina @GabbyHalpin) who will look into this: the project I shared in Rodan-staging is TEST_staging_fromJiali, and the workflow I used was Calvo Trainer 5 images. (You could also try the workflow Patchwise (5) TRY random inputs order; it is the same but with an additional labeler and 5 PNG jobs for the image inputs. I tried running it with exactly the same inputs, but it keeps processing and never seems to finish.)

The workflow run that failed is named Patchwise (5) 006r, 066v, 106r, 166v, A06r, and the one that keeps processing is Patchwise (5) TRY 2.0.

For the inputs: image is the resized image, Layer 0 (background) is the NonPageLayer, Layer 1 is the PageLayer, and Selected Regions is the SelectedLayer in the resources.

@kemalkongar
Member

I will start looking into this as soon as HPC Fast Trainer is stable, thanks for the detailed issue.

@carrieeex
Contributor Author

Note: @martha-thomae has tried the same job (Multipage Calvo Trainer) with 2 images and their layers, and it finished (so it works).

@kemalkongar
Member

kemalkongar commented Jul 30, 2021

@deepio @napulen It may be a better idea to try to implement OrderedDict in Rodan, assuming it's a relatively easy (1-2 day) task rather than try to debug this and hope there isn't any human error. Because I can assure you, I will make at least 1 mistake testing this with 5 inputs, given the shifting names.

@carrieeex
Contributor Author

> @deepio @napulen It may be a better idea to try to implement OrderedDict in Rodan, assuming it's a relatively easy (1-2 day) task rather than try to debug this and hope there isn't any human error. Because I can assure you, I will make at least 1 mistake testing this with 5 inputs, given the shifting names.

I agree! Making sure the inputs were in the right order was time-consuming. The issue about the switched input order is here: DDMAL/Rodan#615.

@napulen
Member

napulen commented Jul 30, 2021

> @deepio @napulen It may be a better idea to try to implement OrderedDict in Rodan, assuming it's a relatively easy (1-2 day) task rather than try to debug this and hope there isn't any human error. Because I can assure you, I will make at least 1 mistake testing this with 5 inputs, given the shifting names.

I don't expect a ctrl+h of dict()s into OrderedDict()s to make any noise or create any problems. Maybe the hardest part is finding all instances of dictionaries so that you don't accidentally leave some unordered dictionaries throughout.

Maybe, maybe some issues related to serialization could come up. Hopefully OrderedDicts are also serializable and will replace dicts without issue.

From a library perspective, OrderedDicts need no additional external packages (pip installs), just additional imports. No objections on my end to add those.
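
On the serialization question, here is a minimal sketch using only the standard library. It shows an `OrderedDict` keeping insertion order through a JSON and a pickle round trip. The helper `build_ordered_inputs` and the file names are hypothetical, and this is not how Rodan stores its inputs; it only illustrates the behaviour of `OrderedDict` itself.

```python
import json
import pickle
from collections import OrderedDict


def build_ordered_inputs(port_names, resources):
    """Pair each port name with its resource, preserving the given order."""
    return OrderedDict(zip(port_names, resources))


ports = ["Image", "Layer 0 (background)", "Layer 1", "Selected Regions"]
# Hypothetical file names, for illustration only.
files = ["006r.png", "006r_layer0.png", "006r_layer1.png", "006r_selected.png"]
inputs = build_ordered_inputs(ports, files)

# JSON round trip: object_pairs_hook rebuilds an OrderedDict in key order.
encoded = json.dumps(inputs)
decoded = json.loads(encoded, object_pairs_hook=OrderedDict)

# Pickle round trip: OrderedDict is picklable out of the box.
restored = pickle.loads(pickle.dumps(inputs))

print(list(decoded.keys()))
```

One thing to watch for: loading JSON without `object_pairs_hook=OrderedDict` returns a plain `dict`, so any code that deserializes inputs would need the hook as well, at least on Python versions where plain dicts do not guarantee order.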

@kemalkongar
Member

Then we'll look into this next week (please bring it up at the scrum since I'll be gone!). I've also read that OrderedDict may be slightly inefficient in Python 2 but it requires further reading.

@napulen
Member

napulen commented Jul 30, 2021

@timothydereuse is leading the next scrum, I think
