
Refactor CompileLoss to support nested y_true and y_pred #19879

Open · wants to merge 1 commit into master

Conversation

james77777778 (Contributor) commented Jun 19, 2024

Fix #19855

I think the current implementation of CompileLoss restricts many possibilities for custom losses:

  1. It strictly requires y_true and y_pred to be single tensors.
  2. For Functional models, it enforces that the number of losses match the number of the model's outputs rather than the actual number of (y_true, y_pred) pairs.

Additionally, there are many pitfalls that can cause CompileLoss to malfunction (see #19855).

So, I have refactored CompileLoss by introducing the concept of y_true_spec and y_pred_spec.
The idea is to use tree and these specs to automatically pack y_true and y_pred for CompileLoss.build and CompileLoss.call.

For backward compatibility, they default to a plain "tensor" structure. If someone needs a nested structure for y_true or y_pred, they can use the newly introduced Loss.set_specs to set y_true_spec and y_pred_spec.

Tests have been added to verify this new behavior.

  • test_custom_loss_with_list_y_true: y_true as a list
  • test_custom_loss_with_nested_dict: y_true as a dict containing a list, and y_pred as a dict of dicts

Additionally, the missing dtype in all losses has been corrected.
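To illustrate the spec-and-pack idea in plain Python (the helper names below are hypothetical stand-ins for keras.tree utilities, not the PR's actual code): a spec describes the nested shape of y_true or y_pred, and the flat leaves are packed back into that shape before each loss is called.

```python
# Minimal sketch of the spec idea (hypothetical helpers, not the PR's code).
def flatten(structure):
    """Flatten nested dicts/lists/tuples into a flat list of leaves."""
    if isinstance(structure, dict):
        leaves = []
        for key in sorted(structure):
            leaves.extend(flatten(structure[key]))
        return leaves
    if isinstance(structure, (list, tuple)):
        leaves = []
        for item in structure:
            leaves.extend(flatten(item))
        return leaves
    return [structure]


def pack_as(spec, flat):
    """Pack a flat list of leaves into the shape described by `spec`."""
    flat = list(flat)

    def _pack(s):
        if isinstance(s, dict):
            return {key: _pack(s[key]) for key in sorted(s)}
        if isinstance(s, (list, tuple)):
            return type(s)(_pack(item) for item in s)
        return flat.pop(0)  # a leaf consumes the next flat tensor

    return _pack(spec)


# A loss whose y_true is a dict containing a list could declare a spec like
# this; `None` marks a single-tensor leaf:
y_true_spec = {"boxes": [None, None], "labels": None}
packed = pack_as(y_true_spec, ["b1", "b2", "lab"])
```

With this, a loss can receive an arbitrarily nested y_true while CompileLoss only ever handles flat lists internally.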

Darkdragon84 commented Jun 19, 2024

Very cool! Is there any way we can integrate the additional required step of calling loss.set_specs into something like the compile step? Right now only Loss objects can be used with the new feature; when you pass a callable, the usual single-tensor specs are assumed.

I am wondering whether we really need to tinker with the y_true and y_pred arguments to model.compute_loss so much. Let's consider the possible scenarios:

  1. Single loss, inputs are single tensors: we don't need to do anything to the inputs; pass them straight through to the single loss.
  2. Single loss, inputs are nested structures of tensors: again, nothing to do; pass them straight through to the single loss.
  3. Multiple losses, inputs are single tensors: we need to broadcast the inputs to each loss, but leave their structures as they are.
  4. Multiple losses, inputs are nested structures of tensors: this is the only tricky part, so let's dive in one more level:
    1. Inputs are dicts: compare their keys with the losses dict's keys; if everything matches, pass the values through to each loss unaltered.
    2. Inputs are nested sequences: this is the only part that can be ambiguous (I think). If both y_true and y_pred are sequences of the same length as the losses, it could mean either:
      • each input element corresponds to one loss, so zip-iterate through the losses, y_true, and y_pred, or
      • each loss takes the same nested input shapes, so broadcast.

So, 4.ii is the only case where we have to figure out how to remove the ambiguity.

One easy solution would be to always require the loss argument to model.compile, as well as the inputs to model.compute_loss, to be a sequence (or dict) of the same length, even for a single loss. That would match the metrics argument to model.compile, so I think it would be fine.

What do you think? Did I miss any scenarios?
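The routing described in the four scenarios above could be sketched like this in plain Python (a hypothetical helper for illustration, not actual Keras code; for the ambiguous case 4.ii it arbitrarily assumes zip semantics):

```python
# Sketch of the scenario routing (hypothetical; not actual Keras code).
def route_inputs(losses, y_true, y_pred):
    """Return one (loss, y_true, y_pred) triple per loss invocation."""
    # Scenarios 1 & 2: a single loss gets the inputs untouched, whether
    # they are single tensors or nested structures.
    if callable(losses):
        return [(losses, y_true, y_pred)]
    # Scenario 4.i: dicts everywhere -> match by key.
    if isinstance(losses, dict):
        if not (isinstance(y_true, dict) and isinstance(y_pred, dict)):
            raise ValueError("a dict of losses requires dict inputs")
        if set(losses) - set(y_pred):
            raise ValueError("loss keys must match output keys")
        return [(losses[k], y_true[k], y_pred[k]) for k in losses]
    # Scenario 3: multiple losses, non-sequence inputs -> broadcast.
    if not isinstance(y_true, (list, tuple)):
        return [(loss, y_true, y_pred) for loss in losses]
    # Scenario 4.ii (the ambiguous one): when the lengths line up we assume
    # zip semantics here; broadcasting would be the other valid reading.
    if isinstance(y_pred, (list, tuple)) and len(y_true) == len(losses) == len(y_pred):
        return list(zip(losses, y_true, y_pred))
    return [(loss, y_true, y_pred) for loss in losses]


loss_a = lambda t, p: "a"  # stand-in losses for illustration
loss_b = lambda t, p: "b"
```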

fchollet (Member) left a comment:


Thanks for the PR!

In keras/src/losses/__init__.py:

```python
        self.y_true_spec = self.SPEC_TENSOR
        self.y_pred_spec = self.SPEC_TENSOR

    def set_specs(self, y_true, y_pred):
```
fchollet (Member):

The notion of having specs for y_true and y_pred feels overengineered to me.

james77777778 (Contributor Author):

It is difficult, or even impossible, to avoid this function when y_true or y_pred is nested rather than a plain tensor.

We have no information about the structure of y_true and y_pred when compiling the losses. Previously this was resolved through complicated analysis and the very strong assumption that each loss takes single tensors as inputs.

Obviously, the single-tensor input format is insufficient for many cases.
(Personally, in an object detection task, I had to stack multiple y_true values into a single tensor to make CompileLoss work. It took me some time to work around it.)

This is the only solution for nested input structures I could come up with. It may feel overengineered, but it works. Additionally, the build part is much cleaner than before.

```python
            f"Received: loss_weights length={len(flat_loss_weights)}, "
            f"loss length={len(flat_losses)}"
        )
        flat_y_true_specs, flat_y_pred_specs = self._get_specs(flat_losses)
```
fchollet (Member):

I don't quite follow how this is used. Could it be simplified?

james77777778 (Contributor Author):

I have added some comments to clarify the entire procedure.
Essentially, we need y_true_spec and y_pred_spec in flat form because the losses have been flattened in the same manner.
In call, these specs are necessary to correctly pack y_true and y_pred before iterating through them alongside self.flat_losses.
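As a plain-Python illustration of that packing step (hypothetical names, not the actual CompileLoss implementation): once both the losses and the specs are flat, call can give each loss its own slice of the flat inputs, sized by how many leaves that loss's spec declares.

```python
# Sketch of slicing flat inputs per loss (hypothetical; not CompileLoss code).
def compute_total_loss(flat_losses, leaf_counts, flat_y_true, flat_y_pred):
    """Route each loss its own chunk of the flat y_true/y_pred leaves."""
    total, i = 0.0, 0
    for loss_fn, n in zip(flat_losses, leaf_counts):
        yt = flat_y_true[i : i + n]
        yp = flat_y_pred[i : i + n]
        if n == 1:
            # A single-leaf spec means the loss expects plain tensors.
            yt, yp = yt[0], yp[0]
        total += loss_fn(yt, yp)
        i += n
    return total
```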

james77777778 (Contributor Author):

Hey @Darkdragon84

Thanks for your input.

> Is there any way we can integrate the additional required step of calling loss.set_specs into something like the compile step?

It is quite difficult for me. The root cause is that we have no information about the structure of y_true and y_pred during build, and I don't see a way to infer it automatically.

> One easy solution would be to always require the loss argument to model.compile as well as the inputs to model.compute_loss to be a sequence (or dict) of the same length, even for a single loss.

Yeah, this would make CompileLoss much cleaner, and we could avoid using specs. However, it would break a lot of existing projects built with Keras, so I don't think it is feasible.

@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Jun 20, 2024
Darkdragon84:

> Yeah, this would make CompileLoss much cleaner, and we could avoid using specs. However, it would break a lot of existing projects built with Keras, so I don't think it is feasible.

Yeah, I understand, especially since the latest major release was very recent... Would it be possible to enable arbitrarily nested tensor inputs only when the losses and inputs are given as lists (i.e., conforming to the format the metrics already use)? Perhaps we can also find an easier way to resolve the ambiguity without requiring list inputs.

Darkdragon84:

> It is quite difficult for me. The root cause is that we have no information about the structure of y_true and y_pred during build. I don't see a way to infer it automatically.

If we found a way to disambiguate 4.ii, we could defer the call to set_specs to when the losses are actually compiled (the first call of compute_loss, I think?). I'll think about it a bit.

james77777778 (Contributor Author) commented Jun 20, 2024

> Would it be possible to enable arbitrarily nested tensor inputs only if the losses and inputs are given as lists

This should be feasible; it's the easiest scenario. (I can create a shortcut for it.)
On second thought, it isn't clear to me how we can verify that the inputs are valid even with a single loss.

I tried to build this testing table, and the algorithm should pass all of these cases:

All possible valid scenarios in tests:

| loss | y_pred | y_true |
| --- | --- | --- |
| loss | y | y |
| [loss] | y | y |
| {"a": loss} | y | y |
| {"a": loss} | y | y |
| loss | [y] | y |
| [loss] | [y] | y |
| {"a": loss} | [y] | y |
| {"a": loss} | [y] | y |
| loss | {"a": y} | y |
| [loss] | {"a": y} | y |
| {"a": loss} | {"a": y} | y |
| {"a": loss} | {"a": y} | y |
| [loss1, loss2] | [y1, y2] | (y1, y2) |
| [loss1, [loss2]] | [y1, y2] | (y1, y2) |
| {"a": loss1, "b": loss2} | [y1, y2] | (y1, y2) |
| {"a": loss1, "b": loss2} | [y1, y2] | (y1, y2) |
| {"a": loss1} | [y1, y2] | (y1, y2) |
| {"a": loss1, "b": loss2} | {"a": y1, "b": y2} | {"a": y1, "b": y2} |
| [loss1, loss2] | {"a": y1, "b": y2} | {"a": y1, "b": y2} |

We should be careful that both y1 and y2 could themselves be nested structures.

There are also some cases that should be considered invalid, such as missing keys, but I haven't listed them to keep the table simple.
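The single-loss rows in the table (e.g. `loss [y] y` and `loss {"a": y} y`) suggest one normalization step: a singleton container around the inputs collapses so the loss still receives a plain value. A hedged sketch of just that rule (hypothetical helper, not the actual CompileLoss code):

```python
# Hypothetical sketch of the singleton rows above; not CompileLoss code.
def unwrap_singleton(y):
    """Collapse a one-element list/tuple or one-key dict to its sole value."""
    if isinstance(y, (list, tuple)) and len(y) == 1:
        return y[0]
    if isinstance(y, dict) and len(y) == 1:
        return next(iter(y.values()))
    return y
```

The multi-loss rows (key matching, positional matching, partial keys) would need additional rules on top of this.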

fchollet (Member):
I don't think we should make changes to the base Loss API (i.e., let's restrict changes to the CompileLoss class). I also don't think we should make loss.set_specs() a public-facing API that users have to call when their losses take more than a single tensor input.

If we're going to enable nested structures for y_true and y_pred in losses, they should "just work". There should not be an additional config step that users have to take.

> we could defer the call to set_specs to when the losses are actually compiled (first call of compute_loss I think?)

This would be the right approach -- if we need to keep track of input specs, let's base it on the structure of the inputs received at the first call to the loss.

james77777778 (Contributor Author):

> > we could defer the call to set_specs to when the losses are actually compiled (first call of compute_loss I think?)
>
> This would be the right approach -- if we need to keep track of input specs, let's base it on the structure of the inputs received at the first call to the loss.

But how can we do this?
As far as I know, CompileLoss is the very first point where the loss and the inputs (y_true, y_pred) meet, not to mention that the structures might already be nested.

(I'm going to separate the unrelated fix from this PR to keep the discussion focused here.)

fchollet (Member):

> As far as I know, CompileLoss is the very first point where the loss and the inputs (y_true, y_pred) meet, not to mention that the structures might already be nested.

So could the logic live entirely in CompileLoss, without any user-facing footprint?

I have not thought about how to do it, but if the only way to make the feature work is to add a user-facing set_specs step, then I think we're better off not supporting the feature by default (users who need nested loss inputs would write their own compute_loss function).

Darkdragon84 commented Jun 21, 2024

> But how can we do this?
> As far as I know, CompileLoss is the very first point where the loss and the inputs (y_true, y_pred) meet, not to mention that the structures might already be nested.

@james77777778
I guess at exactly that point we need to find a way to clearly infer the structure from the inputs. If there are ambiguities we cannot resolve, we probably need to retain the original behavior. Are you concerned about build?

james77777778 (Contributor Author) commented Jun 22, 2024

> So could the logic live entirely in CompileLoss and not have any user-facing footprint?

I think this is not feasible. We should retain the original behavior.

> Are you concerned about build?

Hey @Darkdragon84, I'm unsure what you mean by build.

I'm thinking of another way to support nested y_true and y_pred by utilizing dicts. It should be feasible to check whether loss, y_true, and y_pred are all dicts. If they are, we can directly feed the inputs to the corresponding loss, no matter what their structure is.
(Maybe it is possible to extend this approach a little to support other structures as well.)

A simple usage would look like this:

```python
import keras


def my_loss(y_true, y_pred):
    y1, y2 = y_true
    pred_sum = keras.ops.sum(y_pred)
    return keras.ops.abs(keras.ops.sum(y1) - pred_sum) + keras.ops.abs(
        keras.ops.sum(y2) - pred_sum
    )


x = keras.Input(shape=(3,), name="input_a")
output_a = keras.layers.Dense(1, name="output_a")(x)
output_b = keras.layers.Dense(1, name="output_b", activation="sigmoid")(x)
model = keras.Model(x, {"output": [output_a, output_b]})  # <- dict

model.compile(loss={"output": my_loss})  # <- dict

x = keras.random.uniform([8, 3])
y1 = keras.random.uniform([8, 1])
y2 = keras.random.randint((8, 1), 0, 2)
model.fit(x, {"output": [y1, y2]}, batch_size=2, epochs=1)  # <- dict

"""
Currently, this causes an error:
ValueError: In the dict argument `loss`, key 'output' does not correspond to any model output. Received:
loss={'output': <function my_loss at 0x73e7325fe2a0>}
"""
```


Successfully merging this pull request may close these issues.

CompileLoss truncates multiple loss inputs