-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix encoder depth & output stride on DeeplabV3 & DeeplabV3+ #991
Conversation
This reverts commit 79d5568.
This is a PR that I tried to fix before, but so far it seems that some problems of DeeplabV3+ have been solved and merged by #986. I only set this PR as a draft... |
Hi, @brianhou0208. For the DeeplabV3+ part, your fix may be better because you can set I made a trial merge commit because there is a conflict between my PR and yours and my PR has already been merged. Can you please check it? The commits being compared are d490cdf and munehiro-k/segmentation_models.pytorch@2f57c5d. Test Code OutputsMy one in #986 (note that the
DeeplabV3
DeeplabV3+
|
Hi @munehiro-k , It seems you identified this issue earlier than I did, and your PR #986 also solved some problems with DeepLabV3+. I observed that if |
The merged codes (munehiro-k:merge/fix_deeplab and brianhou0208:fix_deeplab) are quite similar, but brianhou0208:fix_deeplab deletes a part of the docstring. The deletion of I put diffs between my merge and brianhou0208's merge. legend < munehiro-k:merge/fix_deeplab
> brianhou0208:fix_deeplab model.py 82d81
<
149d147
< aux_params: Dictionary with parameters of the auxiliary output (classification head). Auxiliary output is build
184a183,189
>
> if encoder_output_stride not in [8, 16]:
> raise ValueError(
> "DeeplabV3Plus support output stride 8 or 16, got {}.".format(
> encoder_output_stride
> )
> ) decoder.py 81c81
< if encoder_depth not in (3, 4, 5):
---
> if encoder_depth < 3:
83,87c83,85
< "Encoder depth should be 3, 4 or 5, got {}.".format(encoder_depth)
< )
< if output_stride not in (8, 16):
< raise ValueError(
< "Output stride should be 8 or 16, got {}.".format(output_stride)
---
> "Encoder depth for DeepLabV3Plus decoder cannot be less than 3, got {}.".format(
> encoder_depth
> ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update!
self.highres_input_index = -4 | ||
|
||
highres_in_channels = encoder_channels[self.highres_input_index] | ||
highres_in_channels = encoder_channels[2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only concern here is speed, cause the higher resolution feature we take, the more we need to process
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there might have been some misunderstanding.
Currently, regardless of the encoder_depth
, high_res_features
is always fixed at 1/4 of the input resolution, which improves speed.
Before #986 and #991, the resolution of high_res_features
varied depending on the encoder_depth
:
encoder_depth=5
:high_res_features
was 1/4 of the input resolution.encoder_depth=4
:high_res_features
was 1/2 of the input resolution.encoder_depth=3
:high_res_features
matched the input resolution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For your reference, in PR #986, I updated the high_res_features
logic as follows:
- When
encoder_depth
=5:encoder_channels[-4]
, which corresponds to index 2 (1/4 of the input resolution). - When
encoder_depth
=4:encoder_channels[-3]
, which corresponds to index 2 (1/4 of the input resolution). - When
encoder_depth
=3:- If
encoder_output_stride
=8:encoder_channels[-2]
, which corresponds to index 3 (1/2 of the input resolution). - If
encoder_output_stride
=16:encoder_channels[-3]
, which corresponds to index 2 (1/4 of the input resolution).
- If
The only difference in high_res_features
is when encoder_depth
=3 and encoder_output_stride
=8.
I think PR #991 is preferable because it allows you to consistently set upsampling
to 4 to preserve the input/output tensor sizes. In contrast, PR #986 requires setting upsampling
to 2 to maintain sizes when encoder_depth
=3 and encoder_output_stride
=8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explanations! It's clearer now 🙏
Hi @qubvel ,
This PR solve issues in DeepLabV3 and DeepLabV3+ models that may occur with different
encoder_depth
andoutput_stride
configurations.Updates & Fixes
output_stride
parameter.out_channels
parameter in the decoder.segmentation_head.upsampling
to ensure input and output sizes match in DeepLabV3 for allencoder_depth
andoutput_stride
settings.high_res_features
in DeepLabV3+ to always use a 1/4 resolution of the input, ensuring no errors occur withencoder_depth
values between 3 and 5.Encoder Depth & Output Stride
Encoder Output Shape
Downsample factor
DeeplabV3
From the tables above:
When
encoder_depth
is between 1 and 3, the upsampling ratio between input and output isscale_factor=2**encoder_depth
.When
encoder_depth
is 4 or 5, the upsampling ratio isscale_factor=output_stride
.The following code ensures input and output sizes match for all
encoder_depth
values in DeepLabV3:Test Code
Output
DeeplabV3+
Originally, DeepLabV3+ only supported
encoder_depth
values of 4 and 5 due to:encoder_depth=5
, there are no errors.encoder_depth=4
, errors occur with MixVisionTransformer encoders (e.g., dump feature channel = 0).encoder_depth=3
, it triggers a "list index out of range" error.By fixing
high_res_features
to always use the 1/4 resolution input and limitingencoder_depth
to >=3, these issues are resolved:Test Code
Output