
Questions about PSPNet. #101

Open
kazucmpt opened this issue Apr 20, 2019 · 6 comments

Comments

@kazucmpt

Thank you for uploading your code. It is very helpful to understand PSPNet.
I have two questions about your paper.

  1. You wrote

> we use a pretrained ResNet model with the dilated network strategy to extract the feature map. The final feature map size is 1/8 of the input image.

in the paper. But I think the feature map size is 1/16 when you use ResNet-50. Do you use only the first 3 blocks of ResNet-50?

  2. You wrote

> Then we directly upsample the low-dimension feature maps to get the same size feature as the original feature map via bilinear interpolation. Finally, different levels of features are concatenated as the final pyramid pooling global feature.

in Section 3.2 of the paper. I understand that we concatenate the resized features from the different pyramid levels with the feature map extracted by ResNet-50. But after that, the feature map is still 1/8 the size of the input image. How did you resize it to the same size as the input image?

[image attachment: 無題 (untitled)]

@shentanyue

I have the same question.

@lxtGH

lxtGH commented May 17, 2019

The output segmentation map is 1/8 of the input size; bilinear upsampling is used to recover the original size.

@alexcekay

Hi there,

> But I think the feature map size is 1/16 when you use ResNet50. Do you use only first 3 blocks of ResNet50?

To get 1/8 of the input size, don't use a plain ResNet. You should use a dilated ResNet (https://arxiv.org/abs/1705.09914).
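For anyone checking the arithmetic, the per-stage strides can be traced in a few lines. This is a sketch under the assumption that the dilated-network strategy keeps stride 1 (and uses dilation 2/4 instead) in the last two ResNet stages, which is the usual recipe:

```python
from math import prod

# Spatial stride per stage: conv1, maxpool, res2, res3, res4, res5.
# NOTE: the stage layout below is the standard ResNet-50 one, assumed here.
plain_resnet = [2, 2, 1, 2, 2, 2]    # last two stages downsample -> 1/32
dilated_resnet = [2, 2, 1, 2, 1, 1]  # res4/res5: stride 1, dilation 2 and 4

print(prod(plain_resnet))    # 32
print(prod(dilated_resnet))  # 8 -> the final feature map is 1/8 of the input
```

So a plain ResNet-50 actually ends at 1/32, not 1/16; only by removing the last two downsampling strides do you land at 1/8.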

> But after that, the image size is 1/8 of the input image. How did you resize them to the same image size as input image?

Yeah, you're right there. That's definitely not described well in the paper. For my implementation I did the following: upscale all the pooling layers so that they have the same width/height as the output of the dilated ResNet, then concat them all, add two convs, and then upsample this 8× to get back to the original image size.
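The pipeline above can be traced with plain shape bookkeeping. This is a minimal sketch, not the official implementation; the channel counts follow the paper's convention of reducing each pyramid level to 1/4 of the backbone channels, and the 224x224 input and 21 classes are hypothetical:

```python
# Shapes are (channels, height, width); backbone output for a 224x224 input.
backbone = (2048, 28, 28)           # 1/8 of 224, assuming a dilated ResNet-50

bins = [1, 2, 3, 6]                 # pyramid pooling bin sizes from the paper
# Pool to each bin size; a 1x1 conv reduces channels to 2048 / 4 = 512.
pooled = [(2048 // len(bins), b, b) for b in bins]

# Bilinearly upsample each pooled map back to the backbone resolution.
upsampled = [(c, backbone[1], backbone[2]) for (c, _, _) in pooled]

# Concatenate along channels: 2048 + 4 * 512 = 4096.
concat = (backbone[0] + sum(c for (c, _, _) in upsampled),
          backbone[1], backbone[2])

# A conv head maps to num_classes, then an 8x upsample restores 224x224.
num_classes = 21                    # hypothetical
logits = (num_classes, backbone[1] * 8, backbone[2] * 8)

print(concat)   # (4096, 28, 28)
print(logits)   # (21, 224, 224)
```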

@qizhuli

qizhuli commented May 29, 2019

@kazucmpt @shentanyue These are probably best clarified by referring to the official code. (And a good thing about Caffe is that the network architecture is fully and clearly laid out in a human-friendly text file ;P)

At the very end of their provided network definition files (see the evaluation/prototxt directory), you will see that the networks are terminated with an Interp layer that upsamples the bottom blob 8× spatially:

layer {
  name: "conv6_interp"
  type: "Interp"
  bottom: "conv6"
  top: "conv6_interp"
  interp_param {
    zoom_factor: 8
  }
}

And if you would like more details on the Interp layer, you can check out its source code.
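One detail worth knowing (based on the Interp layer's source, so treat this as an assumption to verify there): with zoom_factor the output size is not a plain multiply but in + (in - 1) * (zoom_factor - 1). That is exactly why PSPNet's 473x473 crop, which gives a 60x60 feature map at stride 8, zooms back to precisely 473:

```python
def interp_out_size(in_size, zoom_factor):
    # Caffe Interp with zoom_factor (assumed from the layer source):
    # out = in + (in - 1) * (zoom_factor - 1)
    return in_size + (in_size - 1) * (zoom_factor - 1)

print(interp_out_size(60, 8))   # 473 -> matches the 473x473 crop size
```

A naive scale_factor-style upsample would give 60 * 8 = 480 instead, so when porting to another framework you generally want to resize to the explicit input size.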

@allendred

Could the upsample layer replace the Interp layer? My device does not support it.

@Abhishek2028

Hi, here in this architecture we create bins of sizes 1x1x512 -> 1x1x1, 2x2x1, 3x3x1, 6x6x1.
1/8 of the original feature map is 28.
How do I upsample features from 3x3 to 28x28? I tried so many integer values. How is the upsampling done?
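Bilinear interpolation works for any target size, not just integer multiples of the input: each output pixel is mapped back to fractional input coordinates and blended from its four nearest neighbours, so you can go from 3x3 straight to 28x28. A minimal pure-Python sketch using the align-corners convention (the function name is hypothetical; in a real framework you would pass the target size to the resize/interpolate op directly):

```python
def bilinear_resize(grid, out_h, out_w):
    """Bilinearly resize a 2D grid (list of lists) to (out_h, out_w),
    with input and output corners aligned (align_corners=True style)."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for i in range(out_h):
        # Map output row i back to a (possibly fractional) input row.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Blend the four surrounding input values.
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bot = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bot * wy)
        out.append(row)
    return out

feat = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]   # a toy 3x3 pooled feature map
up = bilinear_resize(feat, 28, 28)          # 3x3 -> 28x28, no integer factor
```

Because the mapping is size-to-size rather than factor-based, no integer zoom value is needed at all.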
