When I converted the image data from tfrecord format to jpg format, I found that each jpg file is actually 4 square images concatenated together. The FileBasedDataset does nothing about that, and I don't see FSNSLocalizationNet doing separate localization for these 4 images. How should I understand this?
Yes, FSNS is organized in such a way that one sample actually consists of 4 images (views).
The code snippet you refer to handles this case. If the flag uses_original_data is set to True, the incoming batch with a shape of (batch_size, 3, 150, 600) (height 150 pixels, width 600 pixels) is reorganized into a batch with the shape (batch_size * 4, 3, 150, 150). We basically convert each image into 4 images and handle them independently. Later, they are fused together again.
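To make the reshaping concrete, here is a minimal sketch of the same three steps. It uses NumPy instead of chainer.functions so that it runs on its own, and the helper name split_views and its num_views parameter are made up for illustration, they are not part of the repository.

```python
import numpy as np


def split_views(images, num_views=4):
    """Split side-by-side views into separate batch entries.

    Hypothetical helper: (B, C, H, W) -> (B * num_views, C, H, W // num_views).
    """
    batch_size, num_channels, height, width = images.shape
    # split the width axis into (view index, per-view width)
    images = images.reshape(batch_size, num_channels, height, num_views, -1)
    # move the view axis right behind the batch axis
    images = images.transpose(0, 3, 1, 2, 4)
    # merge batch and view axes, so each view becomes its own batch entry
    return images.reshape(batch_size * num_views, num_channels, height, width // num_views)


# two dummy FSNS-style samples, each 150 pixels high and 600 pixels wide
batch = np.random.rand(2, 3, 150, 600).astype(np.float32)
views = split_views(batch)
print(views.shape)  # (8, 3, 150, 150)
```

With this layout, entry b * 4 + v of the result is view v of sample b, i.e. columns v * 150 to (v + 1) * 150 of the original image, so the views of one sample stay next to each other in the enlarged batch.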
if self.uses_original_data:
    # handle each individual view as increase in batch size
    batch_size, num_channels, height, width = images.shape
    images = F.reshape(images, (batch_size, num_channels, height, 4, -1))
    images = F.transpose(images, (0, 3, 1, 2, 4))
    images = F.reshape(images, (batch_size * 4, num_channels, height, width // 4))
Does this code treat the 4 different images as an additional dimension for the localization?