An intriguing failing of convolutional neural networks and the CoordConv solution

August 2019

tl;dr: Learning coordinate transforms (predicting x and y directly from an image, and vice versa) is hard for conv nets. Concatenating a coordinate meshgrid to the input image helps this task significantly.

Overall impression

The paper's results are very convincing, and the technique is very cheap. Essentially it only concatenates a two-channel meshgrid of pixel coordinates to the original input.
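
A minimal NumPy sketch of the meshgrid concatenation, for intuition only. The function name `add_coord_channels`, the (N, C, H, W) layout, and the scaling of coordinates to [-1, 1] are assumptions of this sketch, not the paper's reference code.

```python
import numpy as np

def add_coord_channels(images):
    """Append row (i) and column (j) coordinate channels to a batch.

    images: array of shape (N, C, H, W)  (layout is an assumption of this sketch)
    returns: float32 array of shape (N, C + 2, H, W)
    """
    images = np.asarray(images, dtype=np.float32)
    n, _, h, w = images.shape
    # Coordinates scaled to [-1, 1] along each axis (assumed normalization).
    rows = np.linspace(-1.0, 1.0, h, dtype=np.float32)
    cols = np.linspace(-1.0, 1.0, w, dtype=np.float32)
    ii, jj = np.meshgrid(rows, cols, indexing="ij")  # each of shape (H, W)
    coords = np.stack([ii, jj])                      # (2, H, W)
    coords = np.broadcast_to(coords, (n, 2, h, w))   # same grid for every image
    return np.concatenate([images, coords], axis=1)

batch = np.zeros((2, 3, 4, 5), dtype=np.float32)
out = add_coord_channels(batch)
print(out.shape)  # (2, 5, 4, 5)
```

A regular conv layer applied to `out` then sees each pixel's position as two extra input features, so its filters can depend on location when that helps.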

RoI10D cited this paper.

Key ideas

Other coordinates work as well, such as radius and theta.
The idea can be useful for other tasks such as object detection, GANs, and deep RL (DRL), but not so much for classification.
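
A sketch of what radius and theta channels could look like, again in NumPy. Measuring from the image center, normalizing the grid to [-1, 1], and the name `polar_coord_channels` are all assumptions made here for illustration.

```python
import numpy as np

def polar_coord_channels(h, w):
    """Return radius and angle channels of shape (2, H, W).

    Both are measured from the image center on a grid normalized
    to [-1, 1] (center and scale are assumptions of this sketch).
    """
    rows = np.linspace(-1.0, 1.0, h, dtype=np.float32)
    cols = np.linspace(-1.0, 1.0, w, dtype=np.float32)
    ii, jj = np.meshgrid(rows, cols, indexing="ij")
    radius = np.hypot(ii, jj)    # 0 at the center, sqrt(2) at the corners
    theta = np.arctan2(ii, jj)   # angle in [-pi, pi]; 0 at the exact center
    return np.stack([radius, theta])

polar = polar_coord_channels(5, 5)
print(polar.shape)  # (2, 5, 5)
```

These channels can be concatenated to the input in the same way as the Cartesian ones, either in place of them or in addition.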

Technical details

Summary of technical details

Notes

Uber made a video presenting this paper.
A concurrent paper from VGG, Semi-convolutional Operators for Instance Segmentation (ECCV 2018), has more theoretical analysis.
This technique seems to alleviate checkerboard artifacts as well, which would make it a good alternative to bilinear upsampling for that purpose.
