Organized paper notes

Last Updated: 07/04/2019

The following organizes the paper reading notes according to subfields.

Deep Learning in General

Bag of Tricks for Image Classification with Convolutional Neural Networks [Notes] CLS
Bag of Freebies for Training Object Detection Neural Networks [Notes]
mixup: Beyond Empirical Risk Minimization [Notes] ICLR 2018
Aggregated Residual Transformations for Deep Neural Networks (ResNeXt) [Notes] CVPR 2017
Group Normalization [Notes] ECCV 2018
Spatial Transformer Networks [Notes] NIPS 2015

Datasets

Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite [Notes] CVPR 2012
Vision meets Robotics: The KITTI Dataset [Notes] IJRR 2013
A New Performance Measure and Evaluation Benchmark for Road Detection Algorithms [Notes] ITSC 2013

Bayesian DL

What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? [Notes] NIPS 2017
Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding [Notes] BMVC 2017

DRL

Human-level control through deep reinforcement learning (Nature DQN paper) [Notes] DRL
Deep Reinforcement Learning for Vessel Centerline Tracing in Multi-modality 3D Volumes [Notes] DRL MI
Deep Reinforcement Learning for Flappy Bird [Notes] DRL

2D Object Detection

Optimizing the Trade-off between Single-Stage and Two-Stage Object Detectors using Image Difficulty Prediction (very nice illustration of one-stage and two-stage object detection)
Light-Head R-CNN: In Defense of Two-Stage Object Detector (Megvii) [Notes]
Object Detection based on Region Decomposition and Assembly [Notes] AAAI 2019
M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network [Notes] AAAI 2019

Anchor-free

Review of Anchor-free methods (Zhihu blog): Object Detection: The Anchor-Free Era; Anchor-free deep learning object detection methods; My Slides on CSP
DenseBox: Unifying Landmark Localization with End to End Object Detection
CornerNet: Detecting Objects as Paired Keypoints [Notes] ECCV 2018
ExtremeNet: Bottom-up Object Detection by Grouping Extreme and Center Points [Notes] CVPR 2019
CSP: High-level Semantic Feature Detection: A New Perspective for Pedestrian Detection (center and scale prediction) [Notes] CVPR 2019
FSAF: Feature Selective Anchor-Free Module for Single-Shot Object Detection [Notes] CVPR 2019
FoveaBox: Beyond Anchor-based Object Detector (anchor-free) [Notes]
CenterNet: Objects as points (from ExtremeNet authors) [Notes]
CenterNet: Object Detection with Keypoint Triplets [Notes]
FCOS: Fully Convolutional One-Stage Object Detection [Notes]

Image Segmentation

Panoptic Segmentation [Notes] PanSeg
Panoptic Feature Pyramid Networks [Notes] PanSeg
Attention-guided Unified Network for Panoptic Segmentation [Notes] PanSeg
Path Aggregation Network for Instance Segmentation [Notes] CVPR 2018

3D Object Detection

Classification on Point Cloud

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation CVPR 2017 [Notes]
PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space NIPS 2017 [Notes]
Dynamic Graph CNN for Learning on Point Clouds [Notes]
VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition (VoxNet) [Notes]
Multi-view Convolutional Neural Networks for 3D Shape Recognition (MVCNN) [Notes] ICCV 2015
3D ShapeNets: A Deep Representation for Volumetric Shapes [Notes] CVPR 2015
Volumetric and Multi-View CNNs for Object Classification on 3D Data [Notes] CVPR 2016
Beyond the pixel plane: sensing and learning in 3D (blog, Chinese version)
Review of Geometric deep learning: Frontiers of geometric deep learning (from Zhihu) (Up to CVPR 2018)

Detection on Point Cloud

Frustum PointNets for 3D Object Detection from RGB-D Data (F-PointNet) [Notes] CVPR 2018
PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud (SOTA for 3D object detection) [Notes] CVPR 2019
VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection CVPR 2018 (Apple, first end-to-end point cloud encoding to grid)
SECOND: Sparsely Embedded Convolutional Detection Sensors 2018 (builds on VoxelNet)
PointPillars: Fast Encoders for Object Detection from Point Clouds [Notes] CVPR 2019 (builds on SECOND)

Sensor Fusion

Multi-View 3D Object Detection Network for Autonomous Driving (MV3D) [Notes] CVPR 2017 (Baidu, sensor fusion, BV proposal)
Joint 3D Proposal Generation and Object Detection from View Aggregation (AVOD) [Notes] IROS 2018 (sensor fusion, multiview proposal)
Multi-Task Multi-Sensor Fusion for 3D Object Detection [Notes] CVPR 2019 (@Uber, sensor fusion)

Monocular 3D Object Detection

Representation Transformation (Pseudo-Lidar, BEV)

Multi-Level Fusion based 3D Object Detection from Monocular Images [Notes] CVPR 2018 (precursor to pseudo-lidar)
Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving [Notes] CVPR 2019
Pseudo-LiDAR++: Accurate Depth for 3D Object Detection in Autonomous Driving [Notes]
Pseudo lidar-e2e: Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud [Notes] ICCV 2019
pseudo lidar color: Accurate Monocular 3D Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving [Notes] ICCV 2019
ForeSeE: Task-Aware Monocular Depth Estimation for 3D Object Detection [Notes] (successor to pseudo-lidar) (mono 3DOD SOTA)
BEV-IPM: Deep Learning based Vehicle Position and Orientation Estimation via Inverse Perspective Mapping Image [Notes] IV 2019
OFT: Orthographic Feature Transform for Monocular 3D Object Detection [Notes] (Convert camera to BEV, Alex Kendall) BMVC 2019
BirdGAN: Learning 2D to 3D Lifting for Object Detection in 3D for Autonomous Vehicles [Notes] IROS 2019

Keypoints and Shape

Deep MANTA: A Coarse-to-fine Many-Task Network for joint 2D and 3D vehicle analysis from monocular image [Notes] CVPR 2017
3D-RCNN: Instance-level 3D Object Reconstruction via Render-and-Compare [Notes] CVPR 2018
ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape [Notes] CVPR 2019
MonoGRNet: A Geometric Reasoning Network for Monocular 3D Object Localization [Notes] AAAI 2019
MonoGRNet 2: Monocular 3D Object Detection via Geometric Reasoning on Keypoints [Notes] (estimates depth from keypoints)
MonoLoco: Monocular 3D Pedestrian Localization and Uncertainty Estimation [Notes] ICCV 2019
GPP: Ground Plane Polling for 6DoF Pose Estimation of Objects on the Road [Notes] (UCSD, mono 3DOD)

Distance via 2D/3D constraint

Deep3dBox: 3D Bounding Box Estimation Using Deep Learning and Geometry [Notes] (from Zoox) CVPR 2017
Shift R-CNN: Deep Monocular 3D Object Detection with Closed-Form Geometric Constraints [Notes] IEEE ICIP 2019
FQNet: Deep Fitting Degree Scoring Network for Monocular 3D Object Detection [Notes] CVPR 2019 (Mono 3DOD, Jiwen Lu)
Mono3D++: Monocular 3D Vehicle Detection with Two-Scale 3D Hypotheses and Task Priors [Notes] (Stefano Soatto) AAAI 2019
GS3D: An Efficient 3D Object Detection Framework for Autonomous Driving [Notes] CVPR 2019 (2d bbox height)
MonoPSR: Monocular 3D Object Detection Leveraging Accurate Proposals and Shape Reconstruction [Notes] CVPR 2019 (2d bbox height)
CasGeo: 3D Bounding Box Estimation for Autonomous Vehicles by Cascaded Geometric Constraints and Depurated 2D Detections Using 3D Results [Notes]
MVRA: Multi-View Reprojection Architecture for Orientation Estimation [Notes] ICCV 2019

Direct generation of 3D proposals

Mono3D: Monocular 3D Object Detection for Autonomous Driving [Notes] CVPR 2016 (lots of 3D anchors)
TLNet: Triangulation Learning Network: from Monocular to Stereo 3D Object Detection [Notes] CVPR 2019 (mono baseline with 3D anchors)
SS3D: Monocular 3D Object Detection and Box Fitting Trained End-to-End Using Intersection-over-Union Loss [Notes] (regress distance from images, CenterNet-like)
M3D-RPN: Monocular 3D Region Proposal Network for Object Detection [Notes] ICCV 2019 (Xiaoming Liu)
MonoDIS: Disentangling Monocular 3D Object Detection [Notes] ICCV 2019
Joint Monocular 3D Vehicle Detection and Tracking [Notes] ICCV 2019 (directly regress 1/d, Berkeley DeepDrive)
CenterNet: Objects as points (from ExtremeNet authors) [Notes] (directly regress 1/d)

Others

Deep Optics for Monocular Depth Estimation and 3D Object Detection [Notes] ICCV 2019

Depth Estimation

Deep Depth Completion of a Single RGB-D Image [Notes] CVPR 2018 (indoor)
DeepLiDAR: Deep Surface Normal Guided Depth Prediction for Outdoor Scene from Sparse LiDAR Data and Single Color Image [Notes] CVPR 2019 (outdoor)
SfMLearner: Unsupervised Learning of Depth and Ego-Motion from Video [Notes] CVPR 2017
Monodepth2: Digging Into Self-Supervised Monocular Depth Estimation [Notes] (@Niantic)

Practical Autonomous Driving Topics

MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving [Notes]
TrafficPredict: Trajectory Prediction for Heterogeneous Traffic-Agents [Notes] AAAI 2019 (oral)
Semantic Segmentation on Radar Point Clouds [Notes] (from Daimler AG) FUSION 2018
DeepSignals: Predicting Intent of Drivers Through Visual Signals [Notes] ICRA 2019 (@Uber, turn signal detection)
Deep Radar Detector [Notes] RadarCon 2019

Model Compression

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications (MobileNets) [Notes]
MobileNetV2: Inverted Residuals and Linear Bottlenecks (MobileNets v2) [Notes] CVPR 2018
MobileNetV3: Searching for MobileNetV3 [Notes]
MnasNet: Platform-Aware Neural Architecture Search for Mobile [Notes] CVPR 2019
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks [Notes] ICML 2019
Pruning Filters for Efficient ConvNets [Notes] ICLR 2017
Channel Pruning for Accelerating Very Deep Neural Networks ICCV 2017 (Face++, Yihui He) [Notes]
Layer-compensated Pruning for Resource-constrained Convolutional Neural Networks [Notes] NIPS 2018 Talk
LeGR: Filter Pruning via Learned Global Ranking [Notes]
Rethinking the Value of Network Pruning ICLR 2019
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks [Notes] ICLR 2019

NAS

NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection [Notes] CVPR 2019
AutoAugment: Learning Augmentation Policies from Data [Notes] CVPR 2019
AMC: AutoML for Model Compression and Acceleration on Mobile Devices ECCV 2018 (Song Han, Yihui He)

Video Recognition

Long-Term Feature Banks for Detailed Video Understanding [Notes] Video
Non-local Neural Networks [Notes] Video CVPR 2018
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset (I3D) [Notes] Video CVPR 2017
Initialization Strategies of Spatio-Temporal Convolutional Neural Networks [Notes] Video
Detect-and-Track: Efficient Pose Estimation in Videos [Notes] ICCV 2017 Video
SlowFast Networks for Video Recognition [Notes] Video

Medical Imaging

Retina U-Net: Embarrassingly Simple Exploitation of Segmentation Supervision for Medical Object Detection [Notes] MI
Deep Learning Based Rib Centerline Extraction and Labeling [Notes] MI MICCAI 2018
