
Multi-View 3D Object Detection Network for Autonomous Driving


General description

This repository reproduces the MV3D network described in [1].

Following [1], training is divided into two stages:

  1. Run train_proposal.py to train the proposal network, which generates proposal candidates. After training, run prediction_proposal.py to produce the proposal candidates.

  2. Run train_fusion.py to train the fusion network, which fuses the three views to produce the final detections. After training, run prediction_fusion.py to generate the final results (a minimal end-to-end sketch follows this list).
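
For orientation, here is a minimal driver for the two-stage workflow above. It assumes each script is runnable standalone with its default arguments, which may not match your setup:

```python
# Hypothetical end-to-end driver: run the four stages in order.
import subprocess

for script in ("train_proposal.py", "prediction_proposal.py",
               "train_fusion.py", "prediction_fusion.py"):
    subprocess.run(["python", script], check=True)
```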

Training data for the proposal network (train_proposal.py):

training_data: a list with the following elements:

training_data[0] = birdview

training_data[1] = anchor_label

training_data[2] = anchor_reg

training_data[3] = anchor_label_mask

training_data[4] = anchor_reg_mask

validation_data has the same structure as training_data.

The format of the prepared data should be:

birdview: Bird's-eye view generated from the LIDAR point cloud. Format: .npy file with shape [number of images, image rows, image cols, channels].

anchor_label: Anchor labels corresponding to birdview. Format: .npy file with shape [number of images, image rows, image cols, number of anchors]. Each value in anchor_label is either one or zero: one means the anchor is positive; zero means it is negative or invalid.

anchor_label_mask: Mask marking the valid anchor labels. Format: .npy file with shape [number of images, image rows, image cols, number of anchors]. Each value in anchor_label_mask is either one or zero: one means the corresponding anchor label is counted in training; zero means it is ignored.

anchor_reg: Anchor regression values corresponding to birdview. Format: .npy file with shape [number of images, image rows, image cols, number of anchors, number of regression values]. The entries are the regression targets for each anchor.

anchor_reg_mask: Mask marking the valid anchor regressions. Format: .npy file with shape [number of images, image rows, image cols, number of anchors]. Each value in anchor_reg_mask is either one or zero: one means the corresponding anchor regression is counted in training; zero means it is ignored.
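
A minimal sketch of assembling the proposal-network training list with numpy; the .npy file names here are hypothetical, and the shape comments follow the format descriptions above:

```python
import numpy as np

# Hypothetical file names; point these at your own prepared .npy files.
training_data = [
    np.load("birdview.npy"),           # [num_images, rows, cols, channels]
    np.load("anchor_label.npy"),       # [num_images, rows, cols, num_anchors]
    np.load("anchor_reg.npy"),         # [num_images, rows, cols, num_anchors, num_regs]
    np.load("anchor_label_mask.npy"),  # [num_images, rows, cols, num_anchors]
    np.load("anchor_reg_mask.npy"),    # [num_images, rows, cols, num_anchors]
]

# anchor_label and the two masks are binary, which allows a quick sanity check.
for idx in (1, 3, 4):
    assert set(np.unique(training_data[idx])).issubset({0, 1})
```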

Training data for the fusion network (train_fusion.py):

training_data: a list with the following elements:

training_data[0] = data_birdview

training_data[1] = data_frontview

training_data[2] = data_rgbview

training_data[3] = data_birdview_rois

training_data[4] = data_frontview_rois

training_data[5] = data_rgbview_rois

training_data[6] = data_birdview_box_ind

training_data[7] = data_frontview_box_ind

training_data[8] = data_rgbview_box_ind

training_data[9] = data_ROI_labels

training_data[10] = data_ROI_regs

training_data[11] = data_cls_mask

training_data[12] = data_reg_mask

validation_data has the same structure as training_data.

The format of the prepared data should be:

data_birdview: Bird's-eye view generated from the LIDAR point cloud. Format: .npy file with shape [number of images, image rows, image cols, channels].

data_frontview: Front view generated from the LIDAR point cloud. Format: .npy file with shape [number of images, image rows, image cols, channels].

data_rgbview: RGB view taken from the camera image. Format: .npy file with shape [number of images, image rows, image cols, channels].

data_birdview_rois: Region coordinates for birdview region pooling. Format: .npy file with shape [number of images * 200 regions, 4], where each row is (y1, x1, y2, x2).

data_frontview_rois: Region coordinates for frontview region pooling. Format: .npy file with shape [number of images * 200 regions, 4], where each row is (y1, x1, y2, x2).

data_rgbview_rois: Region coordinates for rgbview region pooling. Format: .npy file with shape [number of images * 200 regions, 4], where each row is (y1, x1, y2, x2).

data_birdview_box_ind: data_birdview_box_ind[i] specifies the index of the birdview image that the i-th box refers to. Format: .npy file with shape [number of images * 200 regions].

data_frontview_box_ind: data_frontview_box_ind[i] specifies the index of the frontview image that the i-th box refers to. Format: .npy file with shape [number of images * 200 regions].

data_rgbview_box_ind: data_rgbview_box_ind[i] specifies the index of the rgbview image that the i-th box refers to. Format: .npy file with shape [number of images * 200 regions].
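
The box-index arrays map each ROI row back to its source image; the (y1, x1, y2, x2) ordering and the box_ind naming suggest TensorFlow's tf.image.crop_and_resize, though that is an inference from the format rather than something stated here. A minimal numpy sketch, assuming the fixed 200 regions per image given above:

```python
import numpy as np

num_images = 4           # example batch size
regions_per_image = 200  # fixed region count per image, per the format above

# Rows 0..199 belong to image 0, rows 200..399 to image 1, and so on.
data_birdview_box_ind = np.repeat(np.arange(num_images), regions_per_image)

assert data_birdview_box_ind.shape == (num_images * regions_per_image,)
assert data_birdview_box_ind[250] == 1  # the 250th ROI row refers to image 1
```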

data_ROI_labels: Region labels. Format: .npy file with shape [number of images, 200 regions]. Each value in data_ROI_labels is either one or zero: one means the region is positive; zero means it is negative or invalid.
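
The remaining entries (data_ROI_regs, data_cls_mask, data_reg_mask) are not documented here; by analogy with the proposal-network data, data_ROI_regs presumably holds the regression values for each region, while data_cls_mask and data_reg_mask mark which regions are counted in the classification and regression losses.

A minimal sketch of assembling the 13-element list with numpy, again with hypothetical file names mirroring the index list above:

```python
import numpy as np

# Hypothetical file names, one per entry in the index list above.
names = [
    "data_birdview", "data_frontview", "data_rgbview",
    "data_birdview_rois", "data_frontview_rois", "data_rgbview_rois",
    "data_birdview_box_ind", "data_frontview_box_ind", "data_rgbview_box_ind",
    "data_ROI_labels", "data_ROI_regs", "data_cls_mask", "data_reg_mask",
]
training_data = [np.load(name + ".npy") for name in names]
assert len(training_data) == 13
```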

Reference

[1] Chen, Xiaozhi, et al. "Multi-view 3D object detection network for autonomous driving." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
