Hi. Thanks for the code and the detailed instructions.
I integrated sparse convolution into my encoder:
with tf.variable_scope('featureEncoder'):
    auxiShape = (self.inputShape[0], self.inputShape[1], self.inputShape[2], 7)
    featureShape = (self.inputShape[0], self.inputShape[1], self.inputShape[2], 32)
    blockSize = 8
    blockStride = (8, 8)
    blockOffset = (0, 0)
    blockCount = (self.divup(self.inputShape[1], blockStride[0]),
                  self.divup(self.inputShape[2], blockStride[1]))
    inBlockParams = {"dynamic_bsize": (blockSize, blockSize),
                     "dynamic_boffset": blockOffset,
                     "dynamic_bstride": blockStride}
    outBlockParams = {"dynamic_bsize": (blockSize, blockSize),
                      "dynamic_boffset": blockOffset,
                      "dynamic_bstride": blockStride}
    if not self.training:
        indices = sbnet_module.reduce_mask(self.mask, blockCount, tol=0.1, **inBlockParams)
        # stack active overlapping tiles to batch dimension
        stack = sbnet_module.sparse_gather(
            auxi, indices.bin_counts, indices.active_block_indices,
            transpose=False, **inBlockParams)
    else:
        stack = auxi
    # perform dense convolution on a sparse stack of tiles
    stack = self.conv_layer2(stack, 7, 32, name='1')
    stack = tf.nn.leaky_relu(stack)
    stack = self.conv_layer2(stack, 32, 32, name='2')
    stack = tf.nn.leaky_relu(stack)
    stack = self.conv_layer2(stack, 32, 32, name='3')
    stack = tf.nn.leaky_relu(stack)
    stack = self.conv_layer2(stack, 32, 32, name='4')
    stack = tf.nn.leaky_relu(stack)
    stack = self.conv_layer2(stack, 32, 32, name='5')
    stack = tf.nn.leaky_relu(stack)
    # write/scatter the tiles back on top of the original tensor;
    # note that the output tensor is reduced by 1 on each side due to 'VALID' convolution
    if not self.training:
        feature = sbnet_module.sparse_scatter(
            stack, indices.bin_counts, indices.active_block_indices,
            self.lastFeature, transpose=False, add=False, atomic=False, **outBlockParams)
        feature.set_shape(featureShape)
    else:
        feature = stack
self.training is set to False when training and True when testing. The variable mask is generated outside the network and fed in via a tf.placeholder, as is self.lastFeature.
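For reference, the feeding pattern I mean looks roughly like this (a minimal sketch; the placeholder names, shapes, and the stand-in output op are assumptions, not the actual network):

import numpy as np
import tensorflow as tf

# hypothetical placeholders; shapes assumed from the (720, 1280, 7) input described below
mask = tf.placeholder(tf.float32, shape=(1, 720, 1280, 1), name='mask')
lastFeature = tf.placeholder(tf.float32, shape=(1, 720, 1280, 32), name='lastFeature')

# stand-in op; in the real network this would be the `feature` tensor built above
output = tf.reduce_mean(lastFeature) + tf.reduce_mean(mask)

with tf.Session() as sess:
    result = sess.run(output, feed_dict={
        mask: np.ones((1, 720, 1280, 1), np.float32),           # all-active mask
        lastFeature: np.zeros((1, 720, 1280, 32), np.float32),  # previous features
    })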
I tried to measure the inference time with timeline. However, I can't find time records for the layers under 'featureEncoder'. There are also two bars captioned "unknown", the second of which is strangely long. Some Pooling and LeakyRelu times are also strange, costing nearly 2 ms.
I wonder how I can get a proper time measurement. Thanks.
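For reference, the tracing pattern I used is roughly the standard TF 1.x one (a sketch wrapped in a helper; sess, fetches, and feed_dict come from the surrounding code):

import tensorflow as tf
from tensorflow.python.client import timeline

def trace_once(sess, fetches, feed_dict, out_path='timeline.json'):
    """Run one step with full tracing and dump a Chrome-trace JSON."""
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    sess.run(fetches, feed_dict=feed_dict,
             options=run_options, run_metadata=run_metadata)
    # open the resulting file in chrome://tracing to inspect per-op times
    tl = timeline.Timeline(run_metadata.step_stats)
    with open(out_path, 'w') as f:
        f.write(tl.generate_chrome_trace_format())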
I wrapped the structures behind the featureEncoder in with tf.control_dependencies([feature]):, and now the timeline result looks fine; it's nearly the same as the result from sbnet_module.cuda_timer.
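Concretely, the wrapping looks roughly like this (build_decoder is a hypothetical stand-in for the layers that follow the encoder):

# force ops built after the encoder to run only after `feature` is computed,
# so the timeline attributes the encoder's kernels to the right span
with tf.control_dependencies([feature]):
    output = build_decoder(feature)  # hypothetical builder for the remaining layers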
However, the time cost of the featureEncoder increases heavily. My input is (720, 1280, 7). The original network spends roughly 38 ms, of which the featureEncoder takes about 10 ms; I want to reduce the inference time to less than 33 ms. After wrapping the featureEncoder with SparseGather and SparseScatter, the network's inference time comes to 44 ms with an all-ones mask.
When I feed a mask containing zeros, something strange happens: at a sparsity of nearly 0.1, the time rises to 150 ms, and the convolutions under the featureEncoder appear as discrete pieces in the timeline chart. The time is 90 ms when the sparsity goes to 0.5, and 64 ms at 0.8.
I checked the issue and have tried many block sizes and sparsity levels, but I still see no improvement. My first guess was that sparse convolution reduces GPU memory usage and thereby makes the GPU lazy.
But since your experiments used a GTX 1080 Ti, I assume the method works well on powerful GPUs.
I must have misunderstood something and made a serious mistake. Hope to receive answers. Thanks.
IwakuraRein changed the title from "timeline.timeline outputs improper time records" to "Time cost increases" on Aug 1, 2021.
My Environment
TensorFlow Version: 1.15.0
Operating System: Ubuntu 16.04
Python Version: 3.6.13
CUDA Version: 10.0
cuDNN Version: 7.6.4
GPU Type: RTX 2080 Ti
Nvidia Driver Version: 460.67