Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Errors for training! #30

Open
balzac001 opened this issue Mar 16, 2018 · 5 comments
Open

Errors for training! #30

balzac001 opened this issue Mar 16, 2018 · 5 comments

Comments

@balzac001
Copy link

balzac001 commented Mar 16, 2018

Hello Xdever!
The test.py is ok, but when I tried the main.py, there is errors
I download MS COCO 2014, and extracted it! Then I use the following command:
python main.py -dataset /home/hfl/RFCN-tensorflow-master-test1/COCO -name /home/hfl/RFCN-tensorflow-master-test1/export2
Then the errors like this:

WARNING: Variable "InceptionResnetV2/Conv2d_7b_1x1/weights/Adam" not found in file to load.
WARNING: Variable "InceptionResnetV2/Conv2d_7b_1x1/weights/Adam_1" not found in file to load.
WARNING: Variable "InceptionResnetV2/Conv2d_7b_1x1/BatchNorm/beta/Adam" not found in file to load.
WARNING: Variable "InceptionResnetV2/Conv2d_7b_1x1/BatchNorm/beta/Adam_1" not found in file to load.
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/BatchNorm/beta
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Logits/biases
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_1b_1x1/weights
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Logits/weights
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/weights
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_1b_1x1/BatchNorm/moving_variance
WARNING: Unused variable: InceptionResnetV2/Logits/Logits/biases
WARNING: Unused variable: InceptionResnetV2/Logits/Logits/weights
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_1b_1x1/BatchNorm/moving_mean
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/BatchNorm/moving_variance
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/BatchNorm/moving_mean
WARNING: Unused variable: global_step
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_1b_1x1/BatchNorm/beta
Done.
BoxLoader: Loaded 123287 files.
2018-03-16 21:19:23.107145: W tensorflow/core/kernels/queue_base.cc:277] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2018-03-16 21:19:23.107192: W tensorflow/core/kernels/queue_base.cc:277] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2018-03-16 21:19:23.107238: W tensorflow/core/kernels/queue_base.cc:277] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2018-03-16 21:19:23.107249: W tensorflow/core/kernels/queue_base.cc:277] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
File "main.py", line 157, in
res = runManager.modRun(i)
File "/home/hfl/RFCN-tensorflow-master-test1/Utils/RunManager.py", line 97, in modRun
return self.runAndMerge(feed_dict, options=options if options is not None else self.options, run_metadata=run_metadata if run_metadata is not None else self.run_metadata)
File "/home/hfl/RFCN-tensorflow-master-test1/Utils/RunManager.py", line 71, in runAndMerge
res = self.sess.run(self.inputTensors, feed_dict=feed_dict, options=options, run_metadata=run_metadata)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 905, in run
run_metadata_ptr)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1137, in _run
feed_dict_tensor, options, run_metadata)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1355, in _do_run
options, run_metadata)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1374, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 9489 values, but the requested shape has 12356
[[Node: optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Reshape/tensor, optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Shape)]]
[[Node: optimizer/gradients/cond/getRefinementLoss/cond/getPosLoss/boxesRefinementLoss/refineBoxes/roiMean/positionSensitiveRoiPooling/PosRoiPooling_grad/Shape/_2039 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_6985_...grad/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Caused by op u'optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Reshape', defined at:
File "main.py", line 119, in
trainOp=createUpdateOp()
File "main.py", line 106, in createUpdateOp
grads = optimizer.compute_gradients(totalLoss, var_list=net.getVariables())
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 460, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 611, in gradients
lambda: grad_fn(op, *out_grads))
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 377, in _MaybeCompile
return grad_fn() # Exit early
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 611, in
lambda: grad_fn(op, *out_grads))
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/array_grad.py", line 504, in _ReshapeGrad
return [array_ops.reshape(grad, array_ops.shape(op.inputs[0])), None]
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3903, in reshape
"Reshape", tensor=tensor, shape=shape, name=name)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
op_def=op_def)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1650, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

...which was originally created as op u'RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2', defined at:
File "main.py", line 99, in
tf.losses.add_loss(net.getLoss(boxes, classes))
File "/home/hfl/RFCN-tensorflow-master-test1/BoxEngine/BoxNetwork.py", line 50, in getLoss
return self.rpn.loss(refBoxes) + self.boxRefiner.loss(self.proposals, refBoxes, refClasses)
File "/home/hfl/RFCN-tensorflow-master-test1/BoxEngine/RPN.py", line 176, in loss
return tf.cond(tf.shape(refBoxes)[0]>0, lambda: calcLoss(), lambda: tf.constant(0.0))
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 432, in new_func
return func(*args, **kwargs)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2018, in cond
orig_res_t, res_t = context_t.BuildCondBranch(true_fn)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1868, in BuildCondBranch
original_result = fn()
File "/home/hfl/RFCN-tensorflow-master-test1/BoxEngine/RPN.py", line 176, in
return tf.cond(tf.shape(refBoxes)[0]>0, lambda: calcLoss(), lambda: tf.constant(0.0))
File "/home/hfl/RFCN-tensorflow-master-test1/BoxEngine/RPN.py", line 163, in calcLoss
positiveLosses, negativeLosses = calcAllLosses(inAnchros, inBoxes, inRawSizes, inScores, inBoxSizes)
File "/home/hfl/RFCN-tensorflow-master-test1/BoxEngine/RPN.py", line 142, in calcAllLosses
classificationLoss = tf.nn.softmax_cross_entropy_with_logits(logits=scores, labels=refScores)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 250, in new_func
return func(*args, **kwargs)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 1960, in softmax_cross_entropy_with_logits
labels=labels, logits=logits, dim=dim, name=name)

InvalidArgumentError (see above for traceback): Input to reshape is a tensor with 9489 values, but the requested shape has 12356
[[Node: optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Reshape/tensor, optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Shape)]]
[[Node: optimizer/gradients/cond/getRefinementLoss/cond/getPosLoss/boxesRefinementLoss/refineBoxes/roiMean/positionSensitiveRoiPooling/PosRoiPooling_grad/Shape/_2039 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_6985_...grad/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

@zhangxixi0904
Copy link

I got the same problem: Input to reshape is a tensor with 9489 values, but the requested shape has 12356

@balzac001
Copy link
Author

I think this error has some relationship with the version of CUDA and TF, I used another computer to try this program, which is install CUDA8 and TF1.2, this error doesn't exist, but my computer is installed CUDA9 and TF1.4, I don't know how to debug!!!

@zhangxixi0904
Copy link

Thank you so much for your advice!! I am wondering if the version incompatible too @balzac001

@timtian12
Copy link

Are you run in cuda9 finally?

@balzac001
Copy link
Author

No, I used Faster RCNN and YOLO, I can't solve this error!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants