
ut train fails on GPU with JIT compilation failed #60

Open
Shubhamcl opened this issue Apr 21, 2023 · 0 comments

Shubhamcl commented Apr 21, 2023

I'm using Ubuntu 22. Training works fine on the CPU, but with --num_gpus=1 I get the error stack below.

The error appears when following the instructions for the demo.

At first I thought it was a TensorFlow issue, so I ran GPU training with an example from the TensorFlow tutorials, but that worked fine.

Detected at node 'SelectV2' defined at (most recent call last):
  File "/home/shubham/anaconda3/envs/u-sleep/lib/python3.9/threading.py", line 908, in _bootstrap
    self._bootstrap_inner()
  File "/home/shubham/anaconda3/envs/u-sleep/lib/python3.9/threading.py", line 950, in _bootstrap_inner
    self.run()
  File "/home/shubham/anaconda3/envs/u-sleep/lib/python3.9/site-packages/keras/engine/training.py", line 1000, in run_step
    outputs = model.train_step(data)
  File "/home/shubham/anaconda3/envs/u-sleep/lib/python3.9/site-packages/keras/engine/training.py", line 864, in train_step
    return self.compute_metrics(x, y, y_pred, sample_weight)
  File "/home/shubham/anaconda3/envs/u-sleep/lib/python3.9/site-packages/keras/engine/training.py", line 957, in compute_metrics
    self.compiled_metrics.update_state(y, y_pred, sample_weight)
  File "/home/shubham/anaconda3/envs/u-sleep/lib/python3.9/site-packages/keras/engine/compile_utils.py", line 459, in update_state
    metric_obj.update_state(y_t, y_p, sample_weight=mask)
  File "/home/shubham/anaconda3/envs/u-sleep/lib/python3.9/site-packages/utime/evaluation/utils.py", line 22, in wrapper
    mask = tf.where(tf.logical_and(
Node: 'SelectV2'
Detected at node 'SelectV2' defined at (most recent call last):
  File "/home/shubham/anaconda3/envs/u-sleep/lib/python3.9/threading.py", line 908, in _bootstrap
    self._bootstrap_inner()
  File "/home/shubham/anaconda3/envs/u-sleep/lib/python3.9/threading.py", line 950, in _bootstrap_inner
    self.run()
  File "/home/shubham/anaconda3/envs/u-sleep/lib/python3.9/site-packages/keras/engine/training.py", line 1000, in run_step
    outputs = model.train_step(data)
  File "/home/shubham/anaconda3/envs/u-sleep/lib/python3.9/site-packages/keras/engine/training.py", line 864, in train_step
    return self.compute_metrics(x, y, y_pred, sample_weight)
  File "/home/shubham/anaconda3/envs/u-sleep/lib/python3.9/site-packages/keras/engine/training.py", line 957, in compute_metrics
    self.compiled_metrics.update_state(y, y_pred, sample_weight)
  File "/home/shubham/anaconda3/envs/u-sleep/lib/python3.9/site-packages/keras/engine/compile_utils.py", line 459, in update_state
    metric_obj.update_state(y_t, y_p, sample_weight=mask)
  File "/home/shubham/anaconda3/envs/u-sleep/lib/python3.9/site-packages/utime/evaluation/utils.py", line 22, in wrapper
    mask = tf.where(tf.logical_and(
Node: 'SelectV2'
2 root error(s) found.
  (0) UNKNOWN: JIT compilation failed.
    [[{{node SelectV2}}]]
    [[div_no_nan_1/ReadVariableOp/_12]]
  (1) UNKNOWN: JIT compilation failed.
    [[{{node SelectV2}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_13068]
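To narrow this down, a minimal script along these lines could check whether the failing `SelectV2` op (which a three-argument `tf.where` produces) also fails on its own on the GPU, outside of U-Time. `masked_select` and `run_on_device` are hypothetical names, and the mask is an assumed stand-in, not the actual code at `utime/evaluation/utils.py` line 22. `jit_compile` assumes TensorFlow 2.5 or newer. If either GPU call raises the same `UNKNOWN: JIT compilation failed` error, that would suggest the problem lies in the TensorFlow/CUDA setup rather than in U-Time itself.

```python
import tensorflow as tf


def masked_select(y_true, ignore_value):
    """Hypothetical metric mask: 1.0 for valid labels, 0.0 otherwise.

    Assumed stand-in for a tf.where(tf.logical_and(...)) mask; NOT the U-Time code.
    """
    valid = tf.logical_and(y_true >= 0, tf.not_equal(y_true, ignore_value))
    ones = tf.ones_like(y_true, dtype=tf.float32)
    zeros = tf.zeros_like(y_true, dtype=tf.float32)
    # A three-argument tf.where becomes a SelectV2 op in the traced graph.
    return tf.where(valid, ones, zeros)


def run_on_device(device, jit):
    """Run masked_select on `device`, optionally XLA-compiled via jit_compile."""
    fn = tf.function(masked_select, jit_compile=jit)
    with tf.device(device):
        labels = tf.constant([0, 1, 5, -1])
        return fn(labels, 5).numpy()


if __name__ == "__main__":
    gpus = tf.config.list_physical_devices("GPU")
    print("Visible GPUs:", gpus)
    print("CPU:", run_on_device("/CPU:0", jit=False))
    if gpus:
        # Try the same op on the GPU, with and without XLA compilation.
        for jit in (False, True):
            try:
                print(f"GPU (jit_compile={jit}):", run_on_device("/GPU:0", jit))
            except tf.errors.OpError as err:
                # An UNKNOWN error here would mirror the failure in the trace above.
                print(f"GPU (jit_compile={jit}) failed: {err.message}")
```

On the CPU, the mask for labels `[0, 1, 5, -1]` with `ignore_value=5` is `[1. 1. 0. 0.]`.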
