Error running on GPU: device renaming issue? #79

Open
FranckLejzerowicz opened this issue Aug 28, 2019 · 2 comments


FranckLejzerowicz commented Aug 28, 2019

Hi,

So here's a command run on a GPU node in an interactive Slurm srun session:

$ rhapsody mmvec \
   --microbe-file A.biom \
   --metabolite-file B.biom  \
   --min-feature-count 5  \
   --epochs 20000 \
   --batch-size 1000  \
   --latent-dim 3  \
   --input-prior 1  \
   --learning-rate 1e-4  \
   --beta1 0.85 \
   --beta2 0.90  \
   --checkpoint-interval 60  \
   --summary-interval 60 \
   --arm-the-gpu  \
   --summary-dir gpu_1000_1e-4_20000  \
   --ranks-file gpu_1000_1e-4_20000/ranks.csv

The (long) error (sorry):


WARNING: Logging before flag parsing goes to stderr.
W0828 12:38:30.259999 140077172123456 deprecation_wrapper.py:119] From /home/flejzerowicz/rhapsody_ve_new/bin/rhapsody:156: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

W0828 12:38:30.262325 140077172123456 deprecation_wrapper.py:119] From /home/flejzerowicz/rhapsody_ve_new/bin/rhapsody:157: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2019-08-28 12:38:30.262596: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-08-28 12:38:30.273506: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-08-28 12:38:32.273961: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x560b1e030b60 executing computations on platform CUDA. Devices:
2019-08-28 12:38:32.274039: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Tesla V100-PCIE-32GB, Compute Capability 7.0
2019-08-28 12:38:32.291287: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2100000000 Hz
2019-08-28 12:38:32.294314: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x560b1d6caf10 executing computations on platform Host. Devices:
2019-08-28 12:38:32.294405: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-08-28 12:38:32.297357: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla V100-PCIE-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.38
pciBusID: 0000:5e:00.0
2019-08-28 12:38:32.298520: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm-18.08.0/lib::/home/flejzerowicz/local/lib:/home/flejzerowicz/local/lib64:/home/flejzerowicz/openssl/lib:/home/flejzerowicz/usr/lib/lib/:/home/flejzerowicz/local/lib:/home/flejzerowicz/local/lib64
2019-08-28 12:38:32.299494: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm-18.08.0/lib::/home/flejzerowicz/local/lib:/home/flejzerowicz/local/lib64:/home/flejzerowicz/openssl/lib:/home/flejzerowicz/usr/lib/lib/:/home/flejzerowicz/local/lib:/home/flejzerowicz/local/lib64
2019-08-28 12:38:32.300329: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm-18.08.0/lib::/home/flejzerowicz/local/lib:/home/flejzerowicz/local/lib64:/home/flejzerowicz/openssl/lib:/home/flejzerowicz/usr/lib/lib/:/home/flejzerowicz/local/lib:/home/flejzerowicz/local/lib64
2019-08-28 12:38:32.301209: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm-18.08.0/lib::/home/flejzerowicz/local/lib:/home/flejzerowicz/local/lib64:/home/flejzerowicz/openssl/lib:/home/flejzerowicz/usr/lib/lib/:/home/flejzerowicz/local/lib:/home/flejzerowicz/local/lib64
2019-08-28 12:38:32.302105: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm-18.08.0/lib::/home/flejzerowicz/local/lib:/home/flejzerowicz/local/lib64:/home/flejzerowicz/openssl/lib:/home/flejzerowicz/usr/lib/lib/:/home/flejzerowicz/local/lib:/home/flejzerowicz/local/lib64
2019-08-28 12:38:32.302962: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm-18.08.0/lib::/home/flejzerowicz/local/lib:/home/flejzerowicz/local/lib64:/home/flejzerowicz/openssl/lib:/home/flejzerowicz/usr/lib/lib/:/home/flejzerowicz/local/lib:/home/flejzerowicz/local/lib64
2019-08-28 12:38:32.304020: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm-18.08.0/lib::/home/flejzerowicz/local/lib:/home/flejzerowicz/local/lib64:/home/flejzerowicz/openssl/lib:/home/flejzerowicz/usr/lib/lib/:/home/flejzerowicz/local/lib:/home/flejzerowicz/local/lib64
2019-08-28 12:38:32.304122: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2019-08-28 12:38:32.304182: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-08-28 12:38:32.304231: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-08-28 12:38:32.304265: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
W0828 12:38:32.641206 140077172123456 deprecation_wrapper.py:119] From /home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/rhapsody/multimodal.py:94: The name tf.log is deprecated. Please use tf.math.log instead.

W0828 12:38:32.643565 140077172123456 deprecation.py:323] From /home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/rhapsody/multimodal.py:95: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.random.categorical` instead.
W0828 12:38:32.655179 140077172123456 deprecation_wrapper.py:119] From /home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/rhapsody/multimodal.py:106: The name tf.random_normal is deprecated. Please use tf.random.normal instead.

W0828 12:38:32.694295 140077172123456 deprecation.py:323] From /home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/rhapsody/multimodal.py:122: Normal.__init__ (from tensorflow.python.ops.distributions.normal) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
W0828 12:38:32.695811 140077172123456 deprecation.py:323] From /home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/tensorflow/python/ops/distributions/normal.py:160: Distribution.__init__ (from tensorflow.python.ops.distributions.distribution) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
W0828 12:38:32.724381 140077172123456 deprecation.py:323] From /home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/rhapsody/multimodal.py:139: Multinomial.__init__ (from tensorflow.python.ops.distributions.multinomial) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
W0828 12:38:32.802299 140077172123456 deprecation_wrapper.py:119] From /home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/rhapsody/multimodal.py:187: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.

W0828 12:38:32.805364 140077172123456 deprecation_wrapper.py:119] From /home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/rhapsody/multimodal.py:189: The name tf.summary.histogram is deprecated. Please use tf.compat.v1.summary.histogram instead.

W0828 12:38:32.810857 140077172123456 deprecation_wrapper.py:119] From /home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/rhapsody/multimodal.py:193: The name tf.summary.merge_all is deprecated. Please use tf.compat.v1.summary.merge_all instead.

W0828 12:38:32.812450 140077172123456 deprecation_wrapper.py:119] From /home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/rhapsody/multimodal.py:195: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

W0828 12:38:32.851014 140077172123456 deprecation_wrapper.py:119] From /home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/rhapsody/multimodal.py:200: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

W0828 12:38:33.204426 140077172123456 deprecation.py:323] From /home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/tensorflow/python/ops/clip_ops.py:286: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0828 12:38:33.331943 140077172123456 deprecation_wrapper.py:119] From /home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/rhapsody/multimodal.py:210: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.

Traceback (most recent call last):
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1339, in _run_fn
    self._extend_graph()
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1374, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation random_normal/RandomStandardNormal: {{node random_normal/RandomStandardNormal}}was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0 ]. Make sure the device specification refers to a valid device.
	 [[random_normal/RandomStandardNormal]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/flejzerowicz/rhapsody_ve_new/bin/rhapsody", line 221, in <module>
    rhapsody()
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/flejzerowicz/rhapsody_ve_new/bin/rhapsody", line 168, in mmvec
    test_microbes_coo, test_metabolites_df.values)
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/rhapsody/multimodal.py", line 210, in __call__
    tf.global_variables_initializer().run()
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2679, in run
    _run_using_default_session(self, feed_dict, self.graph, session)
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 5614, in _run_using_default_session
    session.run(operation, feed_dict)
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation random_normal/RandomStandardNormal: node random_normal/RandomStandardNormal (defined at /lib/python3.6/site-packages/rhapsody/multimodal.py:106) was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0 ]. Make sure the device specification refers to a valid device.
	 [[random_normal/RandomStandardNormal]]

Note the possibly relevant sinfo output:

$ sinfo -p gpu -N -o "%c %D %G %m %P"

CPUS NODES GRES MEMORY PARTITION
32 1 gpu:1 94208 gpu
32 1 gpu:1 94208 gpu

Any help greatly appreciated :)
Thanks!
Franck

@mortonjt
Collaborator

Hi @FranckLejzerowicz, just to confirm, have you been able to run nvidia-smi? That can help to see whether any GPUs are available. It looks like one GPU is being recognized, but I'm not sure why it isn't being properly loaded.


mortonjt commented Sep 3, 2019

Hi @FranckLejzerowicz, there are two problems with the tensorflow-gpu setup:

  1. tensorflow-gpu must be installed independently of tensorflow.

To do this, you'd need the following:

pip uninstall tensorflow
pip uninstall tensorflow-gpu
pip install tensorflow-gpu --upgrade

  2. You need the right libraries linked to your GPU, so you'd need to module load CUDA and cuDNN on your cluster (I'm using CUDA 10 and cuDNN v7.6.2). Your log shows TensorFlow failing to dlopen libcudart.so.10.0, libcudnn.so.7 and the other CUDA libraries, after which it skips registering GPU devices.
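
As a quick check before rerunning, a minimal sketch along the following lines could show which of the libraries named in the "Could not dlopen library" log lines are loadable from the current environment. The helper names `can_dlopen` and `find_on_library_path` are hypothetical and are not part of rhapsody or TensorFlow; the library list is copied from the log in this issue and may differ for other TensorFlow builds.

```python
import ctypes
import os

# Library names copied from the "Could not dlopen library" lines in the log.
CUDA_LIBS = [
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
    "libcudnn.so.7",
]


def can_dlopen(name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False


def find_on_library_path(name, library_path=None):
    """Return the directories on LD_LIBRARY_PATH that contain a file `name`."""
    if library_path is None:
        library_path = os.environ.get("LD_LIBRARY_PATH", "")
    # Empty entries (e.g. from "::") are skipped.
    dirs = [d for d in library_path.split(":") if d]
    return [d for d in dirs if os.path.isfile(os.path.join(d, name))]


if __name__ == "__main__":
    for lib in CUDA_LIBS:
        status = "loadable" if can_dlopen(lib) else "NOT loadable"
        print(lib, status, find_on_library_path(lib))
```

If every library is reported as not loadable, the CUDA and cuDNN modules are probably not loaded in the session (or a different CUDA version is on the path), which would match the symptoms in the log.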

Below are a couple of commands that I would run inside Python for debugging:

import tensorflow as tf
tf.test.gpu_device_name()

and

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

Here is some of the output from my setup

>>> import tensorflow as tf
>>> tf.test.gpu_device_name()
2019-09-05 09:30:33.371294: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-09-05 09:30:33.399783: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1

2019-09-05 09:30:33.616269: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x38b4a60 executing computations on platform CUDA. Devices:
2019-09-05 09:30:33.616301: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Tesla V100-PCIE-32GB, Compute Capability 7.0
2019-09-05 09:30:33.618859: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2400000000 Hz
2019-09-05 09:30:33.620264: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3928a30 executing computations on platform Host. Devices:
2019-09-05 09:30:33.620282: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-09-05 09:30:33.621944: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: Tesla V100-PCIE-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.38
pciBusID: 0000:06:00.0
2019-09-05 09:30:33.622771: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-09-05 09:30:33.624800: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-09-05 09:30:33.626704: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-09-05 09:30:33.627375: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-09-05 09:30:33.629704: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-09-05 09:30:33.631585: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-09-05 09:30:33.635094: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-09-05 09:30:33.638320: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-09-05 09:30:33.638422: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-09-05 09:30:33.641755: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-05 09:30:33.641837: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0
2019-09-05 09:30:33.641895: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N
2019-09-05 09:30:33.645848: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/device:GPU:0 with 30555 MB memory) -> physical GPU (device: 0, name: Tesla V100-PCIE-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0)
'/device:GPU:0'
>>>
>>> from tensorflow.python.client import device_lib
>>> print(device_lib.list_local_devices())
2019-09-05 09:31:25.908226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: Tesla V100-PCIE-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.38
pciBusID: 0000:06:00.0
2019-09-05 09:31:25.908324: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-09-05 09:31:25.908358: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-09-05 09:31:25.908389: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-09-05 09:31:25.908419: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-09-05 09:31:25.908449: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-09-05 09:31:25.908479: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-09-05 09:31:25.908509: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-09-05 09:31:25.918502: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-09-05 09:31:25.918541: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-05 09:31:25.918551: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0
2019-09-05 09:31:25.918559: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N
2019-09-05 09:31:25.923547: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/device:GPU:0 with 30555 MB memory) -> physical GPU (device: 0, name: Tesla V100-PCIE-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 15270850500731088615
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 17617425747417705410
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 13070795884554441190
physical_device_desc: "device: XLA_CPU device"
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 32039642727
locality {
  bus_id: 1
  links {
  }
}
incarnation: 6846769415979563337
physical_device_desc: "device: 0, name: Tesla V100-PCIE-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0"
]
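
The difference between the two setups is whether a plain `/device:GPU:0` shows up next to `XLA_GPU:0`. A small sketch of that check is below; `gpu_devices` is a hypothetical helper, and the two lists are copied from the device names in the error message and in the output above.

```python
def gpu_devices(device_names):
    """Keep only plain GPU devices (device:GPU:N), skipping XLA_GPU entries."""
    return [
        name for name in device_names
        if name.split("/")[-1].startswith("device:GPU:")
    ]


# Devices reported as available in the InvalidArgumentError.
failing = [
    "/job:localhost/replica:0/task:0/device:CPU:0",
    "/job:localhost/replica:0/task:0/device:XLA_CPU:0",
    "/job:localhost/replica:0/task:0/device:XLA_GPU:0",
]

# Devices listed by device_lib.list_local_devices() on the working setup.
working = [
    "/device:CPU:0",
    "/device:XLA_GPU:0",
    "/device:XLA_CPU:0",
    "/device:GPU:0",
]

print(gpu_devices(failing))  # no plain GPU device registered
print(gpu_devices(working))  # the plain GPU device is present

# With TensorFlow 1.x installed (assumed, not run here), the names could come from:
# from tensorflow.python.client import device_lib
# names = [d.name for d in device_lib.list_local_devices()]
```

An empty result means ops explicitly pinned to `/device:GPU:0` cannot be placed, which is the error reported in this issue.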
