Unit Tests Pass But tensorflow Produces Non-zero Exit Status #591
Comments
do you have cudnn installed as well? |
In the first output above I do, yes. I have cudnn-7 installed. The first example is the machine I use for all of my training runs. It is fully functional with tensorflow on GPU. To make sure, I In the second example where I have no GPU and no CUDA, cudnn is not installed. I am expecting all of my tests to run on CPU TF_EAGER on the GitHub Actions continuous integration machine. |
Ok, there are two different issues here:
Exited with signal code 11
This was actually a very opaque error occurring in a specific unit test because my test was calling The opaqueness of the error and lack of stack trace made this difficult to find. I fixed my test to call the model correctly and I no longer see the Exited with signal code 11.
Exited with signal code 6
V100 GPU machine, Ubuntu 18.04, CUDA 10.2, cudnn-7:
This is still happening on my GPU machine at the completion of all tests, but only when I run all my unit tests in parallel with |
I replaced all of the
It does not occur on my CPU-only GitHub Actions continuous integration machine. |
I'm definitely not the person to ask about this but IIRC |
Thanks @brettkoonce! Is this expected to be the case even when all Tensors and models are specified on |
I've been successfully running my unit tests without |
Swift for TensorFlow 0.12. On an Ubuntu 18.04 machine with CUDA 10.2 installed and a V100 GPU:
On a GitHub Actions Ubuntu 18.04 machine with no GPU and no CUDA installed, I get this non-zero exit:
I see this on most runs of my 50 unit tests. This is a problem because my CI builds are being marked as failed when in fact all the tests are passing.
Has anyone encountered this in continuous integration testing of Swift for TensorFlow projects?
I didn't encounter this on Swift for TensorFlow 0.11.