Support Ubuntu 20.04 #512

garymm · 2020-08-07T03:16:13Z

Ubuntu 20.04 LTS was released on April 23, 2020. It would be nice to support this latest LTS version.

Here's what I needed to do to get version 0.11 working on Ubuntu 20.04:
sudo apt install libncurses5 libtinfo5

So maybe just adding that to the installation instructions for now would be a good start. Updating the code to support the newer libs would be another option.
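A quick way to tell whether a machine still needs this workaround is to ask the dynamic loader to open the legacy libraries directly. The sketch below is only an illustration. It assumes the toolchain wants the sonames `libncurses.so.5` and `libtinfo.so.5`, inferred from the two package names above, and `missing_sonames` is a hypothetical helper, not part of the toolchain.

```python
import ctypes

# ASSUMPTION: sonames inferred from the packages above
# (libncurses5 -> libncurses.so.5, libtinfo5 -> libtinfo.so.5).
REQUIRED_SONAMES = ["libncurses.so.5", "libtinfo.so.5"]


def missing_sonames(sonames):
    """Hypothetical helper: return the sonames the dynamic loader cannot open."""
    missing = []
    for name in sonames:
        try:
            ctypes.CDLL(name)  # dlopen() by soname; raises OSError if not found
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_sonames(REQUIRED_SONAMES)
    if missing:
        print("Missing:", ", ".join(missing))
        print("Try: sudo apt install libncurses5 libtinfo5")
    else:
        print("All required libraries found.")
```

An empty result means the loader found both libraries, so the `apt install` step should already be satisfied.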


garymm · 2020-08-12T17:13:53Z

It seems the Python support also doesn't work on 20.04, because it's looking for libpython3.6m.so.1.0. Ubuntu 20.04 ships with Python 3.8.2, and there's no easy way to get Python 3.6.

marcrasi · 2020-08-12T17:25:51Z

> It seems the python support also doesn't work on 20.04 because it's looking for libpython3.6m.so.1.0. 20.04 comes with python3.8.2 and there's no easy way to get python 3.6.

Can you tell me what specifically you did to encounter this problem, so that I can make sure that the ubuntu20.04 builds don't have this problem?

garymm · 2020-08-12T18:12:46Z

Tried running swift-jupyter as described here.

When starting the kernel, I saw errors like:

[I 09:42:54.199 NotebookApp] Kernel started: 1a8e1196-b812-4582-9bf8-e42fe72ef654, name: swift
Traceback (most recent call last):
  File "/home/garymm/swift-tensorflow/usr/lib/python3/dist-packages/lldb/__init__.py", line 35, in <module>
    import _lldb
ModuleNotFoundError: No module named '_lldb'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/garymm/src/swift-jupyter/swift_kernel.py", line 19, in <module>
    import lldb
  File "/home/garymm/swift-tensorflow/usr/lib/python3/dist-packages/lldb/__init__.py", line 38, in <module>
    from . import _lldb
ImportError: libpython3.6m.so.1.0: cannot open shared object file: No such file or directory
[I 09:42:57.200 NotebookApp] KernelRestarter: restarting kernel (1/5), new random ports
Traceback (most recent call last):
  File "/home/garymm/swift-tensorflow/usr/lib/python3/dist-packages/lldb/__init__.py", line 35, in <module>
    import _lldb
ModuleNotFoundError: No module named '_lldb'
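The useful part of the traceback above is the loader message naming the missing shared object. As a rough diagnostic sketch, the helpers below reproduce the kernel's `import lldb` step outside Jupyter and pull that library name out of the `ImportError`. Both `missing_shared_object` and `try_import_lldb` are hypothetical names, and the assumption is that you pass the toolchain's `usr/lib/python3/dist-packages` directory, as in the paths in the log.

```python
import re
import sys

# Matches loader errors like the one in the log:
# "libpython3.6m.so.1.0: cannot open shared object file: No such file or directory"
_MISSING_SO = re.compile(r"^(?P<lib>\S+): cannot open shared object file")


def missing_shared_object(message):
    """Return the shared-object name from a loader error message, or None."""
    match = _MISSING_SO.match(message)
    return match.group("lib") if match else None


def try_import_lldb(dist_packages):
    """HYPOTHETICAL helper: import lldb from a toolchain's dist-packages dir.

    Returns None on success, otherwise the missing shared object's name
    (or the raw error text if the message has a different shape).
    """
    sys.path.insert(0, dist_packages)
    try:
        import lldb  # noqa: F401  (same import swift_kernel.py performs)
    except ImportError as err:
        return missing_shared_object(str(err)) or str(err)
    finally:
        sys.path.remove(dist_packages)
    return None
```

On the setup in the log, this would be expected to report `libpython3.6m.so.1.0`, which points at the Python version the toolchain was linked against rather than at swift-jupyter itself.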

garymm · 2020-08-22T23:28:11Z

I think the Python 3.6 vs. 3.8 issue was a symptom of my trying to use a release built on Ubuntu 18.04 on 20.04.
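Because the failure came from running an 18.04 build on a 20.04 host, a cheap guard is to compare the Ubuntu release the toolchain was built for with the host's `VERSION_ID` from `/etc/os-release`. This is a minimal sketch under that assumption; the function names are illustrative, and you have to supply the toolchain's target release yourself (for example, from the tarball name).

```python
def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


def host_ubuntu_version(path="/etc/os-release"):
    """Return the host's Ubuntu VERSION_ID (e.g. "20.04"), or None if not Ubuntu."""
    with open(path) as f:
        info = parse_os_release(f.read())
    return info.get("VERSION_ID") if info.get("ID") == "ubuntu" else None


def check_toolchain_matches_host(toolchain_ubuntu, host_version):
    """Return a warning string on a release mismatch, or None if they match."""
    if host_version is None:
        return "Host is not Ubuntu; cannot compare releases."
    if toolchain_ubuntu != host_version:
        return (f"Toolchain was built for Ubuntu {toolchain_ubuntu}, but the host "
                f"runs {host_version}; it may link against libraries "
                f"(such as libpython3.6m) that this release does not ship.")
    return None
```

For example, `check_toolchain_matches_host("18.04", host_ubuntu_version())` on a 20.04 machine returns a warning instead of letting the kernel fail later with a loader error.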

I built the toolchain from source and got a build to succeed on 20.04 with CUDA 11.0 and cuDNN 8.0.2. The only real bug I had to fix is described here:
https://groups.google.com/a/tensorflow.org/g/swift/c/RUlBncvPRfE

marcrasi · 2020-09-15T21:21:18Z

I made some progress: #535

I'm still waiting on https://gitlab.com/nvidia/container-images/cuda/-/issues/83 before I can add cuda toolchains for ubuntu 20.04.

brettkoonce · 2020-12-17T21:11:01Z

@marcrasi toolchains have been updated!

marcrasi · 2020-12-21T20:19:00Z

I tried to make a CUDA build for Ubuntu 20.04, but there is still a small blocker: the version of TF that we use (2.3) supports CUDA 11.0 but not CUDA 11.1, and NVIDIA publishes Docker images for Ubuntu 20.04 with CUDA 11.1 but not with CUDA 11.0.

I'm not sure whether TF 2.4 supports CUDA 11.1, but I'll try again once we upgrade to TF 2.4 (which we're trying to do soon).
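The blocker described here is an empty intersection: a CUDA build needs a version that TF supports and that NVIDIA also publishes as an Ubuntu 20.04 image. The sketch below only encodes the versions stated in this comment. It is not an authoritative compatibility matrix (the next comment disputes the TF 2.3 claim), and `buildable_cuda_versions` is a hypothetical name.

```python
# Versions exactly as stated in this comment; NOT an authoritative matrix.
TF_2_3_CUDA = {"11.0"}
UBUNTU_20_04_CUDA_IMAGES = {"11.1"}


def _version_key(version):
    """Sort "11.0" < "11.1" < "11.10" numerically rather than lexically."""
    return tuple(int(part) for part in version.split("."))


def buildable_cuda_versions(tf_supported, images_available):
    """CUDA versions usable for a build: supported by TF AND published as an image."""
    return sorted(tf_supported & images_available, key=_version_key)


print(buildable_cuda_versions(TF_2_3_CUDA, UBUNTU_20_04_CUDA_IMAGES))  # → []
```

Upgrading TF only unblocks the build if it adds a version already present in the image set, which is why the plan is to retry after the TF 2.4 upgrade.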

brettkoonce · 2020-12-21T22:05:45Z

@marcrasi my understanding is that 2.4 is the first release that officially supports CUDA 11.0 (https://github.com/tensorflow/tensorflow/releases/tag/v2.4.0), so I'm not sure how you got 11.0 working in the first place (a build from master?). CUDA 11.1 is the release that supports the new Ampere consumer cards (11.0 only covers the A100 series), so it would be nice to have that in particular (tensorflow/tensorflow#44750). 11.2 is already out as well!

brettkoonce · 2020-12-21T22:09:35Z

Also, @texasmichelle, you might run this and look at the logs it prints:

export GPU_TYPE="a100"
export ZONE="us-central1-a"

gcloud compute instances create s4tf-ubuntu-${GPU_TYPE} \
  --zone=${ZONE} \
  --image-project=deeplearning-platform-release \
  --image-family=swift-latest-gpu-ubuntu-1804 \
  --maintenance-policy=TERMINATE \
  --accelerator="type=nvidia-tesla-${GPU_TYPE},count=1" \
  --metadata="install-nvidia-driver=True" \
  --machine-type=a2-highgpu-1g \
  --boot-disk-size=256GB

texasmichelle · 2020-12-21T23:09:13Z

@brettkoonce Can you share what you're seeing? I'm getting a warning about disk size, but otherwise that command seems to be working. Are you running in a project that has quota?

texasmichelle · 2020-12-21T23:12:22Z

Or are you pointing this out as an example of a toolchain running with cuda 11 support?

brettkoonce · 2020-12-22T02:47:02Z

@texasmichelle I was seeing some weird errors when running swift-models (e.g. lenet-mnist), but in retrospect I think what's going on is that you packaged the CUDA 10.2 version with your deep learning build. After pulling the CUDA 11 build (e.g. swift-tensorflow-RELEASE-0.12-cuda11.0-cudnn8-ubuntu18.04.tar.gz), everything works fine. It might be worth moving to 11.0 going forward. I'm still seeing tensorflow/swift-models#704, FWIW.
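Since the errors came from a CUDA 10.2 build being used where a CUDA 11 build was needed, it can help to read the CUDA, cuDNN and Ubuntu versions straight off the tarball name before installing. This sketch assumes every release tarball follows the pattern of the one named above, and that CPU-only tarballs simply omit the `cuda`/`cudnn` fields; both the pattern and `parse_tarball_name` are assumptions, not a documented naming scheme.

```python
import re

# ASSUMPTION: names look like
#   swift-tensorflow-RELEASE-<release>-cuda<X.Y>-cudnn<N>-ubuntu<YY.MM>.tar.gz
# and CPU-only tarballs omit the "-cuda...-cudnn..." part.
_TARBALL = re.compile(
    r"^swift-tensorflow-RELEASE-(?P<release>[\d.]+)"
    r"(?:-cuda(?P<cuda>[\d.]+)-cudnn(?P<cudnn>\d+))?"
    r"-ubuntu(?P<ubuntu>\d+\.\d+)\.tar\.gz$"
)


def parse_tarball_name(name):
    """Split a toolchain tarball name into release/CUDA/cuDNN/Ubuntu fields."""
    match = _TARBALL.match(name)
    if match is None:
        raise ValueError(f"unrecognized toolchain tarball name: {name!r}")
    return match.groupdict()  # cuda/cudnn are None for CPU-only names


print(parse_tarball_name(
    "swift-tensorflow-RELEASE-0.12-cuda11.0-cudnn8-ubuntu18.04.tar.gz"))
```

Comparing the `cuda` field against the CUDA version installed on the machine would have flagged the 10.2 vs. 11 mismatch up front.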

texasmichelle · 2020-12-22T17:36:35Z

Ah, I see what you mean. I also tried using --image-family=swift-latest-cu110-ubuntu-1804, which seems fine with the tensorflow-0.12 branch of swift-models. However, the 0.12 release hasn't made it into the images yet. There's currently a code freeze for the holidays, but I'll see if I can get a more precise date for the next release. I submitted the change a few weeks ago, so I believe the code is otherwise ready.

texasmichelle · 2020-12-23T19:48:10Z

@brettkoonce You can expect to see DLVMs with v0.12 right after the freeze, e.g. by Jan. 8.

I also verified that CUDA 11.0 is included in the existing toolchain and will remain going forward.

machineko · 2020-12-29T23:41:49Z

From 1 week ago:

> Ubuntu20.04 x86_64 cudnn images have been pushed! Having an issue with arm64 and ppc64le builds though. Will close this once those are released.

So could we get prebuilt Ubuntu 20.04 toolchains with CUDA (preferably 11.1 for Ampere support :D, e.g. nvidia/cuda:11.1-cudnn8-devel-ubuntu20.04), or do we still need to wait for CUDA 11.1 support in TensorFlow master?

texasmichelle assigned marcrasi Aug 12, 2020
